When I need to return large arrays, I tend to use subroutine arguments rather than function results, because it’s never clear to me whether a copy will happen upon return from the function.
I have finally made a simple test, with two functions that each return an array. In the first function the result array is “allocatable”, and in the second one it is “automatic”. I print the address of the array inside the function and again after the assignment in the caller (if the addresses differ, a copy was made):
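For reference, here is a minimal sketch of the subroutine-argument pattern I mean. The names `fill_in_place`, `fill` and `addr` are made up for the illustration, and it uses the standard `c_loc` from `iso_c_binding` rather than the `loc()` extension. The caller owns the storage and the subroutine only writes into it, so in practice one would expect both printed addresses to be the same:

```fortran
program fill_in_place
    use, intrinsic :: iso_c_binding, only: c_loc, c_intptr_t
    implicit none
    real, allocatable, target :: b(:)
    integer(c_intptr_t) :: addr_caller, addr_callee

    allocate(b(128))
    ! Address of the caller's array, converted to an integer for printing
    addr_caller = transfer(c_loc(b), addr_caller)

    call fill(b, addr_callee)

    print *, "caller:", addr_caller
    print *, "callee:", addr_callee
contains
    ! Hypothetical helper: fills the caller's array and reports the address it sees
    subroutine fill(v, addr)
        real, intent(out), target, contiguous :: v(:)
        integer(c_intptr_t), intent(out) :: addr
        v = 1.0
        addr = transfer(c_loc(v), addr)
    end subroutine
end program
```

The `contiguous` attribute is there so that `c_loc` can be applied to the assumed-shape dummy; since the actual argument is a whole allocatable array, no copy-in/copy-out should be needed.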
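    program foo
        implicit none
        real, allocatable :: b(:)

        ! Note: loc() is a non-standard extension (supported by gfortran and ifx)

        ! b is not allocated yet: allocation on assignment
        b = myfunc_alloc(128)
        print*, " b:", loc(b)
        print*

        ! result has the same size as b
        b = myfunc_auto(size(b))
        print*, " b:", loc(b)
        print*

        ! result has a different size: b must be reallocated
        b = myfunc_auto(1000)
        print*, " b:", loc(b)
        print*

        ! huge result (10**9 default reals)
        b = myfunc_auto(10**9)
        print*, " b:", loc(b)
        print*

        deallocate(b)
        b = myfunc_alloc(10**9)
        print*, " b:", loc(b)

    contains

        ! "allocatable" flavor: the result is an allocatable array
        function myfunc_alloc(n) result(v)
            integer, intent(in) :: n
            real, allocatable :: v(:)
            allocate( v(n), source=1.0 )
            print*, "myfunc_alloc:", loc(v)
        end function

        ! "automatic" flavor: the result is an automatic array of size n
        function myfunc_auto(n) result(v)
            integer, intent(in) :: n
            real :: v(n)
            v = 1.0
            print*, " myfunc_auto:", loc(v)
        end function

    end program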
This is the output I get with gfortran 12:
myfunc_alloc: 140385029471680
b: 140385029472192
myfunc_auto: 140385029472192
b: 140385029472192
myfunc_auto: 140385032605696
b: 140385032605696
myfunc_auto: 4721029120
b: 4721029120
myfunc_alloc: 4721029120
b: 8721031168
Observations:
- in the “allocatable” flavor, a copy is always performed at the end. On the one hand this is somewhat understandable, as we explicitly create a new object within the routine. On the other hand, when the array being assigned to is itself allocatable, one might expect that just a move_alloc() would be performed.
- in the “automatic” flavor, gfortran looks pretty smart:
  - when possible, it directly uses the final array instead of creating a new automatic array
  - otherwise it effectively performs a kind of move_alloc(), without any copy, when the final array is allocatable (also because an allocation on assignment is possible in my test)
I wonder why the “allocatable” flavor doesn’t behave like the “automatic” flavor. But the nice thing with gfortran is that even if the array is huge, the automatic array version still works (probably gfortran uses the heap instead of the stack if needed).
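To make concrete what I mean by “just a move_alloc()”, here is a sketch of doing it by hand with a subroutine instead of a function. The names `move_alloc_sketch`, `make_array`, `tmp` and `addr` are hypothetical, and I use the standard `c_loc` instead of `loc()`. The array is allocated locally and its allocation is then transferred to the caller's array, so no element copy should be involved and the two addresses should match:

```fortran
program move_alloc_sketch
    use, intrinsic :: iso_c_binding, only: c_loc, c_intptr_t
    implicit none
    real, allocatable, target :: b(:)
    integer(c_intptr_t) :: addr_inside, addr_outside

    call make_array(128, b, addr_inside)
    addr_outside = transfer(c_loc(b), addr_outside)

    print *, "inside: ", addr_inside
    print *, "outside:", addr_outside
    print *, "same storage:", addr_inside == addr_outside
contains
    ! Hypothetical helper: allocates locally, then hands the allocation to v
    subroutine make_array(n, v, addr)
        integer, intent(in) :: n
        real, allocatable, intent(out), target :: v(:)
        integer(c_intptr_t), intent(out) :: addr
        real, allocatable, target :: tmp(:)

        allocate(tmp(n), source=1.0)
        ! Record the address before the allocation is transferred
        addr = transfer(c_loc(tmp), addr)
        ! tmp becomes deallocated; v takes over its allocation
        call move_alloc(from=tmp, to=v)
    end subroutine
end program
```

This is only an illustration of the behavior I would have expected from the “allocatable” function result; it says nothing about what the compilers actually do internally.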
Intel Fortran (IFX 2023) doesn’t look as smart as gfortran on this test:
myfunc_alloc: 18322144
b: 18331264
myfunc_auto: 140722458572144
b: 18331264
myfunc_auto: 140722458568656
b: 18331840
Program stderr
forrtl: severe (174): SIGSEGV, segmentation fault occurred
The segmentation fault is expected: I don’t have ifx installed, so I’m testing online on godbolt.org, which has memory limitations. However, in the part of the test that did run, one can see that all the addresses differ, meaning that ifx performs a copy in every case.
On godbolt.org one can also test flang(old), with the same memory limitation:
myfunc_alloc: 36849936
b: 36850464
myfunc_auto: 36850464
b: 36850464
myfunc_auto: 36850464
b: 36850464
Looks like it is as smart as gfortran at avoiding unnecessary copies.