Data copy with functions returning arrays

When needing to return large arrays I tend to use subroutine arguments rather than function results, because it’s never clear to me if some copy will happen upon return from the function.

I have finally made a simple test, with two function that return an array. In the first function the array is “allocatable”, and in the second one it is “automatic”, and I look at the address of the array within the function and upon return (if the address differ then some copy arised):

program foo
implicit none

real, allocatable :: b(:)

b = myfunc_alloc(128)
print*, "           b:", loc(b)
print*
b = myfunc_auto(size(b))
print*, "           b:", loc(b)
print*
b = myfunc_auto(1000)
print*, "           b:", loc(b)
print*
b = myfunc_auto(10**9)
print*, "           b:", loc(b)
print*
deallocate(b)
b = myfunc_alloc(10**9)
print*, "           b:", loc(b)

contains

    function myfunc_alloc(n) result(v)
    integer, intent(in) :: n
    real, allocatable :: v(:)
    allocate( v(n), source=1.0 )
    print*, "myfunc_alloc:", loc(v)
    end function

    function myfunc_auto(n) result(v)
    integer, intent(in) :: n
    real :: v(n)
    v = 1.0
    print*, " myfunc_auto:", loc(v)
    end function

end program

This is an output I obtain with gfortran (12):

 myfunc_alloc:      140385029471680
            b:      140385029472192

  myfunc_auto:      140385029472192
            b:      140385029472192

  myfunc_auto:      140385032605696
            b:      140385032605696

  myfunc_auto:           4721029120
            b:           4721029120

 myfunc_alloc:           4721029120
            b:           8721031168

Observations:

  • in the “allocatable” flavor, a copy is always performed at the end. On the one hand this is somehow understandable, as we explicitely create a new object within the routine. On the other hand, in the case where the assigned array at the end is itself allocatable, on might expect that just a move_alloc() be performed.
  • in the “automatic” flavor, gfortran looks pretty smart:
    • when possible it directly uses the final array instead of creating a new automatic array
    • otherwise it effectively performs a kind of move_alloc(), without any copy, when the final array is allocatable (also because an allocation on assignment is possible in my test)

I wonder why the “allocatable” flavor doesn’t work the same as the “automatic flavor”? But the nice thing with gfortran is that even if the array is huge, the automatic array version still works (probably gfortran uses the heap instead of the stack if needed).

Intel Fortran (IFX 2023) doesn’t look as smart as gfortran on this test:

 myfunc_alloc:              18322144
            b:              18331264
 
  myfunc_auto:       140722458572144
            b:              18331264
 
  myfunc_auto:       140722458568656
            b:              18331840

Program stderr
forrtl: severe (174): SIGSEGV, segmentation fault occurred

The segmentation violation is expected: I don’t have ifx installed, so I’m testing online on godbolt.org, which has memory limitations. However, on the part of the test that did ran, one can see that all the adresses differ, meaning that ifx performs copies in all cases.

On godbolt.org one can also test flang(old), with the same memory limitation:

 myfunc_alloc:             36849936
            b:             36850464

  myfunc_auto:             36850464
            b:             36850464

  myfunc_auto:             36850464
            b:             36850464

Looks like it is as smart as gfortran to avoid unncessary copies.

3 Likes

As a practical matter, it is because allocatable arrays are typically allocated on the heap and automatic arrays on the stack. So the copy from automatic to allocatable would require the deep memory copy. However, the heap to heap copy is not required, a compiler could optimize away that copy with something like an implicit move_alloc() operation, a shallow copy.

It is not allowed to use a function result as the FROM argument in move_alloc(). If it were, that might be a way for the programmer to specify the correct desired behavior.

So given that restriction, I think the safest ways to avoid the redundant allocation are:

  1. use a subroutine where you explicitly allocate the arguments yourself.

  2. use a subroutine that allocates its dummy argument, and then use move_alloc() in the calling program to transfer the allocation as necessary.

As a separate issue, there is some ambiguity in the standard regarding an assignment like

a = b
deallocate(b)

where both a(:) and b(:) are allocatable. The deallocate() can be implicit, e.g. if the assignment is the last reference to the local b(:). I think the ambiguity is intentional, in order to allow the compiler to optimize the memory allocation, but the ambiguity has some consequences. If a(:) is allocated and a different size from b(:), then a(:) should be reallocated upon assignment, the contents of b(:) should be copied, and then b(:) should be deallocated. Any pointers to the targets a(:) and b(:) become undefined. The ambiguity is when a(:) is allocated and is the same size as b(:). In this case, one of two things can occur. One thing is that the contents of b(:) are copied to the existing array a(:) and then b(:) is deallocated; pointers to a(:) remain associated, and pointers to b(:) become undefined. But another thing that can occur is that the compiler can move_alloc(from=b,to=a). That is more efficient because no memory copy occurs, it is a shallow copy. However, pointers to the original a(:) would become undefined, and pointers to the original b(:) would be associated with the new a(:).

The programmer can prevent the implicit move_alloc() from occurring, but it is clunky. For example:

if(size(a)==size(b)) then
   a(:) = b
else
   a = b
endif

A programmer would not normally do that, so some care is necessary when a(:) and/or b(:) are targets and the simple assignment a=b occurs. And if the programmer wants the shallow copy, then he must use move_alloc(), he can’t depend on the compiler alone to recognize the short cut.

The language standard could be modified so that the a(:)=b semantics is required (rather than just optional) when the sizes match, but that would prevent the possibility of this particular memory optimization from occurring.

In section 10.2.1.3 Interpretation of intrinsic assignments

If the variable is an allocated allocatable variable, it is deallocated if expr is an array of different shape, any corresponding length type parameter values of the variable and expr differ, or the variable is polymorphic and the dynamic type or any corresponding kind type parameter values of the variable and expr differ.

It does not say that it is deallocated if the arrays are the same shape, but it also doesn’t say that it is not deallocated. There might be a phrase else that I haven’t found that says more. It might be reasonable to write an interpretation request if someone were so inclined.