MOVE_ALLOC to grow an array

Here is a program that compares the speed of growing an array using MOVE_ALLOC and allocation on assignment. Often move_alloc is faster, but it depends on the compiler and operating system. I plan to tweet about move_alloc, so comments are welcome.

module m
implicit none
contains
pure subroutine grow_vec(x,xadd) ! append xadd(:) to x(:)
real, allocatable, intent(in out) :: x(:)
real             , intent(in)     :: xadd(:)
real, allocatable                 :: temp(:)
integer                           :: nx
if (.not. allocated(x)) allocate (x(0))
nx   = size(x)
allocate (temp(nx + size(xadd)))
temp(:nx) = x
temp(nx+1:) = xadd
deallocate (x)
call move_alloc(temp,x)
end subroutine grow_vec
end module m
!
program grow_array
use m, only: grow_vec
implicit none
real   , allocatable :: a(:),xran(:)
integer, parameter   :: nmax = 3*10**6, ngrow = 1000, nt = 3
integer              :: na
real                 :: t(nt)
allocate (xran(nmax))
call random_number(xran)
call cpu_time(t(1))
allocate (a(0))
na = 0
do ! grow array in a subroutine with copying and move_alloc
   call grow_vec(a,xran(na+1:na+ngrow))
   na = size(a)
   if (na >= nmax) exit
end do
print "(2a15)","size","max|a-xran|"
print "(i15,f15.10)",size(a),maxval(abs(a-xran))
call cpu_time(t(2))
a = [real ::]
na = 0
do ! grow array using allocation on assignment (AoA)
   a = [a,xran(na+1:na+ngrow)]
   na = size(a)
   if (size(a) >= nmax) exit
end do
print "(i15,f15.10)",size(a),maxval(abs(a-xran))
call cpu_time(t(3))
print "(/,3a15)","method","move_alloc","AoA"
print "(a15,2f15.3)","time(s)",t(2:nt) - t(1:nt-1)
end program grow_array
Edited output:
gfortran -O3 on Windows
         method     move_alloc            AoA
        time(s)          4.969         20.516

gfortran -O3 on WSL2
         method     move_alloc            AoA
        time(s)          3.258          4.459

flang -O3 on WSL2
         method     move_alloc            AoA
        time(s)          3.518          4.413

ifort -O3 /F512000000 on Windows
         method     move_alloc            AoA
        time(s)          4.156          5.703

ifort -O3 -heap-arrays on Windows
         method     move_alloc            AoA
        time(s)          4.641          8.750

 ifort -O3 -heap-arrays on WSL2
         method     move_alloc            AoA
        time(s)          4.077          3.025
5 Likes

ifort binaries segfault without -heap-arrays both on MacOS BigSur and Ubuntu 20.04 - might be worth mentioning in tweet

gfortran-10 -O3 on Ubuntu 20.04, i5-10505 CPU @ 3.20GHz, 16GB RAM
         method     move_alloc            AoA
        time(s)          4.286          5.562

ifort -O3 -heap-arrays on Ubuntu 20.04
         method     move_alloc            AoA
        time(s)          5.054          4.016

gfortran-11 -O3 on MacOS BigSur, i5-4288U CPU @ 2.60GHz, 8 GB RAM
         method     move_alloc            AoA
        time(s)         13.554         23.184

ifort -O3 -heap-arrays on MacOS BigSur
         method     move_alloc            AoA
        time(s)         14.547         28.613

interesting big difference move_alloc vs. AoA on MacOS (as on Windows in your results)

1 Like

Due to exceeding the default stack size limit. Works fine if one removes the limit before running; e.g. ulimit -s unlimited with bash

1 Like

I’m surprised the AoA worked as well as it did. (I was also unaware that arrays could be used inside an array constructor expression. Neat!)

A couple suggestions:

  • I think you’d be better off using system_clock (with int64 arguments) in your timings, and not including the print statements (with the calculation of maxval(abs(a-xran))) in the timed portion.
  • For move_alloc, a lot of what you are timing is just the procedure call overhead to grow_vec (accounts for 75% of the time with ifort on linux for me). Consider this third in-line alternative:
a = [real ::]
na = 0
do
   block
    real, allocatable :: temp(:)
    allocate(temp(na+ngrow))
    temp(:na) = a
    temp(na+1:) = xran(na+1:na+ngrow)
    deallocate(a)
    call move_alloc(temp, a)
   end block
   na = size(a)
   if (size(a) >= nmax) exit
end do
1 Like

Given the semantics in the standard for MOVE_ALLOC, the statement deallocate(a) is not needed.

Depending on the processor, such explicit allocation may even adversely affect performance.

4 Likes

@Beliavsky , it’s good you’re seeking comments. My suggestion will be for you to work harder, much much harder and give your drafts a lot more thought so that you can make them as brief as possible before tweeting.

In the case of MOVE_ALLOC for this use case, for example you can start with something as follows and make it further compact:

! canonical method to grow an array:
! say a and tmp are rank-1 objects of the same type (and kind)  and
! tmp is allocated to desired new shape for a
   tmp( 1:size(a) ) = a
   tmp( size(a)+1: ) = .. ! data to be added
   call move_alloc( from=tmp, to=a ) 

And that a couple of other options include:

! array constructor
   a = [ a, [ .. ] ]
! RESHAPE intrinsic function
   a = reshape( a, shape=[ new_shape ], pad=.. )

In my opinion, your tips thus far have been verbose.

Separately in the context of those coding in anger with Fortran and coming to this Discourse for information and guidance, what will be useful is to realize

  • a need to dynamically grow/shrink an array will be helped by containerizing it suitably, perhaps in a derived type;
  • the growth or shrinkage should follow a scheme suitable for the need at hand; a default scheme is usually to double/halve the size of the object in the container but there may be other better schemes depending on the program and the processor.

So then the approach should be to study the options vis-a-vis the needs while paying careful attention to the instrumentation around what is one is measuring and how so. The code in the original post has issues in this aspect as pointed out upthread.

A quick go-by can be as follows where the scheme to double the size each time is considered. To make it somewhat more interesting the option with RESHAPE intrinsic is also thrown in. One can then see the performance variation with the option and the shape and relate that to one’s need in actual code.

Click to see
   integer, parameter :: WP = selected_real_kind( p=12 )
   character(len=*), parameter :: fmth = "(g0,t10,g0,t30,g0,t50,g0,t70,g0)"
   character(len=*), parameter :: fmtg = "(g0,t10,g0,t30,g0.3,t50,g0.3,t70,g0)"
   integer, parameter :: nstart = 100000
   integer, parameter :: nruns = 12
   integer :: i, n(nruns)
   real(WP) :: t1, t2
   real(WP), allocatable :: a(:),xran(:)
   allocate (xran(nstart*(2**nruns)))
   call random_number( xran )
   n = [( nstart*2**(i-1), i=1,nruns )]
   blk1: block
      a = [real(WP) :: ]
      print *, "Method: Array constructor"
      print fmth, "i", "Array Size", "Time (sec)", "Max|a-xran|", "a(n/2)"
      do i = 1, nruns
         call my_cpu_time(t1)
         a = [ a, xran(size(a)+1:n(i)) ]
         call my_cpu_time(t2)
         print fmtg, i, size(a), (t2-t1), maxval(abs(a-xran)), a(size(a)/2)
      end do
   end block blk1
   blk2: block
      a = [real(WP) :: ]
      print *, "Method: RESHAPE intrinsic"
      print fmth, "i", "Array Size", "Time (sec)", "Max|a-xran|", "a(n/2)"
      do i = 1, nruns
         call my_cpu_time(t1)
         a = reshape( a, shape=[n(i)], pad=xran(size(a)+1:n(i)) )
         call my_cpu_time(t2)
         print fmtg, i, size(a), (t2-t1), maxval(abs(a-xran)), a(size(a)/2)
      end do
   end block blk2
   blk3: block
      a = [real(WP) :: ]
      print *, "Method: MOVE_ALLOC"
      print fmth, "i", "Array Size", "Time (sec)", "Max|a-xran|", "a(n/2)"
      do i = 1, nruns
         block
            real(WP), allocatable :: tmp(:)
            call my_cpu_time(t1)
            allocate( tmp(n(i)) )
            tmp(1:size(a)) = a
            tmp(size(a)+1:) = xran(size(a)+1:size(tmp))
            call move_alloc( from=tmp, to=a )
            call my_cpu_time(t2)
         end block
         print fmtg, i, size(a), (t2-t1), maxval(abs(a-xran)), a(size(a)/2)
      end do
   end block blk3
contains
   subroutine my_cpu_time( time )
      use, intrinsic :: iso_fortran_env, only : I8 => int64
      ! Argument list
      real(WP), intent(inout) :: time
      ! Local variables
      integer(I8) :: tick
      integer(I8) :: rate
      call system_clock (tick, rate)
      time = real(tick, kind=kind(time) ) / real(rate, kind=kind(time) )
      return
   end subroutine my_cpu_time
end
  • Expected program behavior
C:\Temp>gfortran -O3 p.f90 -o p.exe

C:\Temp>p.exe
 Method: Array constructor
i        Array Size          Time (sec)          Max|a-xran|         a(n/2)
1        100000              0.157E-2            0.00                0.14632783894497925
2        200000              0.216E-2            0.00                0.93604658738074942
3        400000              0.383E-2            0.00                0.38816074699824510E-2
4        800000              0.780E-2            0.00                0.48353192216229768
5        1600000             0.154E-1            0.00                0.17707306405838441
6        3200000             0.316E-1            0.00                0.34672021796550934
7        6400000             0.619E-1            0.00                0.49127175384201527E-2
8        12800000            0.123               0.00                0.41832146544600479
9        25600000            0.274               0.00                0.34490797395681683
10       51200000            0.499               0.00                0.43429002560932473
11       102400000           1.01                0.00                0.92205452988951198
12       204800000           6.73                0.00                0.86448646582845623
 Method: RESHAPE intrinsic
i        Array Size          Time (sec)          Max|a-xran|         a(n/2)
1        100000              0.230E-1            0.00                0.14632783894497925
2        200000              0.170E-2            0.00                0.93604658738074942
3        400000              0.343E-2            0.00                0.38816074699824510E-2
4        800000              0.784E-2            0.00                0.48353192216229768
5        1600000             0.143E-1            0.00                0.17707306405838441
6        3200000             0.273E-1            0.00                0.34672021796550934
7        6400000             0.547E-1            0.00                0.49127175384201527E-2
8        12800000            0.107               0.00                0.41832146544600479
9        25600000            0.201               0.00                0.34490797395681683
10       51200000            0.351               0.00                0.43429002560932473
11       102400000           0.646               0.00                0.92205452988951198
12       204800000           1.41                0.00                0.86448646582845623
 Method: MOVE_ALLOC
i        Array Size          Time (sec)          Max|a-xran|         a(n/2)
1        100000              0.263E-1            0.00                0.14632783894497925
2        200000              0.636E-3            0.00                0.93604658738074942
3        400000              0.187E-2            0.00                0.38816074699824510E-2
4        800000              0.316E-2            0.00                0.48353192216229768
5        1600000             0.552E-2            0.00                0.17707306405838441
6        3200000             0.123E-1            0.00                0.34672021796550934
7        6400000             0.243E-1            0.00                0.49127175384201527E-2
8        12800000            0.472E-1            0.00                0.41832146544600479
9        25600000            0.843E-1            0.00                0.34490797395681683
10       51200000            0.163               0.00                0.43429002560932473
11       102400000           0.225               0.00                0.92205452988951198
12       204800000           0.463               0.00                0.86448646582845623
4 Likes

Thanks for the reshape idea and code – I hadn’t thought of using reshape for this.

Regarding your previous post, I’m not sure that spending much more time on each tweet would improve the quality much, and it would slow the pace.The tips are listed by topic and given one-sentence subjects here. I would like to cover OOP, parameterized derived types, C interoperability, and coarrays. When the main features of Fortran covered I may slow the pace to weekly and cover Fortran tools (fpm, fypp etc.) and libraries (stdlib and others).

A long time ago I made a ‘grow’ derived type that behaved similarly to C++ std::vector. I used move_alloc() a lot in that! I was severely frustrated by the lack of generics when doing that. I didn’t know about all the preprocessors at the time.

Someone asked about this in Reddit:

At one time @Arjen had a std:vector like implementation in his FLIBS libraries. If I remember this predated MOVE_ALLOC and he defined a pointer to a container type that the user could define separately. Been a while since I looked at FLIBS so he might have updated it to use MOVE_ALLOC etc. I liked his procedure for growing the vector which defined a growth factor instead of just doubling the array size

Thanks for this clarification! I was actually about to ask the OP about it, since the original code deallocates the “old” array too:

Actually I find two of (my) top three google results not really clear about it:

  • gfortran docs say nothing about the allocation status of the “old” data after the move_alloc call (and overall are very concise, if not incomplete)

  • IBM docs are a bit more informative, but to me again not really clear about this detail.

  • Intel’s development guide finally fully discloses its behavior as “If to is currently allocated, it is deallocated.”

Your comment about the actual standard-defined semantics has left my doubts about compiler-dependent choices, which would have been unfortunate here (I’d really hated to recognize a memory leak in my compressed-sparse-row implementation, which actually did not deallocate! :face_with_peeking_eye:). Maybe this uncertainty could explain the use of deallocate statements in the posted snippets, users are not expected to read directly the standard imho, rather the compiler docs. :slight_smile:

The gfortran documentation state it clearly enough IMO,

MOVE_ALLOC(FROM, TO) moves the allocation from FROM to TO. FROM will become deallocated in the process.

It is the gfortran description of the second argument that is missing. Namely, if TO is allocated prior to the MOVE_ALLOC() call, then it is deallocated before becoming allocated with the FROM data. There are some other interesting features of MOVE_ALLOC() too. For example, if pointers are associated with elements of FROM, then after the call those pointers are associated with the elements of TO.

It would be useful if the FROM argument could also be a pointer. This would allow the programmer to do shallow copies from pointer-allocated arrays to allocatable arrays. This would require a change in the fortran standard. At present, to accomplish that requires a deep copy rather than the more efficient shallow copy.

2 Likes

Fair enough, but technically this is a result of the allocatable, intent(out) attributes for to, so not that specific to move_alloc.

Ok but I need to look at the source for move_alloc to realize that, right? Am I missing something? From the docs I read that from will be deallocated and that to will “point” to that new data, but nothing about what happens to what used to be stored therein. Anyway maybe it’s just me being not so educated about internal memory management, I realize that I don’t really know why those attributes imply that the pre-existing data would be cleared out after the call: I get that being the variable allocatable its data is dynamic in nature and that being intent(out) the data stored before the call would be replaced but I was not aware that this guarantees its safe removal from the memory.

You don’t need to look at the source, just the argument attributes (also provided in the gfortran documentation).

The deallocation of an allocatable dummy argument with intent out is part of the Fortran semantics. From a practitioner point of view it’s one of the Fortran “gotchas” you need to be aware of. Here’s the description from MRC (the “red book”), p. 111:

An allocatable dummy argument is permitted to have intent and this applies both to the allocation status (the descriptor) and the object itself. If the intent is in, the object is not permitted to be allocated or deallocated and the value is not permitted to be altered. If the intent is out and the object is allocated on entry, it becomes deallocated.

The rule also carries over to derived types with allocatable components.

1 Like

Not really, gfortran Docs is really good here: it captures well the essence of the standard semantics and shows the details re: the TO argument:

1 Like

Yes, that is a good point, intent(out) does cover both of the prior allocation status situations.

1 Like

An important difference between move_alloc and allocation on assignment exists when the lower bounds are different from one.
With AoA, the (re)allocated array will always have the lower bounds equal to one, regardless the bounds of the original array.
If you want to keep the original lower bound, you will need to employ move_alloc.

program tes_automatic_reallocation03
implicit none
real, dimension(:), allocatable :: v1, v1s, v2, v3
allocate(v1(-1:1))
call random_number(v1) ; v1s= v1
print'(a, *(g0,x))', 'v1: ', v1
print*, 'lbound:', lbound(v1, 1)
print*, 'ubound:', ubound(v1, 1)
allocate(v2(5))
call random_number(v2)
! Automatic reallocation
v1= [ v1, v2 ]
print'(a, *(g0,x))', 'v1: ', v1
print*, 'lbound:', lbound(v1, 1)
print*, 'ubound:', ubound(v1, 1)
! Move_alloc
allocate(v3(-1:size(v1s) + size(v2) - 2))
v3(lbound(v1s,1):ubound(v1s,1))= v1s
v3(ubound(v1s,1)+1:)= v2
call move_alloc(v3, v1)
print'(a, *(g0,x))', 'v1: ', v1
print*, 'lbound:', lbound(v1, 1)
print*, 'ubound:', ubound(v1, 1)
end program tes_automatic_reallocation03

$>ifort tes_automatic_reallocation03.f90
$>./a.out 
v1: .3920868E-06 .2548044E-01 .3525161
 lbound:          -1
 ubound:           1
v1: .3920868E-06 .2548044E-01 .3525161 .6669145 .9630555 .8382882 .3353550 .9153272
 lbound:           1
 ubound:           8
v1: .3920868E-06 .2548044E-01 .3525161 .6669145 .9630555 .8382882 .3353550 .9153272
 lbound:          -1
 ubound:           6