Do we need REALLOCATE?

Re. REALLOCATE.

Currently the simplest (but not the safest IMHO) way to do a REALLOCATE is to
use allocate on assignment like the following code snipet.

Program testreall
  
  Integer, ALLOCATABLE :: a(:)
  Integer :: i

  ALLOCATE(a(5), SOURCE=0)
  a = [1, 2, 3, 4, 5]
  a = [a(:), [(0,i=1,5)]]
  Print *, "size a = ", SIZE(a)
  Print *, " a     = ", a

End Program testreall

Unfortunately, I think the compiler still has to create a temporary array and do
a copy but I don’t know if the current implementations of allocate on assignment
can optimize the temp array etc away. Note the above works with Intel 2021.1 ifort.
I was unaware you could do this until I saw it mentioned in Modern Fortran Explained.

@msz59, the link you gave gives me “503 Service Unavailable”, so I can’t see the examples.

In general when I write high performance code I try not to use any reallocations, but figure out how much memory is needed ahead of time. The thread on Integer sequences that you quoted shows how to do it on that particular example.

If you have some example where realloc is needed, let me know and I can see if there is a way to rewrite it to not need it.

Update: I wrote one such example myself here: Add deallocate/reallocate option to ALLOCATE · Issue #61 · j3-fortran/fortran_proposals · GitHub, but it is not performance critical code.

This is Add deallocate/reallocate option to ALLOCATE #61 at j3-fortran/fortran_proposals. People who support adding this feature can comment and/or upvote there.

1 Like

Sorry but most answers just miss my point, advocating just to make ALLOCATE acting as a wrapper to ALLOCATED?DEALLOCATE:None; ALLOCATE (again). That would not solve the problem which I am talking about: Fortran, as it is now, does not seem to offer a possibility to effectively resize allocated array preserving the data already in it, as C’s realloc() function does. I think it should, as dynamic reallocation is a common task when dealing with, for example, input data which size is not known in advance. Doing that by allocating new memory, copying data and deallocate the old chunk every time is suboptimal, to use such an euphemism.

@certik: I don’t know why you can’t visit the link. It is working from my browser.
I can only quote the sources and the commands used by the author to test them:

# realloc_c.c
#include <stdlib.h>
#include <stdio.h>
#include <malloc.h>

int main(int argc,char **argv) {
  if(argc<2) {printf("bad args\n"); exit(1);}
  int n = atoi(argv[1]);
  int *data = malloc(sizeof data[0]);
  for(int i=1;i<n;i++) {
    data = realloc(data,(i+1)*sizeof data[0]);
    if(!data) abort();
    data[i] = i;
  }
}
! realloc_fortran.f90
program ra
integer, allocatable :: a(:),temp(:)
integer n,i
character*100 buf
call getarg(1,buf)
read(buf,*) n
allocate(a(1))
do i=2,n
allocate(temp(i))
temp(1:i-1) = a
call move_alloc(temp,a)
a(i) = i
end do
end program
$ gcc --std=c99 -O3 realloc_c.c -o realloc_c
$ gfortran -O3 -pedantic --std=f95 realloc_fortran.f95 -o realloc_fortran
$ time realloc_c 1000000

real 0m0.039s
user 0m0.016s
sys 0m0.024s

$ time realloc_fortran 1000000
1784293662

real 18m20.474s
user 10m7.414s
sys 8m12.463s
$
2 Likes

This example is contrived and would never occur if performance is important. It is a simple change to reallocate to twice the existing size when needed. The resulting Fortran program runs in 0m0.004s (with GNU, Intel and NAG compilers). The original C takes 0m0.010s.

2 Likes

I can access it now. Both of the examples are contrived — for performance critical code you do not want to reallocate in the middle of the loop. If you really have to, you double the size each time as @themos said, which amortizes the copying being done and pretty much eliminates the bottleneck. Btw, that is exactly how std::vector<T>::push_back() works in C++.

If you had to face this issue for some problem in practice, please let us know and let’s have a look at such an example.

1 Like

Well, I really know how to make it right (it is not my code!). My point is that Fortran could make it better on its own, without forcing the programmer to make all housekeeping in memory (re)allocation. With such an attitude, we do not need array syntax (make a loop when needed), generic procedures, operator overloading (invoke the proper function when needed) and so on. People (including me) were using Fortran 77 without any of the modern features for decades, and did their jobs right.

Reallocation of memory in Fortran is just not optimal. Do not try to convince me that it is.

5 Likes

The C++ vector does what @themos suggested, doubling capacity when it needs to grow. One can also explicitly reserve memory. Eventually Fortran is supposed to get such containers, but not in the next standard. The Fortran Template Library has ftlDynArray

A resizeable array container. It can slowly grow in size as elements are added at its end. Note that insertion at the end is an amortized constant time operation. Basically, ftlDynArray is exactly the same as C++'s std::vector. We just changed the name because calling a resizeable array a vector makes no sense from a mathematical point of view.

The qcontainers library also has a vector container.

1 Like

The Fortran standard doesn’t say anything about how memory is rearranged for (re)allocation on assignment. It would be entirely appropriate for a compiler to optimize that without any change needed in the standard.

1 Like

Probably not in all cases, but in some.

1 Like

@Beliavsky Arjen Markus also has an std:vector workalike in his FLIBS library. He doesn’t automatically double the array size though but specifies a growth factor (1.1) for increasing the array size as needed. Latest FLIBS code is at:

https://sourceforge.net/p/flibs/svncode/HEAD/tree/trunk/

The Fortran Standard doesn’t even mention “memory” in the normative parts (except for the Interoperability with C material and the SYNC MEMORY statement).

Program reall
  Integer, Allocatable :: m(:)

  Allocate (m(1))
  m(1) = 0
  Do i = 1, 10
    m = [ i, cshift(m,1) ]
  End Do

  Print *, cshift(m, 1)
End Program

The processor is perfectly entitled to calculate all that at compile time, or to figure out that the explicit allocation can be enlarged to 11, leading to better performance on one of those new-fangled “memory” machines.

Amusingly, the SYNC MEMORY statement also does not mention “memory”. It’s all about segment ordering.

The two issues “Does Fortran need a REALLOCATE” and “should my compiler optimize my allocations for me” are separate. A simple case like the one you described can be dealt with by an optimizing compiler. Where is the case that is difficult to analyse and would be made considerably easier by an addition to the language?

1 Like

Let’s consider

REAL, ALLOCATABLE :: A(:) 
ALLOCATE(A(10))
...
A = [ A, 1.0 ]

Is it possible that the compiler would optimize this ‘allocate on assignment’ statement in the C’s realloc() way, i.e. keeping the same memory area if possible?
The standard (10.2.1.3.3) says: If the variable is an allocated allocatable variable, it is deallocated […] If the variable is or becomes an unallocated allocatable variable, it is then allocated
So it seems that the standard requires A to be deallocated and then reallocated again with the new shape, and the value of the RHS copied to it. I can only imagine that the optimization would include allocation of new temporary array, copying the RHS and then internally execute MOVE_ALLOC.
But maybe I am wrong here.

What I would like to see would be REALLOCATE(allocatable_variable, new_size) which would work just as realloc(void *ptr, size_t size) in C.

My guess is that with the current definitions, such a reallocation is not possible in current Fortran. I’d be happy to read the contrary.

1 Like

@msz59 , please note "keeping the same memory area if possible " with C is a highly situation-dependent complier-specific thingy, there is no stipulation for it anywhere.

As @Beliavsky commented upthread with a comp.lang.thread therein, there have been previous suggestions along the same lines but with programmer convenience in mind but not any requirements/suggestions on the memory aspect which is a non-starter.

On the programmer convenience aspect, I’m greatly empathetic to what you’re asking and suggesting here, especially in this comment, this is a basic facility one would include if one has any considerations for the poor, persevering practitioners of Fortran, Alas, that is not the case and the language suffers as a result. Things won’t change anytime soon - it may be year 2040 before Fortran 202Y feature sets are robust for practical consumption and that’s the earliest anything might happen with the base language itself. So the only recourse is to homebrew your own solution such as custom REALLOCATE/RESIZE, INSERT_ELEMENT(S), PUSH_BACK, etc. and/or work with the Fortran stdlib community for a library approach.

1 Like

Yes, it is possible. You only have to persuade your compiler vendor that this optimization is worth doing.

The whole point about the Standard not referring to memory as such is precisely in order to enable optimizations such as this. Before the re-assignment, A is allocated. After the re-assignment, A is allocated. There is no standard conforming way of finding out what the allocation status DURING the statement is (that’s where the segment ordering rules for images come in). The compiler could very easily internally allocate a huge chunk of memory for A and only change the book-keeping (what should it report for ALLOCATED(), SIZE() or SHAPE()) when re-assignments happen. No visit to the memory allocator, no copying.

2 Likes

It is getting even more interesting. I have checked the number of times the address of the allocated array really changes after ‘reallocation’. The result was very surprising. It seems that the address changes are extremely rare. When the
allocated array is shrinking, almost no changes happen. Which would be obvious but from the timing, it seems that the data gets copied on every operation both ways.
BTW, is there a standard-conforming way of getting the address of a variable, other than LOC(variable) extension?

gfortran-10 -O3 realloc.f90
./a.out 100000
up: 35 addr changes 2.97556090 secs
down: 2 addr changes 1.71359110 secs
./a.out 200000
up: 38 addr changes 12.1577778 secs
down: 2 addr changes 7.31757259 secs

program testrealloc
  real, allocatable :: a(:)
  real :: t0, t1
  integer(kind=8) :: p1, p2
  integer :: i, nch=0, max=10000
  character(len=16) :: arg

  if ( command_argument_count() .ge. 1 ) then
    call get_command_argument(1, arg)
    read(arg,*) max
  endif
  allocate(a(1))
  p1 = loc(a)
  call cpu_time(t0)
  do i=2, max
    a = [ a, real(i) ]
    p2 = loc(a)
    if ( p2 /= p1 ) nch=nch+1
    p1 = p2
  end do
  call cpu_time(t1)
  print *, 'up:  ', nch, 'addr changes', t1-t0, ' secs'
  nch = 0
  call cpu_time(t0)
  do i=max-1, 1, -1
    a = a(1:i)
    p2 = loc(a)
    if ( p2 /= p1 ) nch=nch+1
    p1 = p2
  end do
  call cpu_time(t1)
  print *, 'down:', nch, 'addr changes', t1-t0, ' secs'
end program testrealloc
2 Likes

The caveat, per the Fortran standard, is there is no stipulation there be a companion C processor but the use of c_loc will require as such.

But now when a companion C processor is on hand, the easier approach will be to use C_ASSOCIATED function defined in the intrinsic module ISO_C_BINDING:

   use iso_c_binding, only: c_loc, c_ptr, c_associated, c_size_t
   integer, allocatable, target :: a(:)
   type(c_ptr) :: p1, p2
   integer(c_size_t) :: i
   a = [ 1 ]
   p1 = c_loc(a)
   print "(g0,z0)", "Address of a (hex) following initial allocation: ", &
      transfer( p1, mold=i )
   a = [ a, [ 2, 3 ] ]
   p2 = c_loc(a)
   print "(g0,z0)", "Address of a (hex) following reallocation: ", &
      transfer( p2, mold=i )
   print *, "c_associated(p1, p2)?", c_associated(p1, p2)
end

Note the response from 2 different compilers: with gfortran where you get a C realloc kind of behavior and the other with IFORT where you don;'t:

C:\temp>ifort /standard-semantics a.f90
Intel(R) Fortran Intel(R) 64 Compiler Classic for applications running on Intel(R) 64, Version 2021.2.0 Build 20210228_000000
Copyright (C) 1985-2021 Intel Corporation. All rights reserved.

Microsoft (R) Incremental Linker Version 14.27.29112.0
Copyright (C) Microsoft Corporation. All rights reserved.

-out:a.exe
-subsystem:console
a.obj

C:\temp>a.exe
Address of a (hex) following initial allocation: 267DA1A6890
Address of a (hex) following reallocation: 267DA1A63B0
c_associated(p1, p2)? F

C:\temp>gfortran a.f90 -o gcc-a.exe

C:\temp>gcc-a.exe
Address of a (hex) following initial allocation: 793A40
Address of a (hex) following reallocation: 793A40
c_associated(p1, p2)? T

C:\temp>

2 Likes

I remember trying this a very long time ago. It did differ between compilers. Even if the pointer remained the same, the execution was still much slower than it was in C.