REALLOCATABLE attribute

This attribute would be much alike the ALLOCATABLE one, except that arrays declared with REALLOCATABLE could be resized dynamically by keeping the existing data in the array.

real, reallocatable :: a(:)
allocate(a(70))
a(:) = 1.0
reallocate(a(100))   ! a(1:70) is still 1.0

Why a reallocate function wouldn’t do with a classical allocatable array? Well, it could. But at low level there’s nothing like a reallocation: increasing the size of an array means 1) allocating a larger array, 2) copying the data from the old to the new array, 3) deallocating the old array while pointing to the new instead (this third step is handled by move_alloc)

reallocatable would instruct the compiler that reallocations will potentially occur, so that it can for instance overprovision the size under the hood. For instance with allocate(a(70)) the compiler would reserve 128*4 bytes, so that a further reallocate(a(100)) would not require any new allocation. Still, size(a) before the reallocation would return 70.

Of course, for reallocate(a(150)) the compiler would have to make a new allocation under the hood… But it would then reserve 256*4 bytes to prepare possible further reallocations.

Reallocatable arrays would also have an append routine:
call append(a,b), where b would be scalar or an array. The result would be the same as a = [a, b], but without new allocation needed most of time.

This could work also with ranks > 1, but with some restrictions: only the last dimension could be resized:

real, reallocatable :: a(:,:)
allocate( a(70,40) )
reallocate( a(:,50) )   ! the new shape would be (70,50)

All of this is actually similar to how C++ vectors are implemented, I think.

1 Like

I’ve proposed in the past a modification to the ALLOCATE statement to do something somewhat similar. Mainly, add a DEALLOCATE option to ALLOCATE that says if any array
in the allocate list is currently allocated, deallocate it and reallocate to the new desired size. This is only one step away from what I think you are proposing. My mod is focused on removing the current requirement to check for allocation prior to deallocating an array. ie

Instead of

If(ALLOCATED(A)) Then
  DEALLOCATE(A)
  ALLOCATE(A(n))
End If

just write

ALLOCATE(A(n), DEALLOCATE=.TRUE.)

All compilers I know will throw an error if you try to reallocate an already allocated array (if the STAT option is not in place).

I could see something similar with REALLOCATE

ALLOCATE(A(n), REALLOCATE=.TRUE, SOURCE=A(1:SIZE(A))

The DEALLOCATE mod would probably be trivial since all compilers (that I know of) currently check for prior allocation.

1 Like

I’m a fan of this concept. A general purpose language should have more easily used general purpose data structures easily available.

In my proposal, the compiler would know from the beginning that the intent is to reallocate the array, and therefore could anticipate the reallocation by allocating more memory than actually requested. So most of time no data copy would be needed, just updating a size in the descriptor.

1 Like

Your idea is very good, I actually have an inhouse implementation of reallocate using move_alloc with a safety extra size for that purpose… but then, if any evolution in that direction would be feasible, wouldn’t it be easier to just transfer your proposal to ‘allocatable’ and ‘allocate’ ? Instead of having yet another attribute and yet another intrinsic.

It is quite safe to imply that an allocatable array would/could be reallocated at some point. And by enabling ‘allocate’ to also ‘reallocate’ a lot of boiler plate would be reduced.

One idea could be for allocate to add an extra argument with a percentage of extra space. The first time the function is invoked the max size is kept in place, shall a new allocate push beyond, a new max size is set using the percentage previously defined ?

1 Like

Looking at what Fortran has to offer today, I don’t think it’s as bad. For example:

One way this can be achieved in 2 commands is by using a non-stopping deallocate:

deallocate(a,stat=i) 
allocate(a(n))

but also, reallocation on assignment is part of Fortran since F2003:

would be something like

a = reshape(a, [70,50],  pad=0)

An issue with the reallocate on assignment idea is that it’s always going to go through the following three steps:

Create new array of desired size
Copy contents of old array into new
Delete old array

This is a lot slower than if we had access to a dynamically sized array type that could grow only when the requested new size is greater than the currently allocated size. Allocated size would likely be something like smallest power of 2 bytes large enough.

Nope. The latter implies a new allocation and a data copy in almost all cases. This is precisely what I try avoiding in my proposal.

You mean something like

real, allocatable :: a(:)
allocate(a(70),overprov=50)   ! 50% overprovisionning (i.e. 105*4 bytes reserved)
a(:) = 1.0
allocate(a(100),realloc=.true.) ! no new allocation occurs as 100<=105
                                ! a(1:70) is still 1.0, a(71:100) is 0.0
...
allocate(a(200),realloc=.true.) ! new allocation is required as 200>105
                                ! 300*4 bytes reserved
...
allocate(a(350),realloc=.true., &
                overprov=0)       ! new allocation is required as 350>300
                                  ! 350*4 bytes reserved, no overprovisionning from now
...
allocate(a(351),realloc=.true.) ! new allocation is required as 351>300
                                ! and overprovisionning has been disabled

Yes, that is more or less what I had in mind, just wandering if the realloc=.true. would be necessary?

I think realloc= would be needed for backward compatibility, as the allocation of an already allocated object is supposed to issue an error. Unless one considers that the initial overprov= (of whatever the name) has already triggered a specific behavior different from the legacy one.

The objection I may have to this approach is that it assumes that the implementation of the reallocatable arrays is necessarily based on overprovisionning. Although I can’t see alternative implementation, it’s maybe a too strong constraint.

An alternative syntax for a C++ vector class dynamic array might be something like

real(real64), dynamic, growth_factor=2 :: a(100)

Then with something like allocate on assignment whenever you are doing an assignment that would increase the size of a beyond its initial value the compiler automatically reallocates the size of a to first 200 and copies the contents of the existing a into the resized array. In other words, get rid of the allocatable requirement and replace it with a dynamic array type. There could be a default growth factor so specifying the growth rate wouldnt be needed unless the user requires a faster or slower rate. The restriction would be you have to define an inital size (can’t be deferred ala allocatable arrays).

This error makes sense currently as ‘allocate’ only works the first time. But given a future hypothetical compiler that enables reallocation, this error wouldn’t be needed I feel. The idea of having an overprovision value trigger an additional internal state of “reallocability” also sounds interesting if one were really forced to keep the non-reallocability behavior (still thinking in which case it should be kept ?). This should not add too much execution overhead

See this github proposal: Optional conditional reallocation in allocate · Issue #318 · j3-fortran/fortran_proposals · GitHub to discuss it and keep track of this feature.

2 Likes

How it could work:

Rationale: having resizable arrays that mimic the C++ vectors. Under the hood the compilers could reserve more memory than actually resquested by the user, in order to avoid moving data in memory most of time (but NOT all the time).

real, reallocatable :: a(:)

In contrast to an allocatable, a(:), a would be allocated with a size=0 from the beginning

reallocate( a([lb:]ub) )

  • if lb and ub are the same, does nothing
  • if lb and ub differ, but the size is the same, just update the lower and upper bounds in the array metadata
  • if the size differ:
    – if the size is smaller: resize to the new size while keeping the existing content up to the new size
    – if the size is greater: resize to the new size while keeping the whole existing content; beyong the new size, the content is undefined

reallocate( a, mold=b )

  • equivalent to reallocate( a(lower_bound(b):upper_bound(b)) )

reallocate( a, source=b )

  • equivalent to a = b (as for an allocatable)

reallocate( a([lb:]ub), fill=b )

  • the content of b is transfered to the new elements of a when the new size is larger than the previous size. b can be a scalar or a rank 1 array

append(a,b)

  • syntactically equivalent to a = [a, b]

Rank > 1 arrays

Everything would work about the same, but only the last dimension could be resized, except if the initial size is zero:

real, reallocatable :: a(:,:)
reallocate( a(n  ,m  ) )   ! ok
reallocate( a(n  ,m+1) )   ! ok
reallocate( a(n+1,m+1) )   ! WRONG 
reallocate( a(:  ,m  ) )   ! equivalent to a(size(a,1),m)

reallocat[abl]e or allocat[abl]e ?

The new reallocate() statement is maybe not needed, allocate() could do, just with different rules for the reallocatable arrays:

real, reallocatable :: a(:)
allocate( a(n) ) ! but under the hood the compiler woud reserve 
                 ! the memory for (for instance) 2*n elements

Alternatively, the reallocatable attribute is maybe not needed, and the “reallocatable” behavior could just be triggered by the reallocate statement.

real, allocatable :: a(:)
reallocate( a(n) ) ! ok, and the rules above apply

Or just with some option:

real, allocatable :: a(:)
allocate( a(n), resize=.true. ) ! ok, and the rules above apply

Arguments

A dummy allocatable array would be compatible with an actual reallocatable array, and vice-versa.

3 Likes

I’ll vote for this :+1: the extra argument can serve as a safety measure. And it makes either the reallacatable attribute or the reallocate statement unnecessary.

2 Likes

The C++ vector semantics have been mentioned a few times in this thread. Could someone please briefly summarize how C++ handles this reallocation?

Yes, but the caveat is that the compiler may not be ready for an efficient reallocation:

real, allocatable :: a(:)

! classical allocate, without overprovisioning:
allocate( a(1000) )  
 
! the compiler may be forced to allocate new memory because the previous 
! allocation did not plan a future reallocation: 
allocate( a(1010), resize=.true. ) 

This could be the user responsability to anticipate the reallocation strategy with resize=.true. even in the first allocation.

AFAIK there is some decoupling between the reserved vector size in memory (which is a hidden variable), and the requested size from the user. When a new size is requested, no deallocation/reallocation occurs if the new size is smaller than the reserved vector size.

This thread seems to be about one-dimensional arrays. What would happen with 2 or more dimensions?

This would be straightforward to implement in fortran too. Something like

allocate( a(n), reserve=2*n )

would give the programmer some control based on his knowledge of the demands of his application. I would say that this approach might work well for some of my applications. I currently do this by allocating the array with size 2*n in the first place, and then manually keep track of my current dimension k and referencing the array as a(1:k). If the compiler could be used to handle some of that logic, that would make things easier for this type of application. But there are many applications where this would not be practical.

There are two possible ways to do this, and they are incompatible with each other. One way is to reserve a larger workspace, and then to “reallocate” contiguous multimensional arrays within that workspace. The other way is to reserve the workspace based on upper bounds for each of the dimensions. In this case, a “reallocation” to a larger size will result in the original array being the leading subblock of the new array.

As you can see, one way results in contiguous arrays, while the other results in the leading subblock remaining intact after the reallocation. You can’t have both properties (without moving the data), so either the compiler implementor chooses which applies, or maybe the user could specify which convention he wants in the initial allocation step.