Enhancements to allocatable arrays

This is a new attempt that takes @RonShepard's objections into account; it actually makes the whole proposal more consistent. I am also retaining only option 1.


The needs:

  • Allocating an array regardless of its current allocation status. The current content would be lost.
  • Modifying the bounds without modifying the sizes; the content would be kept.
  • Having something similar to C++ vectors, i.e. resizable arrays with or without keeping the content (note that a C++ vector always keeps its content), with an internal allocation strategy that reduces the need to allocate new memory and copy the content.

(See the initial post for a short description of how C++ vectors work.)

A few new specifiers in allocate could handle all these cases:

mode=<character(*)>
capacity=<integer>
cap=<character(*)>

Reallocation

In short: one can reallocate an already allocated array (this also works with an unallocated array). Under the hood, however, the compiler allocates a new memory area only if needed (i.e. if the new requested size exceeds the current capacity); otherwise it simply reuses the already allocated memory and just updates the array metadata.

There is no guarantee that the initial content is preserved after the reallocation.

In the rules below, an effective reallocation may occur each time the capacity changes (i.e. a pointer to the array may become undefined), and an effective reallocation must not occur if the capacity does not change (i.e. a pointer to the array remains defined).

allocate( a(n), mode='realloc', [capacity=c | cap=cc])

  • the capacity and cap specifiers are mutually exclusive
  • cap=cc
    • cc=‘grow’ (default):
      • if (n <= current_capacity) the capacity doesn’t change
      • if (n > current_capacity) then the new capacity is set to max(2*current_capacity,n)
    • cc=‘auto’: same as ‘grow’, with an additional rule:
      • if (2*n < current_capacity) then the new capacity is set to min(2*n,current_capacity/2)
    • cc=‘fit’: the new capacity is set to n.
  • capacity=c
    • forces the capacity to max(c,n).
  • The following syntax is possible when one wants to update only the capacity, without changing the size:
    allocate( a(:), mode='realloc', [capacity=c | cap='fit'])
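The capacity rules above can be summarized in a small function. This is only a sketch in standard Fortran, not part of the proposal itself: `new_capacity` and its `forced` argument (standing in for `capacity=c`) are hypothetical names.

```fortran
module capacity_rules
  implicit none
  private
  public :: new_capacity
contains
  ! Hypothetical helper: returns the capacity that the proposed
  ! allocate(a(n), mode='realloc', cap=cap_mode) would select, given the
  ! current capacity cur_cap. Passing `forced` mimics capacity=c instead.
  integer function new_capacity(n, cur_cap, cap_mode, forced) result(cap)
    integer, intent(in) :: n, cur_cap
    character(*), intent(in), optional :: cap_mode
    integer, intent(in), optional :: forced
    character(:), allocatable :: mode

    if (present(forced)) then
      cap = max(forced, n)              ! capacity=c: max(c, n)
      return
    end if

    mode = 'grow'                       ! default policy
    if (present(cap_mode)) mode = cap_mode

    select case (mode)
    case ('fit')
      cap = n                           ! exact fit
    case ('grow', 'auto')
      cap = cur_cap                     ! unchanged while n fits
      if (n > cur_cap) cap = max(2*cur_cap, n)
      ! 'auto' additionally shrinks when n is less than half the capacity
      if (mode == 'auto' .and. 2*n < cur_cap) cap = min(2*n, cur_cap/2)
    case default
      error stop 'unknown cap= value'
    end select
  end function new_capacity
end module capacity_rules
```

For instance, with a current capacity of 8, requesting n=10 under 'grow' gives 16, while requesting n=40 gives 40.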

Assignment lhs = rhs:

  • the capacity of the rhs is transferred to the lhs if and only if allocation on assignment occurs
  • otherwise the lhs keeps its previous capacity

allocate( a, mode='realloc', [mold|source]=b )

  • a inherits the capacity from b, unless specified otherwise with capacity= or cap=

Simply updating the bounds

n = size(a)
allocate( a(lb:), mode='realloc' ) ! the new bounds are (lb:lb+n-1)
allocate( a(:ub), mode='realloc' ) ! the new bounds are (ub-n+1:ub)
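For comparison, this is roughly what one has to write today to change the bounds of an allocatable array in standard Fortran: a temporary with the new bounds is allocated and the data is copied once before `move_alloc` transfers it back. The proposal would avoid that copy. The module and routine names below are hypothetical.

```fortran
module rebase_mod
  implicit none
contains
  ! Hypothetical emulation of allocate(a(lb:), mode='realloc'):
  ! new bounds lb:lb+n-1, same size, content kept (but copied once).
  subroutine rebase(a, lb)
    real, allocatable, intent(inout) :: a(:)
    integer, intent(in) :: lb
    real, allocatable :: tmp(:)
    allocate(tmp(lb:lb+size(a)-1))
    tmp(:) = a(:)                 ! shapes conform, bounds do not matter
    call move_alloc(tmp, a)       ! a takes over tmp's storage and bounds
  end subroutine rebase

  ! Hypothetical emulation of allocate(a(:ub), mode='realloc'):
  ! new bounds ub-n+1:ub.
  subroutine rebase_ub(a, ub)
    real, allocatable, intent(inout) :: a(:)
    integer, intent(in) :: ub
    real, allocatable :: tmp(:)
    allocate(tmp(ub-size(a)+1:ub))
    tmp(:) = a(:)
    call move_alloc(tmp, a)
  end subroutine rebase_ub
end module rebase_mod
```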

rank > 1 arrays

The above description can be extended to any rank without any restriction: the sizes of all the dimensions can be changed.

allocate( a(lb:,n,:), mode='realloc' )

  • the bounds are updated on the first dimension without changing the size
  • the new size of the second dimension is set to n
  • no change on the third dimension

Important change compared to the previous proposal:
The capacity is expressed in number of elements, regardless of the rank and shape of the array: a capacity of 100 can host a rank-1 array of up to 100 elements or a rank-2 array of 5x20 elements, but not a rank-2 array of 10x20 elements.
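Since the capacity counts elements, the same storage can in principle host arrays of any rank whose total size fits. Standard Fortran 2008 already allows something close with pointer rank remapping onto a rank-1 buffer, as sketched here; `fits` and `view_2d` are hypothetical helpers, and an actual implementation would work on the array descriptor instead.

```fortran
module capacity_view
  implicit none
contains
  ! Hypothetical check: a capacity counted in elements can host any shape
  ! whose element count fits, whatever its rank.
  logical function fits(capacity, shp)
    integer, intent(in) :: capacity, shp(:)
    fits = product(shp) <= capacity
  end function fits

  ! Views the first n1*n2 elements of a rank-1 buffer as an n1 x n2 array
  ! through pointer rank remapping (Fortran 2008); no data is moved.
  subroutine view_2d(buf, n1, n2, m)
    real, target, intent(inout) :: buf(:)
    integer, intent(in) :: n1, n2
    real, pointer, intent(out) :: m(:,:)
    if (.not. fits(size(buf), [n1, n2])) error stop 'shape exceeds capacity'
    m(1:n1, 1:n2) => buf(1:n1*n2)
  end subroutine view_2d
end module capacity_view
```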

Resizing

The overall principle is the same as for the reallocation case, but the content is kept (or part of the content, if the new size is smaller).

allocate( a(n), mode='resize', [capacity=c | cap=cc])
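A minimal emulation of mode='resize' with the default cap='grow' policy, assuming a hypothetical derived type that stores the capacity explicitly (the size of `buf`) next to the logical size `n`. New storage is allocated only when the requested size exceeds the capacity, and the leading elements are copied over in that case.

```fortran
module growable_mod
  implicit none
  private
  public :: growable, resize

  ! Hypothetical resizable array: size(buf) is the capacity,
  ! n is the logical size (the elements in use are buf(1:n)).
  type :: growable
    real, allocatable :: buf(:)
    integer :: n = 0
  end type growable
contains
  ! Emulates allocate(a(m), mode='resize') with cap='grow': the first
  ! min(n, m) elements are always kept, and new storage is allocated
  ! only when m exceeds the current capacity.
  subroutine resize(v, m)
    type(growable), intent(inout) :: v
    integer, intent(in) :: m
    real, allocatable :: tmp(:)
    integer :: cap

    cap = 0
    if (allocated(v%buf)) cap = size(v%buf)
    if (m > cap) then
      allocate(tmp(max(2*cap, m)))
      if (v%n > 0) tmp(1:v%n) = v%buf(1:v%n)   ! keep existing content
      call move_alloc(tmp, v%buf)
    end if
    v%n = m                                   ! shrinking never reallocates
  end subroutine resize
end module growable_mod
```

With this sketch, resizing from 3 to 4 elements grows the capacity to 6, and shrinking back to 2 keeps both the capacity and the first two elements unchanged.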

Because the content is kept, some restrictions apply:

allocate( a, mode='resize', source=b )

  • does not make sense, as the objective here is to keep the initial content. The source specifier should not be allowed.
  • however, one may want to initialize the new elements of the array when increasing its size; a new extend= specifier would be needed.

allocate( a(m), mode='resize', extend=s )

  • s is a scalar
  • equivalent to
    n = size(a)
    allocate( a(m), mode='resize' )
    a(n+1:m) = s

allocate( a, mode='resize', extend=b )

  • b is a rank 1 array
  • equivalent to
    n = size(a)
    allocate( a(n+size(b)), mode='resize' )
    a(n+1:) = b
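The two equivalences above can be emulated in standard Fortran, at the cost of a full copy each time (no capacity is involved here). `resize_fill` and `extend_with` are hypothetical names; both assume that `a` is already allocated.

```fortran
module extend_mod
  implicit none
contains
  ! Hypothetical emulation of allocate(a(m), mode='resize', extend=s):
  ! keeps the first min(size(a), m) elements, fills new ones with s.
  subroutine resize_fill(a, m, s)
    real, allocatable, intent(inout) :: a(:)   ! must be allocated
    integer, intent(in) :: m
    real, intent(in) :: s
    real, allocatable :: tmp(:)
    integer :: n, k, lb
    n = size(a)
    lb = lbound(a, 1)
    k = min(n, m)                  ! number of elements kept
    allocate(tmp(m))
    tmp(1:k) = a(lb:lb+k-1)
    if (m > n) tmp(n+1:m) = s      ! new elements get the extend= value
    call move_alloc(tmp, a)
  end subroutine resize_fill

  ! Hypothetical emulation of allocate(a, mode='resize', extend=b)
  ! for a rank-1 b: the result has size(a)+size(b) elements.
  subroutine extend_with(a, b)
    real, allocatable, intent(inout) :: a(:)   ! must be allocated
    real, intent(in) :: b(:)
    real, allocatable :: tmp(:)
    integer :: n
    n = size(a)
    allocate(tmp(n + size(b)))
    tmp(1:n) = a
    tmp(n+1:) = b
    call move_alloc(tmp, a)
  end subroutine extend_with
end module extend_mod
```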

A typical use case of resizable arrays is appending elements to an existing array:
a = [a, b] (b being a scalar or a rank-1 array)
However, according to the above rules an array constructor has no overprovisioned capacity, and since allocation on assignment occurs here, a has no overprovisioned capacity either after the assignment. The extend= specifier can be used instead:
allocate( a, mode='resize', extend=[b] ) or
allocate( a, mode='resize', extend=b )

A new statement or intrinsic routine could actually be preferable:
append(a,b)
And similarly drop(a,k) as a shortcut to:
allocate( a(max(size(a)-k,0)), mode='resize' )
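`append` and `drop` could behave like the sketch below, which uses a hypothetical capacity-tracking type with the cap='grow' policy. The `nrealloc` counter is only there to show how rarely new storage is needed; none of these names are part of the proposal.

```fortran
module vec_mod
  implicit none
  private
  public :: vec, append, drop

  ! Hypothetical vector: size(buf) is the capacity, n the logical size,
  ! nrealloc counts how many times new storage had to be allocated.
  type :: vec
    real, allocatable :: buf(:)
    integer :: n = 0
    integer :: nrealloc = 0
  end type vec
contains
  ! Sketch of the proposed append(a, b) for a rank-1 b, with cap='grow'.
  subroutine append(v, b)
    type(vec), intent(inout) :: v
    real, intent(in) :: b(:)
    real, allocatable :: tmp(:)
    integer :: m, cap
    if (size(b) == 0) return
    m = v%n + size(b)
    cap = 0
    if (allocated(v%buf)) cap = size(v%buf)
    if (m > cap) then                       ! capacity exhausted: grow
      allocate(tmp(max(2*cap, m)))
      if (v%n > 0) tmp(1:v%n) = v%buf(1:v%n)
      call move_alloc(tmp, v%buf)
      v%nrealloc = v%nrealloc + 1
    end if
    v%buf(v%n+1:m) = b
    v%n = m
  end subroutine append

  ! Sketch of the proposed drop(a, k): removes the last k elements,
  ! never going below size 0; the capacity is left unchanged.
  subroutine drop(v, k)
    type(vec), intent(inout) :: v
    integer, intent(in) :: k
    v%n = max(v%n - k, 0)
  end subroutine drop
end module vec_mod
```

In this sketch, 1000 single-element calls to `append` trigger only 11 allocations (capacities 1, 2, 4, ..., 1024), whereas `a = [a, b]` reallocates at every step.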

rank > 1 arrays

This is where the main restriction would apply compared to the mode='realloc' case: only the last dimension could be resized, because resizing any other dimension would force moving the existing content in memory. Exceptions:

  • the array is not allocated
  • the size of the array is 0
  • the new size is 0
allocate( a(n1,n2,n3) ) ! all dimensions with a size > 0
...
allocate( a(:,:,n3+1), mode='resize' ) ! ok
allocate( a(n1+1,:,n3+1), mode='resize' ) ! illegal
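The restriction and its exceptions can be written as a simple check on the old and new shapes. This is a hypothetical helper, not part of the proposed syntax. The reasoning is that Fortran stores arrays in column-major order, so if all extents but the last are unchanged, every existing element keeps its offset from the start of the storage.

```fortran
module resize_rule_mod
  implicit none
contains
  ! Hypothetical check of the proposed mode='resize' restriction: only the
  ! last dimension may change, unless the old or the new array is empty
  ! (then no existing element has to be moved). An unallocated array can be
  ! passed as an old_shape containing a zero extent.
  logical function resize_allowed(old_shape, new_shape)
    integer, intent(in) :: old_shape(:), new_shape(:)
    integer :: r
    r = size(old_shape)
    if (size(new_shape) /= r) error stop 'rank mismatch'
    if (product(old_shape) == 0 .or. product(new_shape) == 0) then
      resize_allowed = .true.
    else
      ! Leading extents unchanged: element offsets are preserved.
      resize_allowed = all(old_shape(1:r-1) == new_shape(1:r-1))
    end if
  end function resize_allowed
end module resize_rule_mod
```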

In the extend=b specifier, b should be either a scalar of the same type and kind as the array being resized, or an array of the same TKR (type, kind and rank).