Following some recent discussions ( REALLOCATABLE attribute, Is allocate a function or subroutine? - #6 by RonShepard ), some enhancements to the allocatable arrays could be proposed. The needs are (from simpler to more complex):
- Allocating an array regardless its current allocation status. The current content would be lost. It would just be a shortcut to
if (allocated(a)) deallocate(a)
allocate(a(n))
- Modifying the bounds without modifying the sizes; the content would be kept
- Having something similar to C++ vectors, i.e. resizeable arrays while keeping the content, with an internal allocation strategy that reduces the needs of allocating new memory and copying the content
Briefly, how do C++ vectors work:
- they are basically 1D arrays
- there is a decoupling between the size and the capacity. The size is the usual concept, and the capacity is the amount of memory effectively allocated under the hood: capacity >= size. It allows some overprovisioning of memory, such that the size can grow without necessarily having to reallocate some memory.
- a vector can be declared with or without an initial size.
v.resize(n)
sets the size ton
. If the capacity was less than n, then it is raised to n.v.reserve(n)
sets the capacity tomax(n,v.size())
Note that no implicit overprovisioning happens at this point, just an explicit one with the reserve()
method. Implicit overprovisioning happens with the methods that append or insert elements to/in the vector. for instance:
v.push_back(x)
appends a scalarx
to the vector:- the size is incremented by 1.
- if the new size is <= the current capacity then the new element is just written in memory
- if the new size is > the current capacity, then
- the compiler allocates memory with twice the capacity
- the content of the vector is copied to the new memory area
- the new element is written in memory
- the old memory area is deallocated
The capacity can be reduced only explicitly (with the reserve()
or shrink_to_fit()
methods), it is never implicitly reduced, even when downsizing the vector or when using methods that remove elements (e.g. pop_back()
)
(You can now directly jump to the post with the updated version of the proposal)
Reallocation regardless the allocated status
This could be achieved with a new realloc
specifier. The content would be lost.
real, allocatable :: a(:)
...
allocate( a(n) )
...
allocate( a(m), realloc=.true.)
Or it could be the new standard behavior (it would not break the backward compatibilty; just, illegal code would be now legal)
Modifying the bounds
The sizes are not changed and the content is kept. No specifier at all is needed, one just have to specify one of the new bounds:
allocate( a(n) )
allocate( a(lb:) ) ! the new shape is lb:lb+n-1
allocate( a(:ub) ) ! the new shape is ub-n+1:ub
For rank > 1 arrays, using just :
would mean no change for this dimension
allocate( a(n,m) )
allocate( a(:,lb:) ) ! the new shape is (1:n,lb:lb+n-1)
Resizeable arrays
Option 1
No new attribute is created, meaning that all allocatable arrays can potentially be resizable. Some new specifiers in allocate
are needed to tackle all possibilities: resize=<logical>
, capacity=<integer>
, and cap=<character(*)>
allocate( a(n), resize=.true., [capacity=c | cap=cc])
- if
a
was not allocated, performs a normal allocation, otherwise the array is resized while keeping the content (or a part of the content if the new size is smaller than the previous size) - the
capacity
andcap
specifiers are mutually exclusive cap=cc
- cc=âgrowâ (default):
- if (n <= current_capacity) then no reallocation occurs
- if (n > current_capacity) then the new capacity is set to max(2*current_capacity,n)
- cc=âautoâ: same than âgrowâ, with an additional rule:
- if (2n < current_capacity) then the new capacity is set to min(2n,current_capacity/2)
- cc=âfitâ: the new capacity is set to
n
. The following syntax is possible when one doesnât want to change the size:
`allocate( a(:), resize=.true., cap=âfitâ)
- cc=âgrowâ (default):
capacity=c
- forces the capacity to
max(c,n)
.
- forces the capacity to
- if the capacity changes, a reallocation+copy can occur; if the capacity does not change, no reallocation must occur.
Assignments:
- the capacity of the rhs is transfered to the lhs if and only if allocation on assignment occurs
- otherwise the lhs keeps its previous capacity
A typical use case of resizable arrays is when appending elements to an existing array:
a = [a, b]
(b being a scalar or a rank 1 array)
However, an array constructor has no overprovisioned capacity according to the above rules, and consequently a
has no overprovisioned capacity either after the assignement. Another specifier is actually needed if one wants some overprovisioning:
allocate( a(size(a)+size(b)), resize=.true., fill=b )
But a new statement/routine would actually be desirable:
append(a,b)
And similarly drop(a,k)
as a shortcut to:
allocate( a(max(size(a)-k),0), resize=.true. )
allocate( a, resize=.true., mold=b )
a
inherits the capacity fromb
, unless specified otherwise withcapacity=
orcap=
allocate( a, resize=.true., source=b )
- does not make a lot of sense, as the whole purpose of resizable arrays is to keep the content. These two specifiers should be mutually exclusive, OR it should behave like
a = b
Notes:
resize=.true.
could be omitted is eithercapacity
orcap
are specified (or,resize
could not be needed at all, andcap
orcapacity
would trigger the resizable behavior).- instead of
resize=.true.
one may have a more general specifiermode=[''|'realloc'|'resize']
, which would replace also therealloc=.true.
specifier described above
Option 2
Resizabilty would be an essential property of the array, with a specific attribute:
real, resizable :: a(:)
Everything would work pretty much the same, except that the resize
specifier is not needed at all (it is always implied).
Assignements:
- the capacity is never transfered from the rhs to the lhs, as it is an essential property of an array, i.e.
a = [a, b]
is equivalent toallocate( a(size(a)+size(b)), fill=b )
(withcap='grow'
as the default)
allocate( a, mold=b )
a
does NOT inherit the capacity fromb
allocate( a, source=b )
- does not make a lot of sense, as the whole purpose of resizable arrays is to keep the content. This specifier should not be allowed for resizable arrays, OR it should behave like
a = b
rank > 1 arrays
Everything would work pretty much the same, except that only the last dimension could be resized (except that all dimensions can be resized if the initial size is 0).
The capacity would refer to the size of the last dimension, not to the size of the whole array, e.g.:
allocate( a(n,m), resize=.true., capacity=m+2)
means that 2 columns are overprovisioned
fill
specifier:
! a(:,:) is already allocated
m = size(a,2)
allocate( a(:,m+5), resize=.true., fill=b )
b
can be a scalar or an array of the same TKR thana
append(a,b)
- if r=rank(a), then the rank of
b
can be (r-1) or (r)
Are resizeable arrays needed?
Good question. Indeed, they can be simulated with a user-defined class (could be in the standard library). The coming generics feature may even makes this simpler. However this approach has (and will still have) some obvious limitations. Arrays are really 1st class objects in Fortran, so any enhancement to them looks desirable to me, as long as it doesnât bloat the language (and I donât think the above proposal bloats it).