REALLOCATABLE attribute

We used a logarithmic scale to compute the overprovision and avoid excessively large arrays when n becomes too large. Since it is the same interface, the overprovision is handled since the first allocation… :thinking: so, maybe, instead of a resize=.true. one of your original proposals could be used to handle both things like: overprov='linear' overprov='log' (to offer a couple of heuristic rules)

I think it makes sense to only enable resizing on the last dimension.

Assuming that one doesn’t want to create a new attribute or a new statement , everything could be achieved with a new specifier in allocate(), that could be named “mode=” (or whatever) and would an empty string by default.

allocate( a([lb:]ub), mode='RESIZE' )

  • if a is not allocated: equivalent to allocate( a([b:]ub) )
  • if a is allocated:
    • if the size does not change, just update the lower and upper bounds in the metadata
    • if the size differ, is syntactically equivalent to:
      deallocate(a)
      allocate( a([lb:]ub) )

allocate( a([lb:]ub), mode='KEEP' [, fill=b])

  • implies ‘RESIZE’
  • in addition to the above rules about RESIZE, the content of the array is kept
  • if the size increases, then it is syntactically equivalent to:
    allocate( tmp([lb:]ub) )
    tmp(lb:lb + size(a,2) - 1) = a(:)
    call mv_alloc(tmp,a)
  • if the size decreases, then it is syntactically equivalent to:
    allocate( tmp([lb:]ub) )
    tmp(:,:) = a(lower_bound(a,2):lower_bound(a,2) + ub - lb)
    call mv_alloc(tmp,a)
  • if the size increases, the new elements can be optionaly initialized with b (scalar or rank-1 array)

allocate( a([lb:]ub1), mode='OVERPROVISION' )

  • implies ‘KEEP’
  • there is a decoupling between the apparent size of the array and a hidden capacity. The actual memory reservation is based on the capacity
    -the capacity shall always be greater or equal to the new apparent size, and smaller or equal to twice the new apparent size.

append(a, b)

  • equivalent to `allocate( a(lower_bound(a):upper_bound(a)+size(b)), mode=‘OVERPROVISION’, fill=b)

drop(a,n)

  • equivalent to `allocate( a(lower_bound(a):upper_bound(a)-n), mode=‘KEEP’)

Rank > 1 arrays

Everything would work about the same, but with mode='KEEP' only the last dimension could be resized (except if the initial size is zero, in which case there not restriction). The lower/upper bounds could be changed on all dimensions, whatever the mode.

Another caveat: if the array is not declared with a specific attribute, allocations upon assignment won’t be overprovisioned… I tend to think that the reallocatable attribute is needed to have a fully consistent behavior.

There’s something I’m not getting. If an array shall be allocated (or resized) one should write the attribute allocatable. If the attribute is not present, its size/shape should not be touched in the current scope. That seems like a good safety measure, not to leave hanging the possibility of inadvertedly changing the array.

Maybe you mean to defer the actual allocation/resizing latter on? It suffices to declare the array as allocatable on its parent scope/module, and allocatable in the procedure that will take care of allocating/resizing. No?

@hkvzjal Sure, the array has to declared with the allocatable attribute! My point is the following: the overprovisioning strategy could be triggered with new specifier in allocate, but then what about the allocations upon assignment?

Shouldn’t it inherit the same data + metadata of the source array? in that case, the same overprovision and the fact that it can be resized. Or would you preferer to control whether on assignment the array does not inherit the metadata?

Let me illustrate it:

real, allocatable :: a(:)
allocate( a, source=b, mode='OVERPROVISION') ! will possibly allocate more than size(b) behind the hood

And now:

real, allocatable :: a(:)
a = b ! how do you tell the compiler to overprovision the allocation of a?
real, allocatable :: a(:), b(:), c(:)
allocate( a(n) ) !> a is allocated exactly with size n
!... do stuff with a
b = a !> b inherits data and shape of a
!... do stuff with b
allocate( b(m) , mode='...' ) !> b is resized with m and, keeping its data and with some overprovision
! ... keep on doing stuff with b
c = b !> c inherits data and meta-data (overprovision included) from b

I would suggest to keep the rules as simple as possible.

The key step would be that this line allocate( b(m) , mode='...' ) shall not prompt an error just because the first allocate was a “simple” one. The explicit use of the extra argument alone enables reallocation but on the function, not on the object, behind the curtains what needs to be tracked is the actual size and the extra overprovision.

So, the array object holds the data & size/shape meta-data… the allocate statement holds the “modes” of allocation and resizing. This implies, “reallocability” would not be an attribute of the object, but an option of the statement.

This assumes 1) that b has been allocated with some overprovisioning, and 2) that you always want the overprovision provision to be transfered to c. Both constraints look too strong to me. You may want to have some overprovisioning on b but not on c, and vice-versa.

if the user needs such a fine grained control, then he can use

allocate( c, source=b, mode='...')

As you showed above and, somewhere in the documentation say explicitly “allocation under assignment will transfer all meta-data including overprovision. For finer control relay on the allocate statement” or something of the sorts.

I think it is more important to have few-but-clear rules, upon which one can build up, rather than try to handle all possible combinations.

I’ve been giving some more thought to this, here I would say that the overprovision size could be a permanent attribute of all allocatable arrays. Under this hypothesis, it would be just an integer value that can be copied around. And with a default to 0.

From here, we would only need to resolve your point (2) about the “hard-rule”, what happens under assignment?

The quetion here is whether the lhs keeps it attributes or inherits those of the rhs. Whatever is the choice for this, the other choice needs to be easy to accomplish too (with explicit allocate() or however). It would be necessary to support both behaviors.

There is also a question about i/o, particularly with an unformatted write/read. When you write out an allocatable entity, say that is a component of a derived type, do you write out just the currently allocated values, or do you write out the whole thing, overallocated memory and all? Unformatted i/o is supposed to be a direct memory copy, whatever is in the memory gets written out, and then read back in, so it is a bit by bit copy. How much should go in and out for an overallocated array?

It would be a nice feature of fortran to allow reads directly to unallocated arrays (e.g. to avoid having to use a linked list intermediate, among other reasons). So whatever convention is adopted for i/o on overallocated arrays should not get in the way of this, arguably more important, enhancement to the languge.

That’s a possibility, with the overprovisionning property set by a specifier in allocate: allocate(a(n), overprovision=.true.)

“overprovioning property” transfered from the rhs to the lhs:

In a = b, a would inherit the property from b (with the same rule it inherits the lower/upper bounds, or without any exception?)

Note however that a typical use case of a dynamically resizeable array is when appending elements (and often iteratively):
a = [a, b]

Being a temporary object, the overprosionning status of the array constructor [ ] should be .false., meaning it will .false. also on ‘a’ after the assigment, even if it was .true. before. This is not desirable. A specific statement/routine would be needed to ensure that the properties of a are kept:
append(a, b)

“overprovioning property” NOT transfered from the rhs to the lhs:

Once a is allocated with the property, it keeps it all along. In this case, it would be clearer to declare a with a specific attribute IMO, as this would be an essential property.

Yes and no… When writing a whole array, only the content is written, not the metadata for instance. And in the case we are talking about, only the content visible from the code should be written, not the overallocated part.

If an allocatable array of an intrinsic type is written, then you are right, only the data is written. But if a derived type is written, or an array of a derived type, then any padding that occurs within the structure is written out, and back in again upon read. In general, the external representation on the file is bit for bit the same as the internal memory. This padding is not metadata. The metadata includes the rank, bounds, addresses, and so on. So the question with overallocation would be whether or not the extra memory is treated the same as padding. I can think of good arguments for either convention.

When a scalar is written in an unformatted file, it’s always a bit for bit copy, wether it is an intrinsic or a derived type. For instance if a 80 bits real is internally stored on 128 bits with 48 bits unused, the 128 bits are written anyway.

So there’s no ambiguity at this level: a derived type containing a reallocatable array should be written bit for bit, including all the metadata of the array and all the overallocated space.

Now what about writing a reallocatable array of <whatever>? When writing an array, none of the array structure is written, just the stream of the elements. Only the visible elements should be written, not the overallocated space, and I can see no good reason for the other convention.

Agree with this! Overprovision is there to handle efficiently dynamic resizing during processing, it should not be written to file.

I always write the explicit bounds of the portion of the array I want to write, even if allocating for the exact size and shape. I can understand that one would be tempted to do write(unit,*) array instead of write(unit,*) array(lb:ub,....). But in that case the overprovision should not be written down.

I’m not sure, whether this feature is really worth a change in the language. Using current language elements, you can implement most of it already. I’ve set up a minimal library demonstrating a reallocatable array like container.

Of course, the allocation and access syntax will have to be different from the usual ones (e.g., call rearray%allocate(newshape=[2, 2]) instead of reallocate(rearray(2, 2))), rearray%view(:, 1:2) in expressions instead of rearray(:,1:2)). I find this, however, a fair deal in exchange for the flexibility programmers gain in order to fine-tune the reallocation-strategy in their implementation. If you built that into the language, you would be doomed with some fixed choices.

Note: The example library is just a minimal demonstration for a single array type (real64, rank2), but once templates are in the language, one could turn that easily into a template which can be instantiated for arbitrary types. Also, it does not handle array lower bounds unequal 1, but that would be trivial to add.

I somewhat disagree with this. I think that the language should provide easy access to array reallocation/resizing. Reasonable steps by compiler vendors should be taken to ensure a relatively performant implementation (like in so many other languages); it doesn’t have to be the most performant solution.

Adding such feature at the language level does not stop users from implementing a more performant version of reallocate, specific for their use case.

In general, I believe that this functionality is so fundamental for a language that users should not have to implement it from first principles every time they need it.

2 Likes

Sure, but one can say that for many other things. On the one hand I do not fully disagree with you, but on the other hand the changes brought to the existing syntax would be minimal.

Note that in your implementation the array is accessible only via a pointer, with possible impact on the performances. Also, allocate on assignment is not possible.

:point_up_2: this summarizes pretty well the point of this thread!!

Indeed the language enables one hand-crafting this facility, but a minimal level should be available in the language, such that one is not obliged to reinvent the wheel every time