Is allocate a function or subroutine?

This question came to me when I was driving today.

It’s a statement. :slight_smile:

Rules of thumb I use to tell them apart:

  • Function calls appear in expressions.
  • Subroutine calls begin with call.
  • Statements stand on their own.
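
A minimal sketch of the three forms side by side. The module and the names twice, fill and demo are made up for illustration; only allocate and deallocate are the actual language statements.

```fortran
module call_forms_demo
  implicit none
contains
  ! A function returns a value, so a reference to it sits inside an expression.
  pure function twice(x) result(y)
    integer, intent(in) :: x
    integer :: y
    y = 2 * x
  end function twice

  ! A subroutine returns no value itself; it is invoked with a CALL statement.
  subroutine fill(v, val)
    integer, intent(out) :: v(:)
    integer, intent(in)  :: val
    v = val
  end subroutine fill

  function demo() result(total)
    integer :: total
    integer, allocatable :: a(:)
    allocate(a(3))              ! statement: no CALL, not part of an expression
    call fill(a, 1)             ! subroutine call
    total = twice(sum(a)) + 1   ! function references inside an expression
    deallocate(a)               ! statement again
  end function demo
end module call_forms_demo
```

Writing call allocate(a(3)) here would be rejected by the compiler, because allocate is not a procedure name.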

I mistakenly write call allocate(...) more often than I care to admit :upside_down_face:

By the way I’m curious: why is move_alloc a subroutine and not a statement (as are allocate and deallocate)? Is there any technical reason for that?

A call to move_alloc with coarray arguments is the only CALL statement that is an image control statement, so if we were doing it again, I suspect it would make more sense to make it a statement.
One can eclipse an intrinsic with scoping tricks, but one cannot eclipse a statement.
Also, statements can be decorated with extra syntax. Doing the same with intrinsic subroutines would require decorating the CALL statement itself, and we seem not ready to do that (it is pretty ancient).
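
To illustrate the "eclipse an intrinsic" point, here is a minimal sketch, assuming a module that defines its own procedure named move_alloc. The module name shadow_demo and the counter n_moves are hypothetical, and this user version only handles rank-1 integer arrays and copies the data instead of moving it.

```fortran
module shadow_demo
  implicit none
  private
  public :: move_alloc, n_moves
  integer :: n_moves = 0   ! hypothetical counter, for illustration only
contains
  ! User procedure with the same name as the intrinsic. In any scope that
  ! uses this module, the name move_alloc refers to this procedure.
  subroutine move_alloc(from, to)
    integer, allocatable, intent(inout) :: from(:)
    integer, allocatable, intent(out)   :: to(:)   ! deallocated on entry
    n_moves = n_moves + 1
    if (allocated(from)) then
      ! Copy the values and keep the bounds, then release the source.
      allocate(to(lbound(from, 1):ubound(from, 1)), source=from)
      deallocate(from)
    end if
  end subroutine move_alloc
end module shadow_demo
```

Nothing comparable is possible for allocate: it is part of the syntax, so there is no name to redefine.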

It would look screwy if allocate() were a subroutine. Instead of something simple and intuitive like

allocate( a(-9:9,-1:1), b(n), c )

the subroutine version would need to look something like

call allocate( a, [-9,9,-1,1], b, [1,n], c, [] )

where the zero-size array [] would be required for a scalar allocatable. I’m ignoring the source= and mold= optional arguments.

I think deallocate() could be a normal looking subroutine, but that would probably have caused some confusion since it would no longer be symmetrical within the syntax with allocate().

The move_alloc() intrinsic works pretty much like a normal subroutine. It would be a little difficult to replicate in Fortran code because of the arbitrary type+kind+rank aspect, but otherwise it behaves the way a subroutine is expected to behave.
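
For reference, a small sketch of what the intrinsic does with its two arguments. The module and function names are made up; the move_alloc call itself is the standard intrinsic with its from and to keywords.

```fortran
module move_alloc_demo
  implicit none
contains
  ! Returns .true. if the intrinsic behaved as the comments below describe.
  function check_move() result(ok)
    logical :: ok
    real, allocatable :: src(:), dst(:)

    allocate(src(-2:2))
    src = [1.0, 2.0, 3.0, 4.0, 5.0]
    allocate(dst(10))            ! any previous allocation of dst is released

    call move_alloc(from=src, to=dst)

    ! Afterwards src is deallocated, and dst has src's old bounds and values.
    ok = (.not. allocated(src))  .and. &
         lbound(dst, 1) == -2    .and. &
         ubound(dst, 1) ==  2    .and. &
         dst(0) == 3.0
  end function check_move
end module move_alloc_demo
```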

It would simplify a lot of my code if allocate() had an optional argument to trigger conditional reallocation of an existing array. Say something like

allocate( a(lb:ub), conditional_reallocation=.true. )

If a(:) is already allocated with lower bound lb and upper bound ub, then nothing happens, the array is unchanged. If a(:) is already allocated with the correct size but with different bounds, then the bounds are changed but the contents of the array would remain unchanged. If a(:) is already allocated but with a different size, then it is deallocated and reallocated with the new size and bounds. All of this can be done now with some trickery, but these common operations should be a standard part of the language.
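
A rough sketch of the "trickery" version, assuming a hypothetical helper ensure_bounds for rank-1 real arrays (it is not a standard procedure). Note that the same-size branch below copies the data into a temporary, so it reproduces the requested result but not the copy-free behavior the proposal asks for.

```fortran
module conditional_alloc
  implicit none
contains
  ! Hypothetical helper mimicking the proposed conditional reallocation.
  subroutine ensure_bounds(a, lb, ub)
    real, allocatable, intent(inout) :: a(:)
    integer, intent(in) :: lb, ub
    real, allocatable :: tmp(:)

    if (.not. allocated(a)) then
      allocate(a(lb:ub))
    else if (lbound(a, 1) == lb .and. ubound(a, 1) == ub) then
      return                           ! already as requested: nothing happens
    else if (size(a) == max(0, ub - lb + 1)) then
      ! Same size, different bounds: keep the values under the new bounds.
      ! This sketch does it with an O(n) copy into a temporary.
      allocate(tmp(lb:ub), source=a)
      call move_alloc(tmp, a)
    else
      ! Different size: start over; the contents are undefined afterwards.
      deallocate(a)
      allocate(a(lb:ub))
    end if
  end subroutine ensure_bounds
end module conditional_alloc
```

A generic interface over types, kinds and ranks would be needed to make this usable in practice, which is part of why a language-level solution is attractive.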

Thanks, I reported your proposal at Optional conditional reallocation in allocate (`reallocatable` attribute) · Issue #318 · j3-fortran/fortran_proposals · GitHub.

We created a reallocate function at work precisely to dynamically reallocate any array. Internally it uses call move_alloc, and we saw no need for a conditional argument. If the array doesn’t exist, it uses allocate for the first allocation; if it already exists, an allocate on a temporary followed by a call move_alloc modifies the size accordingly.
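
The actual wrapper is not shown in this thread, so the following is only a guess at its shape: a sketch for rank-1 integer arrays, written as a subroutine, where the name reallocate and the argument n are assumptions. It keeps the leading elements when resizing.

```fortran
module realloc_sketch
  implicit none
contains
  ! Hypothetical stand-in for the wrapper described above.
  subroutine reallocate(a, n)
    integer, allocatable, intent(inout) :: a(:)
    integer, intent(in) :: n            ! requested size, assumed >= 0
    integer, allocatable :: tmp(:)
    integer :: nkeep, lb

    if (.not. allocated(a)) then
      allocate(a(n))                    ! first allocation
      return
    end if
    if (size(a) == n) return            ! nothing to do

    allocate(tmp(n))
    nkeep = min(n, size(a))             ! how many old values survive
    lb = lbound(a, 1)
    tmp(1:nkeep) = a(lb:lb + nkeep - 1)
    call move_alloc(tmp, a)             ! a now has bounds 1:n
  end subroutine reallocate
end module realloc_sketch
```

Elements beyond the old size are left undefined in this sketch; a real wrapper might want to initialize them.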

I’m just wondering, if allocate could evolve to enable reallocation, why would one want to have an error on reallocation? (Which is the only reason I see to add an extra argument for conditional reallocation)

You might want to opt for an error to catch bugs caused by allocations you don’t need. With the error enabled, you must always deallocate explicitly, so you can never allocate and then reallocate without realizing it.

I see your point. I’m just thinking (take the following as brainstorming on the topic :wink: ) that if one doesn’t want to reallocate, a simple if(.not.allocated(myarray)) ... would ensure that the allocation happens only the first time.
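
Both positions can be sketched with what the language has today. The procedure names below are made up; the allocated intrinsic and the stat= specifier are standard. Allocating an array that is already allocated is an error condition, so with stat= present the program continues and receives a nonzero status instead of stopping.

```fortran
module alloc_guard_demo
  implicit none
contains
  ! Allocate only on first use; later calls leave the array untouched.
  subroutine allocate_once(a, n)
    real, allocatable, intent(inout) :: a(:)
    integer, intent(in) :: n
    if (.not. allocated(a)) allocate(a(n))
  end subroutine allocate_once

  ! Attempt a plain allocate and report the status instead of stopping.
  ! The result is 0 on success and nonzero if an error condition occurred,
  ! for example because the array was already allocated.
  function try_allocate(a, n) result(stat)
    real, allocatable, intent(inout) :: a(:)
    integer, intent(in) :: n
    integer :: stat
    allocate(a(n), stat=stat)
  end function try_allocate
end module alloc_guard_demo
```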

In my experience, allocatable arrays in Fortran are safer than using new in C++, where I have seen some memory leaks. With allocatables I have seen memory corruption when one writes to the wrong memory section by mistake, but they are destroyed on exiting the scope (if not, it is because one wrote outside the bounds, and the program would normally crash).

In the end, I just think that the extra argument is already taken care of by the allocated intrinsic.

So, I would say put the energy into enabling reallocation through the allocate statement, then work out whether an extra argument is an absolute requirement.

IMO this should be enlarged to this: REALLOCATABLE attribute (the syntax doesn’t matter, just the principle of reallocatable arrays that would mimic the C++ vectors)

Thanks @PierU, I just mentioned the other thread as well and added “reallocatable attribute” into the title.

I think in that other thread, people were wanting a reallocate function that would extend an existing array allocation and preserve the data in the first elements. Thus it is a little different from my proposal.

My general feeling about extending an existing array is that if you need to do that often, you are probably using the wrong data structure. Instead of an array, it might be better to use a linked list, or a binary search tree, or some other data structure that is designed to be extendable from the beginning.

In my proposal, the only time the data is preserved is when the new bounds are different, but the size is the same. This operation, of changing the bounds of an existing allocatable array without copying the contents (i.e. an O(1) operation instead of an O(n) operation), is now difficult to do. You can do it in a nonportable way by modifying the array metadata, which is way too complicated for such a simple and common operation. You can also do it temporarily by calling a subprogram (which redefines the bounds of the dummy arguments) or with an associate block (which works the same way), but this is not a permanent change of the bounds, just a local change.
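
The subprogram route can be sketched as follows. The names value_at and demo are hypothetical; the mechanism is an assumed-shape dummy argument declared with an explicit lower bound, which changes the indexing inside the procedure only.

```fortran
module rebound_demo
  implicit none
contains
  ! Inside this function x is indexed from lb, whatever the bounds of the
  ! actual argument are. The caller's array is not changed in any way.
  function value_at(x, lb, i) result(v)
    integer, intent(in) :: lb, i
    real, intent(in) :: x(lb:)
    real :: v
    v = x(i)
  end function value_at

  function demo() result(ok)
    logical :: ok
    real, allocatable :: a(:)
    allocate(a(1:3))
    a = [10.0, 20.0, 30.0]
    ! Viewed with lower bound -1, index 0 is the second element.
    ! After the call, a still has lower bound 1.
    ok = value_at(a, -1, 0) == 20.0 .and. lbound(a, 1) == 1
  end function demo
end module rebound_demo
```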

Well, I have to strongly disagree with you here. A data structure is strongly tied to the algorithms and the kinds of operations you want to perform. Linked lists and trees (of whichever sort) are efficient for search operations, but they are extremely inefficient if you just want to store a “physical field data structure”. And yes, one might want to reallocate such a structure quite often: if you work on domains that require subsequent topological modifications (e.g. remeshing), you have no choice but to reallocate. Then, in FEM at least, the reallocation overhead is negligible compared to the CPU time required by the non-linear solvers. So it is totally worth having native support for dynamic reallocation.

Yes, I agree, and that was my point. If you need to reallocate an array often, then it is likely that you should not be using an array but rather some other data structure. There are exceptions, of course. If you know ahead of time that a modest upper bound to the array size exists, then you can just overallocate to that max size to begin with, and keep track of the working size during your algorithm. In this case, “reallocation” is simply changing the integer bounds, almost no overhead at all. But if you don’t know the upper bound, or if the upper bound is impractically large, then you can’t do that. If you do decide to reallocate an array often, say every time it is referenced, then the memory reference overhead is O(n**2) for an array of current length n, and the memory allocation overhead is O(n). Other data structures that are extendable, such as the ones I mentioned before, can do better than that.

Ok, yes, reallocating an array every time it is referenced is definitely a bad idea. I would even say there is a fundamental flaw in the architecture if one ends up in such a situation.

But working with dynamic arrays is extremely normal and useful. In the use case I mentioned, one does not know the upper bound of the memory ahead of time, since adaptive remeshing uses the physical fields to compute errors on the geometrical domains and then decides where to refine and where to derefine the mesh. So it can change drastically. But one would definitely not try to reallocate all the arrays every time they are referenced. They would be stored in a module, and a specific reallocation+field data transport/remapping procedure would be called after remeshing the domain.

It lies in between your two extreme patterns :wink:

If you want to access the data in the array with a different memory layout, reshape or a pointer can do the trick. I actually created a few functions returning pointers from 1D arrays to “view” them with 2D or 3D layouts, and this is almost free.
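
Those functions are not shown here, so this is only a sketch of the idea using pointer rank remapping, with made-up names view2d and demo. It assumes the actual argument has the target attribute and at least n1*n2 elements.

```fortran
module view_demo
  implicit none
contains
  ! Return a rank-2 pointer onto the storage of a rank-1 array (no copy).
  function view2d(flat, n1, n2) result(p)
    real, target, intent(inout) :: flat(:)
    integer, intent(in) :: n1, n2
    real, pointer :: p(:,:)
    p(1:n1, 1:n2) => flat          ! rank-remapping pointer assignment
  end function view2d

  function demo() result(ok)
    logical :: ok
    real, allocatable, target :: a(:)
    real, pointer :: m(:,:)
    integer :: i

    allocate(a(6))
    a = [(real(i), i = 1, 6)]
    m => view2d(a, 2, 3)
    m(1, 2) = 99.0                 ! column-major: element (1,2) is a(3)
    ok = a(3) == 99.0 .and. all(shape(m) == [2, 3])
  end function demo
end module view_demo
```

Unlike reshape, which produces a new array value, the pointer aliases the original storage, so writes through the view are visible in the 1D array.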

@hkvzjal I think you probably want just a list. Which in LFortran and LPython is just like a 1D array, except that it can resize, like std::vector, so you can append elements at the end almost for free.
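
For comparison, the same "append almost for free" behavior can be approximated in standard Fortran with a capacity-doubling buffer. This is a generic sketch and makes no claim about how LFortran or LPython implement their lists; the type int_list, its components and the initial capacity of 4 are all assumptions.

```fortran
module int_list_sketch
  implicit none
  private
  public :: int_list

  type :: int_list
    integer, allocatable :: buf(:)   ! storage; size(buf) is the capacity
    integer :: n = 0                 ! number of elements in use
  contains
    procedure :: append
  end type int_list

contains
  ! Append one value, doubling the capacity whenever the buffer is full,
  ! so that the copying cost is amortized over many appends.
  subroutine append(self, val)
    class(int_list), intent(inout) :: self
    integer, intent(in) :: val
    integer, allocatable :: tmp(:)

    if (.not. allocated(self%buf)) allocate(self%buf(4))
    if (self%n == size(self%buf)) then
      allocate(tmp(2 * size(self%buf)))
      tmp(1:self%n) = self%buf(1:self%n)
      call move_alloc(tmp, self%buf)
    end if
    self%n = self%n + 1
    self%buf(self%n) = val
  end subroutine append
end module int_list_sketch
```

The valid data is buf(1:n); everything beyond n is spare capacity.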

@certik I’m talking about plain old allocatable arrays.

Maybe I’m coming across wrong: going to such lengths to explain my point makes it sound like reallocation is a big part of what I do. Actually it is less than 5%. And it is such a small part because we care more about efficiency in the numerical solvers, for which arrays are just the perfect data structure. Dynamic resizing is a small step in the whole process, yet paramount for automatic mesh adaptation.

What I would just like to say is that I think allocate could safely enable reallocation with minimal change to the interface. (And when I say I think, it is because that’s exactly what we managed to accomplish with the reallocate function as a wrapper around allocate+move_alloc, which I wouldn’t mind dropping if allocate did the work.)

Your proposal would just be a particular case of that other one, actually.

You’re probably right in theory. In practice, though, I can see that the resizability of C++ vectors is used quite often.

Your rules of thumb are a good way to distinguish:

  1. Function Calls in Expressions: Functions return values and are used in expressions.
  2. Subroutine Calls Start with ‘call’: Subroutines perform tasks without returning a value and, in Fortran, are always invoked with ‘call’.
  3. Statements Stand Alone: Statements are independent instructions that perform actions.