Best practice of allocating memory in Fortran?

I came across the SciVision website while searching for other things. It contains a nice list of posts on Fortran.

In particular, I note the post entitled “Fortran allocate large variable memory”. It motivates me to ask the following two questions (note that there is a Question 2).

Question 1. What is the best practice of allocating memory in Fortran?

Personally, I wrap up a generic procedure named safealloc to do the job. Below is the implementation of safealloc to allocate the memory for a rank-1 REAL(SP) array with a size given by a variable n of kind INTEGER(IK). We can imagine, for example, the module consts_mod defines SP=kind(0.0) and IK=kind(0). In addition, validate is a subroutine that stops the program when an assertion fails, akin to the assert function in C or Python.

subroutine alloc_rvector_sp(x, n)
!--------------------------------------------------------------------------------------------------!
! Allocate space for an allocatable REAL(SP) vector X, whose size is N after allocation.
!--------------------------------------------------------------------------------------------------!
use, non_intrinsic :: consts_mod, only : SP, IK  ! Kinds of real and integer variables
use, non_intrinsic :: debug_mod, only : validate  ! An `assert`-like subroutine
implicit none

! Inputs
integer(IK), intent(in) :: n

! Outputs
real(SP), allocatable, intent(out) :: x(:)

! Local variables
integer :: alloc_status
character(len=*), parameter :: srname = 'ALLOC_RVECTOR_SP'

! Preconditions 
call validate(n >= 0, 'N >= 0', srname)

! According to the Fortran 2003 standard, when a procedure is invoked, any allocated ALLOCATABLE
! object that is an actual argument associated with an INTENT(OUT) ALLOCATABLE dummy argument is
! deallocated. So it is unnecessary to write the following line since F2003 as X is INTENT(OUT):
!!if (allocated(x)) deallocate (x)

! Allocate memory for X
allocate (x(n), stat=alloc_status)
call validate(alloc_status == 0, 'Memory allocation succeeds (ALLOC_STATUS == 0)', srname)
call validate(allocated(x), 'X is allocated', srname)

! Initialize X to a strange value independent of the compiler; it can be costly for a large N.
x = -huge(x) 

! Postconditions 
call validate(size(x) == n, 'SIZE(X) == N', srname)
end subroutine alloc_rvector_sp

[Update (2022-01-25): I shuffled the lines a bit, moving validate(allocated(x), 'X is allocated', srname) above x = -huge(x).]

What do you think about this implementation? Any comments, suggestions, and criticism will be appreciated.

A related and more particular question is the following.

Question 2. What is the best practice of allocating large memory in Fortran?

The question can be further detailed as follows.

2.1. What does “large” mean under a modern and practical setting?
To be precise, let us consider a PC/computing node with >= 4GB of RAM. In addition, the hardware (RAM, CPU, hard storage, etc), the compiler, and the system are reasonably mainstream and modern, e.g., not more than 10 years old.

2.2. What special caution should be taken when the memory to allocate is large by the answer to 2.1?

Thank you very much for your input and insights.

3 Likes

Honestly I feel you are going to a lot of effort that provides almost no benefit whatsoever.

Your postconditions are only useful if you don’t trust that your compiler did the correct thing with an otherwise successful allocation. If you don’t trust your compiler to do allocation correctly then you shouldn’t be using it at all.

For the precondition, note that allocate(x(-1)) is valid and allocates a 0-sized array, but if the expectation is that the passed n is truly the desired size then you’ve probably got a bigger problem and that check belongs elsewhere (like where n is generated) and not with the allocation itself.

The only thing that I feel one could argue might be useful is checking the allocation status. However, I worked in a production code for nearly 20 years where all the allocations checked their status and in that time not once was one ever tripped to my knowledge. If the compiler has a quality implementation of allocate it ought to write a useful error message when it fails, and if so, the only reason to check the status yourself is if you have some way to gracefully recover from a failure. But if you’re just going to print an error message and fail, I’d suggest just letting the runtime system do that for you and not bother.

Now the downsides to wrapping the allocate in your own procedure are many. Your code is now obfuscated, and you’ve lost the flexibility of the bare allocate unless you write many different flavors (for different ranks, types, etc.). That’s a rabbit hole I wouldn’t go down.

7 Likes

Thank you, @nncarlson very much for the detailed response. I appreciate very much the insights from true practitioners and experts like you.

I tend to do “stupid” verifications when developing my code. Indeed, even more, I believe that the essence of pre/postconditions lies in their “apparent stupidity”.

As for the preconditions here, I agree that the compiler should guarantee both allocated(x) and size(x) == n to be true. Otherwise, the compiler is not trustable, and I should not be using it.

But wait, let me tell a true story first. On December 31, 2021, I found that Absoft Pro Fortran af95 (2022 with patch 4) could not always guarantee size(x) == n after executing allocate(x(n)) with an n >= 0. This would not have been discovered without my stupid postconditions. It has been reported to Absoft, whose developers have been working to resolve the problem since then.

Those who are interested and have Absoft Pro Fortran installed may clone my GitHub repo named test_compiler and run make atest_coa to reproduce the failure described above.

If I had not imposed the seemingly useless postconditions, my code would have failed in a violent and strange way when compiled by af95. It might have taken my whole new year break to debug in vain, because the bug was in the compiler instead of my code.

Is Absoft Pro Fortran af95 an untrustable compiler? I would not dare to claim so. af95 has been thankfully instrumental during my development. In addition, even if I do not trust it, I would not blame the users of my code for choosing such a compiler, which has been active on the market for so many years.

I agree that n should be checked when it is generated. However, I am afraid this may not mean that it is unnecessary to check n elsewhere (e.g., before calling allocate). In my development, I explicitly ask each subroutine to distrust any data that it receives and generates, no matter how the data is verified elsewhere (in other words, the subroutine distrusts the verification done by any other subroutines). Surely, this repeats numerous tests, but it has also helped me to locate many bugs.

Surely, most of the pre/postconditions are only imposed during the development or debugging. In the released version of the code, they will be switched off, so that there is no or little performance penalty. However, I tend to keep all the verifications in alloc_rvector_sp even in the released version.

No doubt that there are many downsides. Fortunately, my code deals with only a few kinds of REAL and INTEGER up to rank 2. It is affordable to implement a subroutine for each of the possible cases. In addition, if one day I decide to use the bare allocate instead of my procedure, it is not difficult to do so by a simple sed script. Allocation does not occur very often in my code anyway.

It seems that we have touched only Question 1 up to now. I look forward to hearing any opinion about Question 2. (Or maybe we should simply trust the compiler to handle the allocation, no matter small or large?)

2 Likes

Array allocation can be an expensive operation. Here is a suggestion that can eliminate some of that cost. Use intent(inout) instead of intent(out). Then test explicitly whether the array is already allocated with the right size. If it is, just set the value and return. Otherwise, deallocate if necessary, then allocate and initialize the new array with the correct size. This way, repeated calls with the same size parameter are essentially free. This is the way that allocate() should work anyway, at least with some kind of optional argument to trigger the behavior, but it doesn’t, so this approach is a workaround for that limitation.

It should also be pointed out that with allocate-on-assignment, explicit allocations are not always required; the assignment can be used instead. That does work more or less correctly, regardless of the prior allocation status of the left-hand side of the assignment.

Finally, I agree with the previous comments about how this is only useful for rank-1 real arrays with lower bound 1 and default kind. Anything else (integer, complex, logical, user-defined types, different kind values, different lower bounds, etc.) would need its own version of this subroutine. That is why allocate() itself needs to be fixed; there isn’t an easy way for programmers to work around all of those limitations.
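The reuse-if-already-the-right-size idea can be sketched roughly as follows. This is only an illustration for default REAL rank-1 arrays; the names realloc_sketch_mod, ensure_size and reallocated are made up for this example and do not come from the original code.

```fortran
module realloc_sketch_mod
! Hypothetical module name, for illustration only.
implicit none
private
public :: ensure_size

contains

subroutine ensure_size(x, n, reallocated)
! Make X an allocated rank-1 array of size MAX(N, 0), keeping the existing
! storage when X already has that size. X is INTENT(INOUT), so it is NOT
! deallocated automatically on entry.
real, allocatable, intent(inout) :: x(:)
integer, intent(in) :: n
logical, intent(out), optional :: reallocated  ! Hypothetical diagnostic flag
logical :: fresh

fresh = .true.
if (allocated(x)) then
    if (size(x) == max(n, 0)) then
        fresh = .false.  ! Right size already: reuse the storage
    else
        deallocate (x)
    end if
end if
if (fresh) allocate (x(n))

x = 0.0  ! Set the contents in both cases
if (present(reallocated)) reallocated = fresh
end subroutine ensure_size

end module realloc_sketch_mod
```

Calling ensure_size(x, n) twice with the same n performs only one allocation. Note that this sketch compares sizes only, so an array that was allocated elsewhere with a different lower bound would be reused with that bound unchanged.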

2 Likes

I 100% agree with Neil, I also recommend to just use the Fortran’s allocate directly.

In addition, you actually won’t get a chance to handle an allocation error if you run out of memory on Linux, because of the OOM killer. You can try it yourself: if you try to allocate an array that is 10x bigger than your memory, then it will probably let you handle the error. But if you allocate an array that could potentially fit into memory, you often get a “success”, and then the program segfaults or Linux kills it when you later access the memory and there is actually not enough of it.

Conclusion: let the compiler handle this. If you don’t like your compiler behavior, please report a bug to the compiler vendor.

4 Likes

Allocating memory is a broad topic, but if we consider allocating memory for an integer / real array, there are 3 main types:
Local, where the size is defined by a constant or integer parameter,
Automatic, where the size is an argument to the routine.
Allocatable, which has the most flexibility.

A significant consideration when allocating memory is what memory pool is being used, either the stack or the heap.
By default, local and automatic arrays are allocated on the stack, while allocatable arrays are on the heap.
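As a small illustration of the three kinds of arrays listed above, here is a sketch. The names array_kinds_sketch_mod and array_sizes are invented for this example, and where each array actually ends up (stack, heap or static memory) depends on the compiler and its options.

```fortran
module array_kinds_sketch_mod
! Hypothetical module name, for illustration only.
implicit none
private
public :: array_sizes

contains

function array_sizes(n) result(sizes)
! Return the sizes of a local, an automatic and an allocatable array.
integer, intent(in) :: n
integer :: sizes(3)
integer, parameter :: m = 8
real :: fixed(m)             ! Local: the size is a compile-time constant
real :: work(n)              ! Automatic: the size comes from the argument N
real, allocatable :: big(:)  ! Allocatable: sized at run time by ALLOCATE

fixed = 0.0
work = 0.0
allocate (big(2 * n))
big = 0.0

sizes = [size(fixed), size(work), size(big)]
! BIG is deallocated automatically on return, because it is a local
! allocatable variable without the SAVE attribute.
end function array_sizes

end module array_kinds_sketch_mod
```

Only work(n) is at the mercy of the stack size for large n; big can be made as large as the heap allows.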

Stack Overflow is the curse of large arrays and always needs to be considered.

The stack is basically fixed in size and “small” (typically some MBytes, but I usually set the stack in the linker to 500 Mbytes, as a virtual memory pool). There is a stack limit, which is ridiculous for 64bit programs, I think code + stack < 2GB or 4GB.
The heap is extendable and can be the physically installed memory or extend to the virtual memory address space. Remember that when you ALLOCATE an array, it is not provided with physical memory pages until each virtual page is defined/used.

Scope: Local and Automatic arrays are automatically “deallocated” when they go out of scope (exit the routine). Allocatable arrays will also be automatically “deallocated”, unless they have global scope via a module. Allocatable arrays in a module do not go out of scope.
For large arrays, the easiest option is ALLOCATE onto the heap. Any array larger than the stack should be forced onto the heap via ALLOCATE. This is a robust way of defining large arrays.
Where local/automatic arrays are placed can be controlled by compiler options, for example in gfortran:
-fstack-arrays will put automatic arrays and array temporaries onto the stack
-fno-automatic will treat local variables and arrays as if they were SAVEd, moving them off the stack into static memory
-fmax-stack-var-size=n will move local arrays larger than n bytes off the stack into static memory (but what value for n?)
-fopenmp implies -frecursive, which forces local arrays onto the stack

Note ALLOCATE does not imply physical memory allocation. For example, you can ALLOCATE an array up to the size of your virtual memory limit, then use it as a sparse array, using physical memory pages only as the space is addressed. However, “array = 0” will demand all physical pages, so in this case initialisation needs to be done carefully. Using/demanding more memory than physically installed is also a bad idea.

Efficiency is another issue. It is best to place many small arrays on the stack, but larger arrays (multiple memory pages) are just as efficient in the heap.
With OpenMP, each thread has its own stack, so stack arrays do not have the memory page coherence problems that heap arrays might generate. For OpenMP, I define a 500 MByte stack for each thread, but memory pages are only allocated to the used portion of each stack. Works for me.
If not using OpenMP, I try to place all large arrays on the heap, which is a more robust approach.
The other problem with large arrays, especially with OpenMP and many threads, is that you can easily overflow the L3 cache and hit a memory <-> cache bottleneck, where the combined memory access rate of all threads exceeds the memory bandwidth.

The worst outcome is to place large local/automatic arrays on the stack and get a stack overflow when you have forgotten the code development phase of your project.

I hope these memory allocation issues are of interest.

7 Likes

Thank you @certik , for your response.

I guess most of your comments concern Question 2. What is your opinion regarding the following pre/postconditions in the code I showed?

  1. n >= 0 before the allocation
  2. alloc_status == 0 after the allocation
  3. allocated(x) after the allocation
  4. size(x) == n after the allocation

Personally, I will not remove 1, at least during the development phase, because n < 0 may happen due to mis-transmission of the data and/or overflow. They may lead to bugs that are difficult to debug if we remove the checking.

For 2, if we are not supposed to check alloc_status == 0, why does allocate provide this value in the first place?

For 3, I am not sure.

The postcondition in 4 helped me to spot a bug in Absoft Pro Fortran, saving my time of debugging.

Many thanks.

1 Like

Thank you @JohnCampbell for the very detailed and informative elaboration! I have learned a lot from it.

1 Like

  1. Why cannot the compiler give a nice error message at runtime with a stacktrace if n < 0? As a user that’s what I would like.

  2. I think the reason allocate allows you to optionally handle the error yourself is that it is a good design: you can always override the default behavior and do things exactly as you would like. So I think we want that. But I think you were asking about “best practice”. For that I suggest to use the default behavior.

  3. That must be true, otherwise that’s a compiler bug.

  4. Again, that must be true, otherwise that’s a compiler bug.

So your implicit question is what to do with compiler bugs. Well, if that happens, then they should be tracked down and reported. Your argument is that the checks make it easier to track down such bugs. That could be, but I think it really is the responsibility of the compiler developers to ensure they implement the language correctly and that they fix bugs. I don’t think you should make your code more complicated or complex just because one (?) compiler has a bug. I think all you have to do is run your code with a compiler that works and then report the problem to the vendor of the compiler that does not. (If you do that with LFortran for example, I’ll be happy to help track down the problem.)

If I follow the argument about “defensive programming” to its logical conclusion, I should write code like this:

integer :: i
i = 4 + 5
if (i /= 9) error stop

just in case some compiler has a bug and doesn’t add numbers correctly.

Such tests are important, but they belong in the compiler test suite. So your tests about pre and post conditions are excellent tests for a compiler test suite. But I don’t think you should have such “compiler tests” in your production code. Or if you really have to, then have a directory with “compiler tests” where you test a given compiler to ensure it doesn’t have bugs. But in your main code then simply depend on the fact that these “compiler tests” pass. This ties to this issue:

And yes, we should absolutely have such a test suite. Then you can simply test your compiler and see which features it implements and which fail. And in your code you would then only use features that work.
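To make the idea of a separate “compiler tests” directory concrete, the allocation postconditions discussed in this thread could live in a small standalone check like the one below. This is just a sketch; the names alloc_compiler_test_mod and alloc_postconditions_hold are hypothetical and not part of any existing test suite.

```fortran
module alloc_compiler_test_mod
! Hypothetical module name, for illustration only.
implicit none
private
public :: alloc_postconditions_hold

contains

function alloc_postconditions_hold(n) result(ok)
! Check what a conforming compiler must guarantee after ALLOCATE(X(N)):
! zero status, X allocated, and SIZE(X) == MAX(N, 0).
integer, intent(in) :: n
logical :: ok
real, allocatable :: x(:)
integer :: alloc_status

allocate (x(n), stat=alloc_status)
ok = (alloc_status == 0)
if (ok) ok = allocated(x)
if (ok) ok = (size(x) == max(n, 0))
end function alloc_postconditions_hold

end module alloc_compiler_test_mod
```

A driver program in the test directory would call this for a handful of sizes and stop with an error if any call returns .false., so the production code can then use the bare allocate.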

5 Likes

Thank you, @certik , for the detailed response. Your insights as both a compiler developer and a Fortran practitioner are particularly helpful. I understand my questions better now.

I do not disagree with this point, yet I would like to quote

I agree with all your other points, particularly regarding the test suite for the standard quoted below. I believe that, once established, it will become a cornerstone for the development and evolution of Fortran. I look forward to it.

So, the conclusion is to let the compiler deal with memory allocation, and the best practice is to use the bare allocate statement. For my development, I will probably continue to use my wrapped procedure because it has helped me to avoid bugs; in the future released production code, I will write a sed script to replace it with the allocate statement.

Many thanks to everyone for the helpful discussion.

1 Like

Does anybody know the motivation for this behavior?

I just tested GFortran 11.0.1 on Apple M1 and I get a segfault for allocate(x(-1)) and allocate(x(0)). So as a user, I would rather get a nice runtime exception than a segfault. If the Fortran Standard requires that to be valid, then a simple solution is to support that in a compiler in the “standard conforming” mode, and then have a compiler option that would give a nice runtime error or a warning.

2 Likes

I tried just now the same test on Thinkpad X1 Carbon gen 8 under Ubuntu 20.04. Both succeeded. The detail is as follows.

Code:

! testalloc.f90

program testalloc
implicit none
integer, allocatable :: x(:)

print *, 'Test ALLOCATE(X(0))'
if (allocated(x)) deallocate (x)
allocate (x(0))
print *, 'Finished with SIZE(X) = ', size(x)

print *, 'Test ALLOCATE(X(-1))'
if (allocated(x)) deallocate (x)
allocate (x(-1))
print *, 'Finished with SIZE(X) = ', size(x)

end program testalloc

Result:

$ gfortran --version && gfortran testalloc.f90 && ./a.out 
GNU Fortran (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 Test ALLOCATE(X(0))
 Finished with SIZE(X) =            0
 Test ALLOCATE(X(-1))
 Finished with SIZE(X) =            0

$ ifort --version && ifort testalloc.f90 && ./a.out 
ifort (IFORT) 2021.5.0 20211109
Copyright (C) 1985-2021 Intel Corporation.  All rights reserved.
 Test ALLOCATE(X(0))
 Finished with SIZE(X) =            0
 Test ALLOCATE(X(-1))
 Finished with SIZE(X) =            0

System:

$ uname -a && lscpu | grep Nom
Linux 5.11.15-051115-generic #202104161034 SMP Fri Apr 16 10:40:30 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Nom de modèle :                         Intel(R) Core(TM) i7-10610U CPU @ 1.80G

I have not checked the standards, but is this particular behavior required? Personally, I agree with @certik that it will be better to raise a certain error here than to produce an empty x silently, the latter of which potentially masks a bug in the code invoking this statement.

However, I do not think there is anything wrong with allocate(x(0)), either logically or mathematically. I would hope that the standard & compilers allow it, the result being an empty rank-1 array. It was the case in my test, but not in @certik 's. Which compiler is behaving correctly?

Thanks.

1 Like

Required or not, that is the behavior stated in the standard, “If the upper bound is less than the lower bound, the extent in that dimension is zero and the array has zero size.”

Note, though, that the text of the Fortran 2018 revision provides a somewhat clearer description of the semantics intended here by the standard.

4 Likes

From 9.7.1.2 Execution of an ALLOCATE statement: “If the upper bound is less than the lower bound, the extent in that dimension is zero and the array has zero size.”

The statement allocate(x(0)) means allocate(x(1:0)) because if omitted, the lower bound defaults to 1. So I think it’s fair to say that the “0” doesn’t mean 0-size, rather the array is 0-sized because 0 is less than the lower bound of 1, and so for any value less than 1.
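The rule quoted above can be checked with a tiny helper that allocates x(lo:hi) and reports the bounds and size. This is a sketch; zero_size_sketch_mod and bounds_after_allocate are names invented for the example.

```fortran
module zero_size_sketch_mod
! Hypothetical module name, for illustration only.
implicit none
private
public :: bounds_after_allocate

contains

function bounds_after_allocate(lo, hi) result(b)
! Allocate X(LO:HI) and return [LBOUND, UBOUND, SIZE] of the result.
integer, intent(in) :: lo, hi
integer :: b(3)
integer, allocatable :: x(:)

allocate (x(lo:hi))  ! ALLOCATE(X(N)) is the special case LO = 1, HI = N
b = [lbound(x, 1), ubound(x, 1), size(x)]
end function bounds_after_allocate

end module zero_size_sketch_mod
```

For any hi < lo the size is 0, so bounds_after_allocate(1, 0), bounds_after_allocate(1, -1) and bounds_after_allocate(5, 3) all describe the same empty array. The standard also defines LBOUND as 1 and UBOUND as 0 for a zero-extent dimension, so the requested bounds are not recoverable afterwards.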

3 Likes

Try the following, which might not give a segfault error, then set n = -1

   n = 0
   ALLOCATE ( x(n), stat=istat, errmsg=errmes_string )
   if ( istat /= 0 ) then
      write ( *,* )  errmes_string,' : istat =',istat
   end if

2 Likes

It seems to work now for me, I don’t get any segfault and it creates an empty array. It segfaults in one Conda environment, but not another, so I think it’s not related to this code, nor to gfortran, but probably to some ABI incompatibility in Conda. Sorry about the confusion.

I see, so the standard behavior is to allocate an empty array if the upper bound is less than the lower bound.

I can’t remember if I ever depended on this feature. It seems I would prefer some kind of a runtime warning. Although an empty array would fail later when I try to index it with out of bounds error, so the bug would manifest anyway, just a little bit later.

Thanks for the useful info @JohnCampbell . Is there a way to check memory availability before initializing allocated arrays, or at least a way to catch the issue without crashing? Is it ok to allocate a large number of large arrays with a single allocate statement, or sequential allocate statements, as long as stat is specified for each statement, but checked at the end of allocation? Is one method preferable to another for allocating a long list of arrays?
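For what it is worth, one way to allocate a list of arrays in a single statement and still fail gracefully might look like the sketch below. The names multi_alloc_sketch_mod and allocate_workspace are hypothetical. As mentioned earlier in the thread, a zero stat on Linux does not prove that physical pages will be available when the arrays are first touched.

```fortran
module multi_alloc_sketch_mod
! Hypothetical module name, for illustration only.
implicit none
private
public :: allocate_workspace

contains

subroutine allocate_workspace(a, b, c, n, ok, msg)
! Allocate three work arrays in one statement; on failure, release any of
! them that did get allocated so the caller sees an all-or-nothing result.
real, allocatable, intent(out) :: a(:), b(:), c(:)
integer, intent(in) :: n
logical, intent(out) :: ok
character(len=*), intent(out) :: msg
integer :: istat

msg = ''  ! ERRMSG= is only defined when an error occurs
allocate (a(n), b(n), c(n), stat=istat, errmsg=msg)
ok = (istat == 0)

if (.not. ok) then
    if (allocated(a)) deallocate (a)
    if (allocated(b)) deallocate (b)
    if (allocated(c)) deallocate (c)
end if
end subroutine allocate_workspace

end module multi_alloc_sketch_mod
```

With one statement there is a single stat to check, but it does not say which array failed. Separate allocate statements, each with its own stat, give that information at the cost of more lines.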

1 Like

It all depends on the OS.
I use the attached routine for getting memory usage with Windows 7/10 64bit.
Other OS should provide similar functionality.

report_memory_usage.f90 (5.0 KB)

Memory availability for 64-bit is mainly based on installed physical memory.

1 Like

I was just trying this out, and I get an “unresolved external symbol” for bus_j18 on ifort, is this a gfortran thing?
Thanks, in advance,

No, it is just a routine to write a large integer using commas as thousands separators. You can replace its use with an I18 integer format. I was more trying to show how to access the memory parameter values.