Best practice of allocating memory in Fortran?

Allocating memory is a broad topic, but for integer or real arrays there are three main types:
Local, where the size is fixed by a constant or an integer parameter.
Automatic, where the size comes from an argument of the routine.
Allocatable, which has the most flexibility.
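The three types can be sketched side by side in one routine. The module and function names (`array_kinds`, `sum_of_kinds`) and the parameter `nfix` are hypothetical, chosen only for illustration. Where each array actually lives in memory depends on the compiler and its options.

```fortran
module array_kinds
  implicit none
  integer, parameter :: nfix = 100        ! compile-time size for the local array
contains
  ! Returns the sum of three arrays filled with 1.0, one of each type.
  function sum_of_kinds(n) result(total)
    integer, intent(in) :: n              ! run-time size (dummy argument)
    real :: total
    real :: local_arr(nfix)               ! local: size fixed by a parameter
    real :: auto_arr(n)                   ! automatic: size taken from an argument
    real, allocatable :: alloc_arr(:)     ! allocatable: size chosen at ALLOCATE

    allocate(alloc_arr(2*n))
    local_arr = 1.0
    auto_arr  = 1.0
    alloc_arr = 1.0
    total = sum(local_arr) + sum(auto_arr) + sum(alloc_arr)
    ! alloc_arr has no SAVE attribute, so it is deallocated automatically on return
  end function sum_of_kinds
end module array_kinds
```

Calling `sum_of_kinds(10)` returns 130.0 (100 + 10 + 20 elements, each set to 1.0).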

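A significant consideration when allocating memory is which memory pool is used: the stack or the heap.
By default, local and automatic arrays are typically placed on the stack, while allocatable arrays are on the heap. This does depend on the compiler and its options: gfortran, for example, puts automatic arrays on the heap unless -fstack-arrays is given, and moves large local arrays to static memory.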

Stack overflow is the curse of large arrays and always needs to be considered.

The stack is essentially fixed in size and “small” (typically a few MB, but I usually set the stack size in the linker to 500 MB, as a virtual memory pool). There is also a maximum stack size, which seems unreasonably small for 64-bit programs; I think code + stack must be < 2 GB or 4 GB.
The heap is extendable: it can grow to the installed physical memory and beyond, up to the limit of the virtual address space. Remember that when you ALLOCATE an array, it is not given physical memory pages until each virtual page is first used.

Scope: local and automatic arrays are automatically “deallocated” when they go out of scope (when the routine exits). Allocatable arrays are also automatically deallocated, unless they have the SAVE attribute or global scope via a module: allocatable arrays in a module do not go out of scope.
For large arrays, the easiest option is to ALLOCATE them on the heap. Any array larger than the available stack should be forced onto the heap via ALLOCATE; this is a robust way of defining large arrays.
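A minimal sketch of the two scope behaviours, assuming hypothetical names (`persistent_data`, `table`, `setup`, `local_scratch_ok`, `teardown`): the module allocatable `table` stays allocated between calls until it is explicitly deallocated, while the local allocatable `scratch` is freed when its routine returns. A local allocatable declared with the SAVE attribute would also survive between calls.

```fortran
module persistent_data
  implicit none
  ! Module allocatable: never goes out of scope, so it stays allocated
  ! between calls until it is explicitly deallocated.
  real, allocatable :: table(:)
contains
  subroutine setup(n)
    integer, intent(in) :: n
    if (allocated(table)) deallocate(table)
    allocate(table(n))
    table = 0.0
  end subroutine setup

  ! Uses a local allocatable (no SAVE attribute); it is deallocated
  ! automatically when the function returns.
  logical function local_scratch_ok(n)
    integer, intent(in) :: n
    real, allocatable :: scratch(:)
    allocate(scratch(n))
    scratch = 1.0
    local_scratch_ok = allocated(scratch) .and. size(scratch) == n
  end function local_scratch_ok

  ! Module arrays must be released explicitly when no longer needed.
  subroutine teardown()
    if (allocated(table)) deallocate(table)
  end subroutine teardown
end module persistent_data
```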
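One way to force a large work array onto the heap is to declare it ALLOCATABLE instead of automatic, and to check the result with STAT= and ERRMSG= so that a failure is reported rather than crashing. This is a sketch only; the module `big_work`, the routine `smooth` and its 3-point average are hypothetical stand-ins for real work.

```fortran
module big_work
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! The work array is placed on the heap with ALLOCATE instead of being
  ! an automatic array, so its size is not limited by the stack.
  subroutine smooth(x, ierr)
    real(dp), intent(inout) :: x(:)
    integer,  intent(out)   :: ierr
    real(dp), allocatable :: work(:)      ! heap, sized at run time
    character(len=200) :: msg
    integer :: i, n

    n = size(x)
    allocate(work(n), stat=ierr, errmsg=msg)
    if (ierr /= 0) then
      ! Report the failure instead of crashing
      write(*,'(a)') 'smooth: allocation failed: '//trim(msg)
      return
    end if
    work = x
    ! Simple 3-point average on interior points, reading the original values
    do i = 2, n - 1
      x(i) = (work(i-1) + work(i) + work(i+1)) / 3.0_dp
    end do
    ! work is deallocated automatically on return
  end subroutine smooth
end module big_work
```

With `x = [0, 3, 0, 3, 0]`, `call smooth(x, ierr)` leaves `ierr = 0` and `x = [0, 1, 2, 1, 0]`.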
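The use of the stack or heap for local/automatic arrays can be controlled by compiler options. For gfortran:
-fstack-arrays puts automatic arrays (size known only at run time) and array temporaries on the stack; without it, gfortran puts them on the heap
-fno-automatic treats every local variable as if it had the SAVE attribute, so local arrays go into static memory (not the heap); this makes the routines unsafe for recursion or for concurrent calls from OpenMP threads
-fmax-stack-var-size=n sets the size in bytes of the largest local array with constant bounds that is placed on the stack; larger ones are moved to static memory (the default is 65536 in recent versions, but what value for n?)
-fopenmp implies -frecursive, which forces all local arrays onto the stack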

Note that ALLOCATE does not imply physical memory allocation. For example, you can ALLOCATE an array up to your virtual memory limit, then use it as a sparse array, using physical memory pages only as the space is addressed. However, “array = 0” will demand all the physical pages, so in this case initialisation needs to be done carefully. Using/demanding more memory than is physically installed is also a bad idea.
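A sketch of the careful-initialisation pattern: allocate a large array but initialise and read only the slice that is actually used. The names `sparse_use` and `use_slice` are hypothetical, and the claim that untouched pages get no physical memory assumes an operating system with demand paging; Fortran itself cannot observe this.

```fortran
module sparse_use
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Allocates n elements but only initialises and uses big(lo:hi).
  ! On a demand-paged system only the pages of that slice should receive
  ! physical memory; "big = 0.0_dp" would touch every page.
  function use_slice(n, lo, hi) result(total)
    integer, intent(in) :: n, lo, hi
    real(dp) :: total
    real(dp), allocatable :: big(:)
    integer :: ierr, i

    allocate(big(n), stat=ierr)          ! no values are set by ALLOCATE
    if (ierr /= 0) error stop 'use_slice: allocation failed'
    big(lo:hi) = 0.0_dp                  ! initialise only the part in use
    do i = lo, hi
      big(i) = big(i) + real(i, dp)
    end do
    total = sum(big(lo:hi))              ! never read the untouched elements
  end function use_slice
end module sparse_use
```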

Efficiency is another issue. It is best to place many small arrays on the stack, but larger arrays (spanning multiple memory pages) are just as efficient on the heap.
In most OpenMP implementations, each thread has its own stack, so stack arrays do not have the memory page coherence problems that heap arrays might generate. For OpenMP, I define a 500 MB stack for each thread, but physical memory pages are only allocated to the used portion of each stack. Works for me.
If not using OpenMP, I try to place all large arrays on the heap, which is a more robust approach.
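A sketch of the OpenMP case, with hypothetical names (`thread_work`, `row_total`, `parallel_total`): a small fixed-size local array `buf` is cheap to place on the stack, and when `row_total` is called inside a parallel loop each thread uses its own stack, so `buf` is never shared between threads.

```fortran
module thread_work
  implicit none
contains
  ! buf is small (256 bytes, assuming 4-byte default real), so it stays on
  ! the stack; each thread calling row_total gets its own copy.
  function row_total(i) result(s)
    integer, intent(in) :: i
    real :: s
    real :: buf(64)
    integer :: j
    do j = 1, size(buf)
      buf(j) = real(i)
    end do
    s = sum(buf)
  end function row_total

  function parallel_total(n) result(total)
    integer, intent(in) :: n
    real :: total
    real :: acc
    integer :: i
    acc = 0.0
    !$omp parallel do reduction(+:acc)
    do i = 1, n
      acc = acc + row_total(i)
    end do
    !$omp end parallel do
    total = acc
  end function parallel_total
end module thread_work
```

Compiled with -fopenmp the loop runs in parallel; without it the directives are treated as comments and the code runs serially with the same result (3520.0 for `parallel_total(10)`). If per-thread arrays are large, the stack size of the OpenMP worker threads can be raised with the OMP_STACKSIZE environment variable (for example OMP_STACKSIZE=500M); the main thread's stack is still set by the linker or the shell.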
The other problem with large arrays, especially with OpenMP and many threads, is that you can easily overflow the L3 cache and hit a memory-to-cache bottleneck, where the combined memory access rate of all threads exceeds the memory bandwidth.

The worst outcome is to place large local/automatic arrays on the stack and get a stack overflow later, after you have forgotten the details of the code development phase of your project.

I hope these memory allocation issues are of interest.
