-frecursive vs. -fmax-stack-var-size vs. -unlimit -s

Dear all,

I have a question about the best way to put arrays on the heap or increase the stack size in gfortran.

I know Intel Fortran has the -heap-arrays option to put arrays on the heap, so there is no need to worry about stack size any more. My personal experience is that, when compiling a program with multiple .f90 or .f files, -heap-arrays should be applied only to the files that really need it. For files that do not need the flag, just leave it off.

Anyway, for gfortran, I wanted to know which flags best mimic Intel’s -heap-arrays. After some searching, I found from a gfortran warning on my code that -frecursive seems somewhat similar to Intel’s -heap-arrays. The warning says,

Consider increasing the ‘-fmax-stack-var-size=’ limit (or use ‘-frecursive’, which implies unlimited ‘-fmax-stack-var-size’) - or change the code to use an ALLOCATABLE array. If the variable is never accessed concurrently, this warning can be ignored, and the variable could also be declared with the SAVE attribute. [-Wsurprising]

So it seems that either -fmax-stack-var-size=xxx or -frecursive should be fine.
Has anyone used -frecursive, and is it good?

I am asking because I define a random number generator function that returns an array of n random numbers, like

function rand(n)
  integer, intent(in) :: n
  real :: rand(n)
  ! some stuff
end function rand

So I use it even for a single random number, as rand(1). I could define rand as an allocatable array, but then, if I call the function frequently, the repeated allocate and deallocate might cause performance issues. So I just define it as rand(n); however, if n is big, it will cause a stack overflow. So for the file containing this function and the files that use it, it seems I need -heap-arrays.
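One possible way around this trade-off, offered as a minimal sketch and not as something recommended elsewhere in this thread: make the generator a subroutine that fills an array the caller already owns, so that no array-valued function result has to be created on each call. The names rng_sketch and fill_rand are hypothetical; random_number is the standard intrinsic.

```fortran
module rng_sketch
  implicit none
  private
  public :: fill_rand
contains
  ! Fill a caller-provided array with uniform random numbers in [0, 1).
  ! The caller decides where x lives (local, SAVEd, or ALLOCATABLE),
  ! so this routine creates no array-valued function result of its own.
  subroutine fill_rand(x)
    real, intent(out) :: x(:)
    call random_number(x)
  end subroutine fill_rand
end module rng_sketch
```

A caller could allocate a buffer once and then call fill_rand on it repeatedly; for a single value, a one-element array such as real :: r(1) works the same way.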

On the other hand, I remember at least both @certik and @shahmoradi recommended

-unlimit -s

But how do I use -unlimit -s?
Do I pass it as a flag at the Fortran link stage, or just type it in the terminal? Is it possible to apply -unlimit -s to my code only, so that the OS stack limit keeps its default value?

Thank you very much in advance!

Perhaps my memory is faulty, but I thought gfortran automatically did “heap arrays”.

On Linux, the stack limit is a shell setting, limited by what was specified when the kernel was built. “ulimit” (not unlimit) is a shell command, not a compile or link switch. Some shells use the syntax “limit stacksize unlimited”. Except that both are misnomers - they set the stack size limit to the kernel-defined max.

What’s worse is that Linux does “lazy allocation”, so that you can ask for more address space but it doesn’t get allocated until you touch it. (I’m not sure this applies to the stack, however.)

Hi @RCquantum, on Linux, you can execute ulimit -s unlimited on a bash command line before running your code. In theory, this resolves all stack overflow problems. On Windows, you won’t have any luck without heap-arrays, as far as I know.

If you use gfortran on any platform, set -fmax-stack-var-size=10 (a 10-byte stack maximum) to keep anything larger off the stack. As far as I am aware, the -frecursive flag overrides -fmax-stack-var-size and causes all allocations to happen on the stack, so you should not specify both flags simultaneously.

In my experience, using heap allocations reduces runtime performance by about 5% or so, but the flexibility it offers outweighs the potential performance penalty, in my opinion. I remember Julia developers (or the community, hard to differentiate the two in the old days) bragged about the automatic allocation of all arrays on the heap in Julia years ago, when stack overflow was a big deal in Python applications and wrappers. That might have changed by now.
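To make the flag discussion above concrete, here is a minimal sketch of the kind of variable these options concern. It assumes the reading of the GNU Fortran manual under which -fmax-stack-var-size=n applies to local arrays with constant bounds, and arrays over the limit are moved to static storage (not the heap) unless the procedure is recursive or -frecursive is given. That reading, and the hypothetical names stack_limit_sketch, sum_of_squares and nwork, should be checked against your compiler version.

```fortran
module stack_limit_sketch
  implicit none
  private
  public :: sum_of_squares
  ! 100000 default reals: about 400 kB with 4-byte reals.
  integer, parameter :: nwork = 100000
contains
  ! work has constant bounds, so its size is known at compile time. This is
  ! the kind of local array whose placement -fmax-stack-var-size= is
  ! documented to control (assumption: check the manual for your gfortran).
  function sum_of_squares(n) result(total)
    integer, intent(in) :: n   ! number of entries to use, 1 <= n <= nwork
    real :: total
    real :: work(nwork)
    integer :: i
    do i = 1, n
      work(i) = real(i)**2
    end do
    total = sum(work(1:n))
  end function sum_of_squares
end module stack_limit_sketch
```

Compiling a file like this with -Wsurprising, the warning category shown in the message quoted in the first post, may be a quick way to see which arrays the limit affects.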

Thank you very much, Dr. Fortran @sblionel and @shahmoradi!
I see, Dr. Fortran @sblionel. Indeed, I have noticed that gfortran does not seem to need an equivalent of -heap-arrays, as it apparently does this automatically. For the same code in one particular file of mine, with Intel Fortran I have to specify -heap-arrays or it overflows the stack, whereas with gfortran I do not need any flag.
Thank you @shahmoradi, I am now clearer about how to use ulimit -s unlimited. I just wish the code would not overflow the stack, LOL. Thank you for the -fmax-stack-var-size=10 trick (which basically acts like Intel’s -heap-arrays) and the explanation of -frecursive.
Yes, -heap-arrays may have some impact on performance. Usually the impact is small enough. However, for the FLINT ODE solver, I do notice that if I apply -heap-arrays to all of its files, performance decreases by at least a factor of 10. So I definitely do not add any flags like -heap-arrays to the FLINT solver files. Since then, I apply -heap-arrays only to the files containing functions, subroutines, or arrays that really need heap arrays.

PS. More info about gfortran flags might be found here,

Assuming I understand your description of lazy as deferred physical memory allocation, isn’t this “lazy” a good thing?
My approach (for gfortran on Windows) is to allocate a 500 MB stack, which is “lazy” in that it is virtual address space that is not backed by physical memory until it is touched; physical memory is allocated progressively.
My understanding is that code + primary stack cannot exceed 2 GB (4 GB?), i.e. 32-bit addressing, as 64-bit is not yet fully 64-bit!

Generally (and more specifically for OpenMP private arrays) it is better to have small arrays on the stack, but once arrays are larger than a memory page (4 KB), the heap disadvantage disappears. For OpenMP shared arrays, heap arrays are just as efficient as stack arrays, provided ALLOCATE is not a high-frequency operation.
For much larger arrays (many GB), I use ALLOCATE.

There has been a question about the implementation of -fmax-stack-var-size=xxx, namely whether xxx is applied only to local arrays and not to automatic arrays. I prefer to select -fstack-arrays (for, hopefully, local, automatic and private arrays) and then use ALLOCATE to select heap arrays.
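The distinction drawn above between automatic arrays and ALLOCATE'd arrays can be sketched as follows. The module and procedure names are hypothetical, and the placement comments assume the gfortran manual's description of -fstack-arrays (arrays of unknown size and array temporaries go on the stack); they have not been verified for every compiler version.

```fortran
module array_kinds_sketch
  implicit none
  private
  public :: mean_auto, mean_alloc
contains
  ! tmp is an automatic array: its size depends on the dummy argument n.
  ! Assumption: with gfortran, -fstack-arrays places such arrays on the stack.
  function mean_auto(n) result(m)
    integer, intent(in) :: n
    real :: m
    real :: tmp(n)
    integer :: i
    do i = 1, n
      tmp(i) = real(i)
    end do
    m = sum(tmp) / real(n)
  end function mean_auto

  ! tmp is ALLOCATABLE: storage is requested explicitly with ALLOCATE
  ! (normally from the heap) and is released automatically on return.
  function mean_alloc(n) result(m)
    integer, intent(in) :: n
    real :: m
    real, allocatable :: tmp(:)
    integer :: i
    allocate(tmp(n))
    do i = 1, n
      tmp(i) = real(i)
    end do
    m = sum(tmp) / real(n)
  end function mean_alloc
end module array_kinds_sketch
```

Both functions compute the same result; the only difference is how the scratch array is declared, which is what lets the programmer, and not a compiler flag, decide which arrays go through ALLOCATE.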

You should check the relationship between -fopenmp, -frecursive and -fstack-arrays.

Maybe. Let’s say your application does an ALLOCATE of a large array, and checks the status of the allocation, doing some recovery or giving a meaningful message if it fails. With lazy allocation, the ALLOCATE itself will succeed, but the application will get a segfault sometime later when it tries to access the allocated data if it turns out there is insufficient VM available. I would prefer to know earlier that the allocation failed.
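A checked allocation of the kind described above might look roughly like this. The names alloc_sketch and try_allocate are hypothetical; stat= and errmsg= are the standard ALLOCATE specifiers. As the paragraph above points out, on a system with lazy allocation a zero stat does not guarantee that the memory can actually be touched later.

```fortran
module alloc_sketch
  implicit none
  private
  public :: try_allocate
contains
  ! Try to allocate a(1:n). On success ok is .true. and msg is blank;
  ! on failure ok is .false. and msg holds the processor's error message.
  subroutine try_allocate(a, n, ok, msg)
    real, allocatable, intent(out) :: a(:)
    integer, intent(in) :: n
    logical, intent(out) :: ok
    character(len=*), intent(out) :: msg
    integer :: istat
    msg = ''   ! errmsg= is only defined when an error occurs
    allocate(a(n), stat=istat, errmsg=msg)
    ok = (istat == 0)
  end subroutine try_allocate
end module alloc_sketch
```

The caller can then print msg and recover or stop cleanly when ok is false, which is the "meaningful message" case; it cannot detect a failure that only shows up when the pages are first written.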

I get very few ALLOCATE errors. If an ALLOCATE fails, this is more a logistical problem, that there is insufficient memory installed.
Poor planning, not a code error.

So, you would rather get a segfault in some random part of your program rather than a meaningful error at the point of allocation?

I do check for errors with allocate, but the error state is not a sufficient report.
For my applications, ALLOCATE effectively fails when there is insufficient physical memory. This is not reported.
I have PCs with different amounts of installed memory, and if I “forget” and run or test an application on a PC with insufficient memory, I don’t get a segfault; everything just stops or goes to sleep. A segfault and exit would be a much better outcome. What else can you do?
If there is likely to be insufficient memory, I will run Task Manager to monitor the memory usage, but this should be estimated before running the program.
(Allocate errors are basically for a 32-bit OS.)

Basically, the error reporting for ALLOCATE is not effective, as it reports the allocation against available virtual memory. I haven’t used virtual memory for a long time (the 80s?). You would not combine virtual memory use with OpenMP.

Does anyone use virtual memory for production work?

Steve,

Thanks for your comments on ALLOCATE errors. I don’t want to be contrary, but in using Allocate on 64-bit OS, I am struck by how infrequently I get an error, while coding the error handling for stat /= 0 can become very extensive, but not very effective.

There is also the most frequent (just about the only) problem I experience, which is running a program on the wrong PC, one with insufficient installed memory. In this case a program crash would be far preferable to the slow burn of virtual memory. (I wonder how the old disk-paging minis were so acceptable?)

There are a few usages of ALLOCATE which I find could be improved. I wonder if others find this.

  1. Allocate reports a failure if the virtual memory allocation is exceeded. Could there be an option for the limit to be the physical memory limit instead? (This assumes there can be a definition that excludes other processes’ allocations.) My strategy of defining large virtual stacks could also add to this complexity.

1a) The use of ALLOCATE on a 64-bit OS is very different from a 32-bit OS: with 32-bit, virtual memory is mostly smaller than physical memory, but on 64-bit, virtual memory is usually not the issue. I think the Fortran Standard ALLOCATE approach is based more on a 32-bit OS environment.

  2. When using ALLOCATE for private heap arrays in OpenMP, could there be an OPTION=“new_page” to allocate these arrays on a new memory page (perhaps for arrays larger than 10 memory pages)? This suggestion is based on my assumption that not sharing a memory page between the private heap arrays of different threads could improve performance, i.e. resulting in fewer changing memory pages in cache that are shared between threads. I have no definite knowledge that this would be effective. No compiler I use provides this option.

OpenMP has introduced some (private) array memory management features, although I think they are more for using GPUs. I have not tried them yet, as I am not aware of which compilers support the OpenMP 5.1 memory options, and the memory option alternatives don’t appear to relate to heap vs stack memory.

For me, the use of large allocate arrays is a logistical problem, as I need to use the right PC to make sure there is enough physical memory installed. If I do this correctly, there should be no allocate errors.
My approach is to ensure there is enough installed memory for the solution algorithm I am using. When/until that approach fails, I will then need to look for a new solution algorithm.
Perhaps my memory usage experience is too limited/lazy.

I would be interested if others have similar or different views.