Stack overflow, heap:arrays and Fortran array functions

Correct, I didn’t read well sorry

The compiler is allowed to allocate b(:) on the stack like this because b(:) is a local array, and it must disappear upon subroutine return the same way that an automatic array like a(:) must disappear. There must be some threshold size where it switches to the low heap addresses for both arrays.

I next modified your code as real, allocatable, save :: b(:), and also adding the deallocate statement before return. I was expecting b(:) to always have a low heap address with that change. However, I get the same result as before. I guess the compiler looks ahead to see the deallocate statement and it doesn’t let me trick it.

@aledinola , there is a compromise on most operating systems between larger stack sizes and the use of heap memory, that is especially the case on Windows OS where users are generally better off writing programs that can work well with the default stack size which is not large.

Is it possible for you to show a small but complete example of “Setting /heap:arrays0 does, at least in my setup (ifort on Windows)”?

1 Like

The question is, what does tell you that a and possibly b are on the stack for small sizes? The two scalars are clearly allocated 4 bytes apart, but on the first 3 lines a and t are allocated 140732877400696 - 140375574461632 = 357302939064 bytes (357GB) apart. I though that all addresses (even if virtual addresses) in the stack should be contiguous: maybe I’m wrong?

2 Likes

@FortranFan the code is part of a project with many modules and subprograms, so I think it’s not so easy to come up with a MWE that reproduces the issue. But as soon as I have time I will try to provide an example, if I manage.

The code uses OpenMP and without the flag /heap-arrays0 I get a stack overflow. Then I added /heap-arrays0 to the entire project and it runs but very slow. A co-worker of mine on this project has Mac and she does not have any issue with memory.

How did I solve the problem? I compiled only some files with /heap-arrays0 and not the entire project. Basically what @CRquantum explained. In this way, the code runs without stack overflow and is way faster.

1 Like

Does she use the Intel compiler as well?

The ultimate way is to modify the code in order to avoid the big allocations on the stack.

1 Like

@PierU

  1. she uses ifort as well, same version

  2. yes, indeed I also changed a bit the code to make it more robust to potential stack overflows. But this means that I had to remove for example “where” statements because they create large automatic arrays for the mask (I guess) and replace them with less concise do loops with nested if conditions (now I learnt that there are better alternatives e.g. merge, as @FortranFan kindly suggested me above)

By default, using Gfortran (and ifort), allocatable arrays (such as b) go on the heap, while automatic arrays (such as a) go on the stack.

Using Gfortran, if selecting using “-fstack-arrays” for large local arrays, it is necessary to also increase the stack size. My link .bat file to achieve this is:

set tce=load_gf.tce
set load=gfortran

set stack_options=-Wl,-stack,536870912,-Map=%program%.map

del %program%.map
del %program%.exe

now >>%tce%

%load% @load_gf.txt -fopenmp -fstack-arrays %stack_options% >>%tce%  2>&1

dir %program%.* /od >>%tce%

notepad %tce%

My file load_gf.txt lists all .o files (in multiple directories) and finally -o program.exe

For a large 64-bit Fortran multi-threaded program, I typically set the stack to 0.5 GBytes (per thread). This requires a large virtual memory address space, but has no direct impact on the physical memory requirement and no significant performance problem.

I think the default stack size for 64-bit is more than 1 MBytes. For contemporary OS, such a small stack provision is wrong.

When you run the above code with your stack_options, what is the output? Am I misinterpreting from the addresses what is a heap address and what is a stack address?

Ron,

I have listed a modified program to give more information on the stack and heap and build using Gfortran 64-bit Ver 11.1 on Win 10.
It demonstrates the use of a 500 Mbyte stack for each thread for variables s,t & a but shared heap among all threads for b.
The memory addresses are reported as page no : byte number, which can be easier to understand…
Note for Gfortran : win-64 implementation, each thread stack defaults to the same stack size as the primary thread. (this may not be teh same default on other OS)
I do have other more detailed stack reporting routines based on the winapi routines “GetCurrentThreadStackLimits” and “GlobalMemoryStatusEx”, but don’t know the equivalent for other OS

The test is carried out on a Ryzen 5900X configured for 12 threads and 64 GBytes of memory so there is no problem with physical or virtual memory addresses.

program testack
implicit none

integer :: i

 write (*,*) &
 'id th           size    loc_s : stack     loc_t : stack       loc_a : stack         loc_b : heap'

!$omp parallel do private (i)
do i = 0, 20
   call testloc(i)
end do
!$omp end parallel do

CONTAINS

   recursive subroutine testloc(i)
   integer, intent(in) :: i
   integer :: thread
   real :: s, a(2**i), t
   real, allocatable :: b(:)
   integer*4, external :: omp_get_thread_num
   character*18, external ::mem_j18

   thread = omp_get_thread_num ()
   a = 0.0
   allocate( b, source=a )
   write (*,11) i, thread, storage_size(a) * size(a) / 8, mem_j18(loc(s)), mem_j18(loc(t)), mem_j18(loc(a)), mem_j18(loc(b))
 11 format ( i3, i3, i15, 4a19 )

   end subroutine

end program

    character*18 function mem_j18 (value)
!
!   Convert a memory address into a character string +n,nnn,nnn,nnn:n,nnn
!
      implicit none
      integer*8 :: value
      integer*8 :: page, offset
      character :: cp*18, co*18
      character, external :: bus_j18*18
!
      page    = value / 4096
      offset  = value - page*4096
      cp      = bus_j18 (page)
      co      = bus_j18 (offset)
      mem_j18 = cp(7:18) // ':' // co(14:18)
    end function mem_j18
    
    character*18 function bus_j18 (value)
!
!   Convert a long integer into a character string +n,nnn,nnn,nnn,nnn
!
      implicit none
      integer*8 :: value
      character :: number*26
      character, external :: bus_j26*26

      number  = bus_j26 (value)
      call fix_power_string (number, 8)
      bus_j18 = number(9:)
    end function bus_j18

    character*26 function bus_j26 (value)
!
!   Convert a long integer into a character string +n,nnn,nnn,nnn,nnn,nnn,nnn
!    10^19 = 2^64
!
      implicit none
      integer*8 :: value
!
      character :: number*26
      integer*8 :: ten = 10, k
      integer*4 :: n, digit, zero, minus, comma
!
      if (value >= 0) then
        k     = value
        minus = 0
      else
        k     = -value
        minus = 1
      end if

      zero   = ichar ('0')
      n      = len(number)      ! last character
      comma  = mod(n+1,4)       ! comma position
      number = ' '
      do
         digit = mod (k,ten)
         number(n:n) = char (digit+zero)
         n = n-1
         k = k/ten
         if (k==0) exit
         if (n<=minus) then ! overflow
           number = repeat ('#',len(number))
           exit
         end if
         if (mod(n,4) == comma) then
            number(n:n) = ','
            n = n-1
         end if
      end do
      if (value < 0) then
        if ( n <  1 ) then
          number(1:1) = '#'
        else
          number(n:n) = '-'
        end if
      end if
!
      bus_j26 = number
!
    end function bus_j26

    subroutine fix_power_string (number, k)
!
!  fix that string fits in number(k+1:)
!
      implicit none
      character number*(*)
      integer   :: k, n

      if ( number(k:k) == ' ' ) return
      n = len(number)
      do
        if ( number(k:k) == ' ' ) exit
        if ( number(n-3:n-3) == ',' ) then
          number = '   ' // number(:n-4) // 'k'
        else if ( number(n:n) == 'k' ) then
          number = '    ' // number(:n-5) // 'm'
        else if ( number(n:n) == 'm' ) then
          number = '    ' // number(:n-5) // 'g'
        else if ( number(n:n) == 'g' ) then
          number = '    ' // number(:n-5) // 't'
        else
          number = repeat ('#',n)
          number(1:k) = ' '
          exit
        end if
      end do
    end subroutine fix_power_string
set program=%1

set tce=load_gf.tce
set load=gfortran

set options=-g -fimplicit-none -O2 -march=native -ffast-math -fopenmp -fstack-arrays -o %program%.exe
set stack_options=-Wl,-stack,536870912,-Map=%program%.map

del %program%.o
del %program%.map
del %program%.exe

echo #### Start of New Test ##################################################################################### >> %tce%
now >> %tce%

%load% %program%.f90 %options% %stack_options% >>%tce%  2>&1

dir %program%.* /od >> %tce%

echo ================================================================ >> %tce%
now   >> %tce%
set g >> %tce%
set options >> %tce%

%program% >>%tce%

notepad %tce%
#### Start of New Test ##################################################################################### 
  It is now Friday, 20 October 2023 at 19:33:38.443
 Volume in drive C has no label.
 Volume Serial Number is 88FF-836A

 Directory of C:\temp\parallel

20/10/2023  07:33 PM             3,663 teststack.f90
20/10/2023  07:33 PM            15,208 teststack.obj
20/10/2023  07:33 PM         3,473,432 teststack.exe
20/10/2023  07:33 PM           891,249 teststack.map
               4 File(s)      4,383,552 bytes
               0 Dir(s)  232,296,456,192 bytes free
================================================================ 
  It is now Friday, 20 October 2023 at 19:33:38.702
gcc.ver=11.1.0
gcc_dir=C:\Program Files (x86)\gcc_eq\gcc_11.1.0
gcc_path=C:\Program Files (x86)\gcc_eq\gcc_11.1.0\bin;C:\Program Files (x86)\gcc_eq\gcc_11.1.0\libexec\gcc\x86_64-w64-mingw32\11.1.0
options=-g -fimplicit-none -O2 -march=native -ffast-math -fopenmp -fstack-arrays -o teststack.exe
 id th           size    loc_s : stack     loc_t : stack       loc_a : stack         loc_b : heap
  6  3            256      786,415:2,956      786,415:2,960      786,415:2,688      264,348:  160
  4  2             64      655,343:2,956      655,343:2,960      655,343:2,880      264,348:  432
  2  1             16      395,423:2,956      395,423:2,960      395,423:2,928      264,348:  512
  0  0              4      132,911:2,748      132,911:2,752      132,911:2,720      264,348:  544
  8  4           1024      917,487:2,956      917,487:2,960      917,487:1,920      264,348:  576
 10  5           4096    1,048,559:2,956    1,048,559:2,960    1,048,558:2,944      264,349:3,776
  7  3            512      786,415:2,956      786,415:2,960      786,415:2,432      395,424:  496
  5  2            128      655,343:2,956      655,343:2,960      655,343:2,816      264,350:3,792
  3  1             32      395,423:2,956      395,423:2,960      395,423:2,912      264,350:3,936
 12  6          16384    1,179,631:2,956    1,179,631:2,960    1,179,627:2,944      395,424:2,128
  1  0              8      132,911:2,748      132,911:2,752      132,911:2,720      395,428:2,400
  9  4           2048      917,487:2,956      917,487:2,960      917,487:  896      395,429:2,688
 11  5           8192    1,048,559:2,956    1,048,559:2,960    1,048,557:2,944      395,430:  656
 14  7          65536    1,310,703:2,956    1,310,703:2,960    1,310,687:2,944      395,432:  672
 13  6          32768    1,179,631:2,956    1,179,631:2,960    1,179,623:2,944      395,448:  688
 15  7         131072    1,310,703:2,956    1,310,703:2,960    1,310,671:2,944      395,429:2,688
 16  8         262144    1,441,775:2,956    1,441,775:2,960    1,441,711:2,944      395,429:2,688
 17  8         524288    1,441,775:2,956    1,441,775:2,960    1,441,647:2,944      395,429:2,688
 18  9        1048576    1,572,847:2,956    1,572,847:2,960    1,572,591:2,944      395,681:   64
 19 10        2097152    1,703,919:2,956    1,703,919:2,960    1,703,407:2,944      395,694:   64
 20 11        4194304    1,834,991:2,956    1,834,991:2,960    1,833,967:2,944      395,694:   64