How large can an automatic array (on the heap) be?

Consider a program involving a large real automatic array that is saved on the heap. Suppose that the RAM of the machine is R and the operating system is OS.

Question: What is the largest number of elements this array can have?

As an illustration, I ran some tests with R = 32 GB and OS = Ubuntu 22.04.

Code:

module s_mod
implicit none
private
public :: sfunc

contains

function sfunc(n) result(s)
! The purpose of this stupid function is to test a large automatic array.
integer, intent(in) :: n
real :: s
real :: x(n, n)
x = 1.0
s = sum(x)
end function sfunc

end module s_mod

program test_mem
use :: s_mod, only: sfunc
implicit none
integer :: i
integer, parameter :: n = 30  ! 2**31 would overflow a default integer
do i = 1, n
    write (*, *) 2**i, sfunc(2**i)
end do
end program test_mem

System and hardware:

$ uname -a && lsmem 
Linux zX10 5.15.0-56-generic #62-Ubuntu SMP Tue Nov 22 19:54:14 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
RANGE                                  SIZE  STATE REMOVABLE  BLOCK
0x0000000000000000-0x0000000097ffffff  2.4G online       yes   0-18
0x0000000100000000-0x000000085fffffff 29.5G online       yes 32-267

Memory block size:       128M
Total online memory:    31.9G

Results:

  1. gfortran:
$ gfortran --version && gfortran test_auto.f90 && ./a.out

GNU Fortran (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

           2   4.00000000    
           4   16.0000000    
           8   64.0000000    
          16   256.000000    
          32   1024.00000    
          64   4096.00000    
         128   16384.0000    
         256   65536.0000    
         512   262144.000    
        1024   1048576.00    
        2048   4194304.00    
        4096   16777216.0    
        8192   16777216.0    
       16384   16777216.0    
       32768   16777216.0    
       65536   16777216.0    
      131072   16777216.0    
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x7f4aabd60ad0 in ???
#1  0x7f4aabd5fc35 in ???
#2  0x7f4aabb5751f in ???
	at ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
#3  0x55b94045f25a in ???
#4  0x55b94045f3b4 in ???
#5  0x55b94045f42b in ???
#6  0x7f4aabb3ed8f in __libc_start_call_main
	at ../sysdeps/nptl/libc_start_call_main.h:58
#7  0x7f4aabb3ee3f in __libc_start_main_impl
	at ../csu/libc-start.c:392
#8  0x55b94045f0e4 in ???
#9  0xffffffffffffffff in ???
Segmentation fault (core dumped)
  2. ifort:
$ ifort --version && ifort -heap-arrays test_auto.f90 && ./a.out

ifort (IFORT) 2021.8.0 20221119
Copyright (C) 1985-2022 Intel Corporation.  All rights reserved.

           2   4.000000    
           4   16.00000    
           8   64.00000    
          16   256.0000    
          32   1024.000    
          64   4096.000    
         128   16384.00    
         256   65536.00    
         512   262144.0    
        1024   1048576.    
        2048   4194304.    
        4096  1.6777216E+07
        8192  6.7108864E+07
       16384  1.3421773E+08
       32768  1.3421773E+08
       65536  1.3421773E+08
      131072  1.3421773E+08
forrtl: severe (41): insufficient virtual memory
Image              PC                Routine            Line        Source             
a.out              000000000040423B  Unknown               Unknown  Unknown
a.out              000000000040415D  Unknown               Unknown  Unknown
libc.so.6          00007F03C34CED90  Unknown               Unknown  Unknown
libc.so.6          00007F03C34CEE40  __libc_start_main     Unknown  Unknown
a.out              0000000000404075  Unknown               Unknown  Unknown

  3. ifx:
$ ifx --version && ifx -heap-arrays test_auto.f90 && ./a.out

ifx (IFORT) 2023.0.0 20221201
Copyright (C) 1985-2022 Intel Corporation.  All rights reserved.

           2   4.000000    
           4   16.00000    
           8   64.00000    
          16   256.0000    
          32   1024.000    
          64   4096.000    
         128   16384.00    
         256   65536.00    
         512   262144.0    
        1024   1048576.    
        2048   4194304.    
        4096  1.6777216E+07
        8192  1.6777216E+07
       16384  1.6777216E+07
       32768  1.6777216E+07
       65536  1.6777216E+07
      131072  1.6777216E+07
forrtl: severe (41): insufficient virtual memory
Image              PC                Routine            Line        Source             
a.out              0000000000405405  Unknown               Unknown  Unknown
a.out              000000000040515D  Unknown               Unknown  Unknown
libc.so.6          00007F86C1A0DD90  Unknown               Unknown  Unknown
libc.so.6          00007F86C1A0DE40  __libc_start_main     Unknown  Unknown
a.out              0000000000405075  Unknown               Unknown  Unknown
  4. nagfor:
$ nagfor test_auto.f90 && ./a.out
 
NAG Fortran Compiler Release 7.0(Yurakucho) Build 7076
[NAG Fortran Compiler normal termination]
 2   4.0000000
 4  16.0000000
 8  64.0000000
 16   2.5600000E+02
 32   1.0240000E+03
 64   4.0960000E+03
 128   1.6384000E+04
 256   6.5536000E+04
 512   2.6214400E+05
 1024   1.0485760E+06
 2048   4.1943040E+06
 4096   1.6777216E+07
 8192   1.6777216E+07
 16384   1.6777216E+07
 32768   1.6777216E+07
 65536   1.6777216E+07
 131072   1.6777216E+07
Runtime Error: Cannot get storage for variable - out of memory
Program terminated by fatal error
Aborted (core dumped)
  5. flang:
$ flang --version && flang test_auto.f90 && ./a.out

flang version 15.0.3
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/zaikunzhang/local/flang/bin
            2    4.000000    
            4    16.00000    
            8    64.00000    
           16    256.0000    
           32    1024.000    
           64    4096.000    
          128    16384.00    
          256    65536.00    
          512    262144.0    
         1024    1048576.    
         2048    4194304.    
         4096   1.6777216E+07
         8192   1.6777216E+07
        16384   1.6777216E+07
        32768   1.6777216E+07
        65536   1.6777216E+07
       131072   1.6777216E+07
0: ALLOCATE: 274877906944 bytes requested; not enough memory
  6. nvfortran:
$ nvfortran --version && nvfortran test_auto.f90 && ./a.out

nvfortran 22.11-0 64-bit target on x86-64 Linux -tp haswell 
NVIDIA Compilers and Tools
Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.
            2    4.000000    
            4    16.00000    
            8    64.00000    
           16    256.0000    
           32    1024.000    
           64    4096.000    
          128    16384.00    
          256    65536.00    
          512    262144.0    
         1024    1048576.    
         2048    4194304.    
         4096   1.6777216E+07
         8192   1.6777216E+07
        16384   1.6777216E+07
        32768   1.6777216E+07
        65536   1.6777216E+07
       131072   1.6777216E+07
0: ALLOCATE: 274877906944 bytes requested; not enough memory

It seems that the program does not crash until it has exhausted all 32 GB of RAM and even gone beyond that (probably by using virtual memory).

Since the array is on the heap, I guess it does not behave very differently from an allocatable array.

Let me repeat my question: What is the largest amount of memory an automatic array can take if it is saved on the heap? (Whether we should use automatic or allocatable arrays is a different question.)

Thanks.


Your findings are very interesting, thanks! I thought that for large arrays it is better to declare them as allocatable. With automatic arrays you run into memory problems because they are stored on the stack, which has a limited size. So I'm quite surprised by your results.


My experience is that if you use -heap-arrays, there should be no limitation on the array size. Depending on the OS (I am not very sure), if the array size exceeds physical RAM, it may even use virtual memory (space on your SSD or HDD).
-heap-arrays is good for fixing stack-overflow issues when there are large fixed-size arrays in your code, like A(99999999).
But as @aledinola pointed out, if possible, using an allocatable array is the better choice. Then the array size is a variable, and you can use just the right amount of RAM for the problem, with no need to declare a big fixed-size array like A(99999999) and then use -heap-arrays to bypass stack-overflow issues.
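A side benefit of allocatable arrays is that an allocation failure can be caught via stat= and handled gracefully, whereas an oversized automatic array simply crashes the program. A minimal sketch (the size n is arbitrary, chosen only for illustration):

```fortran
program alloc_demo
implicit none
real, allocatable :: x(:, :)
integer :: ierr
integer, parameter :: n = 20000  ! arbitrary illustrative size, ~1.6 GB

allocate (x(n, n), stat=ierr)
if (ierr /= 0) then
    ! The program keeps control instead of being killed.
    write (*, *) 'allocation failed, stat =', ierr
else
    x = 1.0
    write (*, *) 'sum =', sum(x)
    deallocate (x)
end if
end program alloc_demo
```

With an automatic array there is no equivalent of stat=; the only recourse is whatever error the runtime or OS raises.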


There are (at least) two things affecting how large an array you can allocate. The first is total available virtual memory, which is more constrained by the swapfile/pagefile size than RAM. The second is fragmentation of the virtual memory pool - even if you deallocate the smaller sizes, there doesn’t tend to be consolidation of the fragments and you can find that you can’t allocate a big thing even if the total available VM is sufficient.


Steve,

I am not sure where fragmentation comes into this, as a heap-allocated array will be given a virtual memory address and then be allocated physical memory pages as parts of the array are used?

By default on most OSes, local or automatic arrays go on the stack, while allocated arrays go on the heap. On the OSes I am familiar with, the stack is limited to about a gigabyte, though this can vary!
You can move automatic arrays to the heap with compiler options, and if you are targeting a large automatic array it will probably exceed the stack size, so such an option is required to place it on the heap. Then, as Steve points out, the heap is limited by the virtual memory available from physical memory plus the disk paging file.

This limit can be extended in unusual circumstances if only parts of a sparse array are addressed, since physical memory is allocated only when the array is actually touched. For example, if an array of, say, 100 GBytes is defined but only 100 MBytes is used (carefully), memory is allocated, in 4-KByte pages, only for those 100 MBytes, without committing resources (physical or on the paging disk) for the other untouched 99.9 GBytes.
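A minimal sketch of this lazy page commitment (it assumes Linux-style overcommit, and the size is arbitrary): allocate a large array but write to only two elements, then watch the resident set size in top or /proc; only the touched pages consume physical memory.

```fortran
program lazy_pages
use, intrinsic :: iso_fortran_env, only: int64
implicit none
real, allocatable :: big(:)
integer(int64), parameter :: n = 1000000000_int64  ! ~4 GB of default reals

allocate (big(n))
! Only the pages containing these two elements are committed to
! physical memory; the remaining ~4 GB of pages stay untouched.
big(1) = 1.0
big(n) = 2.0
write (*, *) big(1) + big(n)
deallocate (big)
end program lazy_pages
```

Note that a whole-array assignment such as big = 0.0 would defeat the purpose, since it touches every page.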


The fragmentation is that of the pool of virtual memory that the memory allocator handles. Typically, the first allocation causes a large-ish block of VM to be requested from the OS. The allocator then uses some of this to track its size, location of other blocks, and which pieces are free or used. Then subsequent calls find the first/best free part of that block, and if none, ask the OS for another block. Most OSes don’t let you request small or odd-sized blocks, and this reduces the number of calls.

When you free an allocation, the allocator library updates its lists to mark that segment as free. It may or may not look to see if an adjacent segment is also free and merge them. But even if it does that, it’s likely that a later request for an even bigger chunk will require asking the OS for a new, larger block. The previous blocks are held onto for potential future allocations; they’re not returned to the OS.

Some parts of the allocated memory are used to keep track of the segments and blocks. These will interfere with reusing all that space for future allocations. A pattern of increasing size allocate/deallocate requests will result in fragmentation of the available VM, with no contiguous space available for a bigger request.


I don’t know exactly how to monitor this, but I have suspected that many runtimes never really do any garbage collection to combine deallocated arrays and defragment the heap. Instead, they just keep allocating new memory blocks from the 64-bit virtual address space. This wasn’t really practical with 32-bit addressing: there, the heap did need to be cleaned up regularly to stay usable. But 64-bit address spaces are unlikely to be exhausted now, or any time soon, so it allows the OS programmers to be lazy. One can experiment a little with this using c_loc() to look for addresses that are reused, but it really isn’t definitive because the programmer can’t directly control any of the garbage-collection processes or examine the availability of stack or heap.
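A rough way to run the c_loc() experiment mentioned above (addresses are implementation-dependent, and the transfer of a C pointer to an integer is a common but non-portable trick, so this only hints at the allocator's behavior):

```fortran
program addr_demo
use, intrinsic :: iso_c_binding, only: c_loc, c_ptr
use, intrinsic :: iso_fortran_env, only: int64
implicit none
real, allocatable, target :: a(:)
type(c_ptr) :: p
integer :: i

! Allocate progressively larger arrays, printing each starting address.
! A repeated address means a freed block was reused; steadily growing
! addresses suggest fresh blocks were requested from the OS.
do i = 20, 25
    allocate (a(2**i))
    p = c_loc(a(1))
    write (*, '(a, i0, a, z0)') 'n = 2**', i, ', address = 0x', transfer(p, 0_int64)
    deallocate (a)
end do
end program addr_demo
```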


Steve,

Thanks for the clarification.

I have always been under the impression that the virtual memory address space does not have to be sequential, so if there is fragmentation, there should be no problem. I also assumed that the virtual memory limit was not based on max_address - min_address, but on the memory pages allocated, i.e., on the size of the memory pool list.

There is a limit on the virtual memory available, based on physical memory plus the paging disk allocation, but must the virtual memory pool be sequential? I have allocated some large arrays, plus 0.5 GBytes for each thread stack, and these did not appear to be sequential in memory.
I thought there was some heap cleanup once allocated arrays were released, but this could vary with OS.

I would appreciate it if you could expand on your comment.

John


Fragmentation is still a problem because virtual addressing doesn’t fix the fact that touching one byte of a page commits the whole page.


I would call this a feature rather than a problem, as only the memory pages that are touched are allocated physical memory and counted against the virtual memory pool.

It is my understanding that pages of a large array that are never touched are not counted against the used physical memory or the virtual memory limit. This is not typical array use, but it can help with a very large virtual array. In practice, you want your active data in physical memory and never want to rely on paging for storage, which imposes a huge performance hit.
Hard to believe we were happy with the mini-computers of the ’70s and ’80s that relied on paging memory but were 1,000x slower! Paging size matters, as physical memory + paging size can limit the virtual array size at the initial ALLOCATE. (Lots of practical considerations for Fortran users, far removed from the standard’s concept of a STORAGE UNIT!)

I have used this approach to map information for a long marine channel onto a 2-D grid of (virtual) 100x100 panels, where each panel is a 1000 m x 1000 m grid (10,000 panels with only about 150 in use, so only 150 x 1000x1000 points of information demanding memory). The reduced memory use can be verified in Task Manager (I use Win x64).

You certainly do not write large_virtual_array = 0, as that would force all memory pages to be allocated.

True, from the OS perspective, allocated regions of VM don’t have to be contiguous. My earlier comment regarded the list of available memory blocks kept by the allocation routine used by the language (malloc, for example.) The OS allocates memory in much larger chunks than you’d want for program use, so the language allocator has to keep track of which parts of a chunk are in use and which are not. Once the language allocator asks the OS for a chunk, it is never released until the program exits.