Allocate a very large(>900GB) array with -qopenmp result in segmentation fault

Hello everyone in Fortran-lang community. I met a problem allocating a large array and now ask for your help.

So I was testing a HPC node with 2TB memory, and I decided to use the following code memtest.f to test its memory:

program memtest
        integer(8) :: i, n, sizen, statt
        real(8), allocatable:: array(:)
        i=0_8
        n=0_8
        statt=0_8
        write(*,*) "Memory testing started. status is ",statt
        do while(statt .eq. 0_8)
          i = i+10
          n = i*1_8*1024_8**3_8/8_8
          sizen = n*8_8/1024_8**3_8
          allocate(array(n),stat=statt)
          if(.not. allocated(array)) then
            write(*,*) 
     &"Error! 1D Array of size ",sizen,"GB cannot be allocated."
            exit
          else
            write(*,*) 
     &"Successfully allocated 1D array of size ",sizen ," GB"
            deallocate(array, stat=statt)
          end if
        enddo
end program

and compile with

ifort memtest.f -o memtest.exe

execute memtest.exethe output is:

...(truncated)
 Successfully allocated 1D array of size                   1930  GB
 Successfully allocated 1D array of size                   1940  GB
 Successfully allocated 1D array of size                   1950  GB
 Successfully allocated 1D array of size                   1960  GB
 Error! 1D Array of size                   1970 GB cannot be allocated.

so that’s expected.

However, with another compilation method, which is used in Makefiles,

ifort -c memtest.f 
ifort memtest.o -o memtest.exe

I got

...(truncated)
 Successfully allocated 1D array of size                    930  GB
 Successfully allocated 1D array of size                    940  GB
 Successfully allocated 1D array of size                    950  GB
forrtl: severe (174): SIGSEGV, segmentation fault occurred

I’m new to Fortran and static compilation, so I really don’t know how to use Makefiles, etc. Does anybody know the difference between:

  • directly compile a file,
  • compile to objects and link them later

and how to avoid this pitfall?

EDIT:
After testing I found that both compilation methods is OK(Don’t know how did it not work).
But with -qopenmp flag, the error occurs.

It is reproducible? Do every run stop at the same allocated size? Are you running alone on that node, or are there possibly other users who run their own stuff?

I can’t see obvious reason why linking separately would make a difference.

The answer is:

  • It does reproduce, at least from what i tried
  • it depends on the granularity of the test, and I only tried 10GB stepsize
  • I believe the HPC is exclusive usage: one node can only run one task at once.

Could you try this modified version?

program memtest
        use iso_fortran_env
        integer(int64) :: i, n
        real(real64), allocatable:: array(:)
        i=0
        do 
          i = i + 10
          n = i * 1024_8**3 / 8
          write(*,"(A,I5,A)",advance='no') "Trying to allocate", i, "GB"                  
          allocate( array(n) )
          write(*,"(A)") " --> OK"
          deallocate( array )
        end do
end program

Note also that most of the OS (I think Windows too) do what is called “lazy allocation”: they let you allocate more than the available memory, and this does not matter as long as what you are actually using fits in the available memory. This is the difference between “virtual memory” (what is allocated) and “resident memory” (what is actually used). So, your test may not be pertinent.

If its an HPC node, I presume its running Linux or some other variant of Unix. Remember in addition to the memory you are allocating directly, there is also memory required by the OS along with memory buffers needed for IO so there is less of the 2TB address space available for your program memory than you think. The only other time I’ve seen anything like this was many years ago on a Cray system during the first generation of Cray’s XC systems where there was a weird problem with the way the OS handled the IO buffers created during a run. In some instances doing an IO operation in between allocations would effectively create a barrier to allocating any memory after you did a WRITE that prevented any subsequent ALLOCATEs to work even if there was memory available. This was around 15 or so years ago now so I doubt thats your problem. However, you might try commenting out the WRITE statement you have before your memory allocation loop. Again I doubt it will have any effect but you never know. The only thing I can think of as to why separate compilation/linking would cause this is the way ifort is handleing system shared libraries (.so files aka DLL in Windows world)

I just find something interesting:
the above code and compilation scheme DID work, but in my use case, when compiled with -qopenmp flag, the segfault occurs at 950GB. Otherwise I can claim the whole 2TB memory.

EDIT:
I have edited the post to reflect this info.

You didn’t mention the -qopenmp flag in your initial post, but rather just:

Did you also include the -qopenmp flag during the link step?

ifort requires some flags to appear both in the compile step and in the link step — -parallel and -fpie are other examples of such kind of flags.

In addition when you say an “HPC node” it is not obvious if you are allocating a node via a scheduler like LSF, Slurm, … and whether your platform is hyperthreaded and whether any compiler option defaults are modified, what process limits are in effect … . To eliminate a
few of those questions if you run the following with the openmp switches, where a few
lines have been added to add some descriptive text it would eliminate a few of those dusty corners:

program memtest
use,intrinsic :: iso_fortran_env, only : int64, real64, compiler_version, compiler_options
use omp_lib, only : omp_get_thread_num
integer(int64) :: i, n
integer :: stat
real(real64), allocatable:: array(:)
character(len=255) ::  errmsg
   i=0
   print *, "Hello from process: ", omp_get_thread_num()
   print '(4a)', 'This file was compiled by ', compiler_version(), &
                 'using the options ',        compiler_options()
   call execute_command_line('bash -c "free -h;ulimit -a"')
   do
      i = i + 10
      n = i * 1024_8**3 / 8
      write(*,"(A,I5,A)",advance='no') "Trying to allocate", i, "GB"
      if(allocated(array)) deallocate( array )
      allocate( array(n), stat=stat,errmsg=errmsg)
      if(stat.eq.0)then
         write(*,"(A)") " --> OK"
      else
         write(*,"(A)") " --> FAILED:"//trim(errmsg)
         stop
      endif
   enddo
end program

include -qopenmp in both steps returns the same segfault.

Wow, I never there are print and execute_command_line intrinsic functions, many new things to learn.

Anyway, I finally get access to the node now. The job is submitted in Slurm. Here is the output from your script.
I compiled with:

ifort -qopenmp memtest_urbanjost.f90 -o memtest_urbanjost.exe

output is:

 Hello from process:            0
This file was compiled by Intel(R) Fortran Intel(R) 64 Compiler Classic for applications running on Intel(R) 64, Version 2021.5.0 Build 20211109_000000using the options -qopenmp -o memtest_urbanjost.exe
              total        used        free      shared  buff/cache   available
Mem:           2.0T         23G        1.9T        9.9G         10G        1.9T
Swap:            0B          0B          0B
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 8205077
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 4096
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 4096
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
Trying to allocate   10GB --> OK
Trying to allocate   20GB --> OK
Trying to allocate   30GB --> OK
Trying to allocate   40GB --> OK
Trying to allocate   50GB --> OK
Trying to allocate   60GB --> OK
Trying to allocate   70GB --> OK
Trying to allocate   80GB --> OK
Trying to allocate   90GB --> OK
Trying to allocate  100GB --> OK
Trying to allocate  110GB --> OK
Trying to allocate  120GB --> OK
Trying to allocate  130GB --> OK
Trying to allocate  140GB --> OK
Trying to allocate  150GB --> OK
Trying to allocate  160GB --> OK
Trying to allocate  170GB --> OK
Trying to allocate  180GB --> OK
Trying to allocate  190GB --> OK
Trying to allocate  200GB --> OK
Trying to allocate  210GB --> OK
Trying to allocate  220GB --> OK
Trying to allocate  230GB --> OK
Trying to allocate  240GB --> OK
Trying to allocate  250GB --> OK
Trying to allocate  260GB --> OK
Trying to allocate  270GB --> OK
Trying to allocate  280GB --> OK
Trying to allocate  290GB --> OK
Trying to allocate  300GB --> OK
Trying to allocate  310GB --> OK
Trying to allocate  320GB --> OK
Trying to allocate  330GB --> OK
Trying to allocate  340GB --> OK
Trying to allocate  350GB --> OK
Trying to allocate  360GB --> OK
Trying to allocate  370GB --> OK
Trying to allocate  380GB --> OK
Trying to allocate  390GB --> OK
Trying to allocate  400GB --> OK
Trying to allocate  410GB --> OK
Trying to allocate  420GB --> OK
Trying to allocate  430GB --> OK
Trying to allocate  440GB --> OK
Trying to allocate  450GB --> OK
Trying to allocate  460GB --> OK
Trying to allocate  470GB --> OK
Trying to allocate  480GB --> OK
Trying to allocate  490GB --> OK
Trying to allocate  500GB --> OK
Trying to allocate  510GB --> OK
Trying to allocate  520GB --> OK
Trying to allocate  530GB --> OK
Trying to allocate  540GB --> OK
Trying to allocate  550GB --> OK
Trying to allocate  560GB --> OK
Trying to allocate  570GB --> OK
Trying to allocate  580GB --> OK
Trying to allocate  590GB --> OK
Trying to allocate  600GB --> OK
Trying to allocate  610GB --> OK
Trying to allocate  620GB --> OK
Trying to allocate  630GB --> OK
Trying to allocate  640GB --> OK
Trying to allocate  650GB --> OK
Trying to allocate  660GB --> OK
Trying to allocate  670GB --> OK
Trying to allocate  680GB --> OK
Trying to allocate  690GB --> OK
Trying to allocate  700GB --> OK
Trying to allocate  710GB --> OK
Trying to allocate  720GB --> OK
Trying to allocate  730GB --> OK
Trying to allocate  740GB --> OK
Trying to allocate  750GB --> OK
Trying to allocate  760GB --> OK
Trying to allocate  770GB --> OK
Trying to allocate  780GB --> OK
Trying to allocate  790GB --> OK
Trying to allocate  800GB --> OK
Trying to allocate  810GB --> OK
Trying to allocate  820GB --> OK
Trying to allocate  830GB --> OK
Trying to allocate  840GB --> OK
Trying to allocate  850GB --> OK
Trying to allocate  860GB --> OK
Trying to allocate  870GB --> OK
Trying to allocate  880GB --> OK
Trying to allocate  890GB --> OK
Trying to allocate  900GB --> OK
Trying to allocate  910GB --> OK
Trying to allocate  920GB --> OK
Trying to allocate  930GB --> OK
Trying to allocate  940GB --> OK
Trying to allocate  950GB --> OK
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source             
memtest_urbanjost  000000000040B09A  Unknown               Unknown  Unknown
libpthread-2.17.s  00002B2F2952D630  Unknown               Unknown  Unknown
libiomp5.so        00002B2F2924FFEE  Unknown               Unknown  Unknown
libiomp5.so        00002B2F2924D7C6  Unknown               Unknown  Unknown
libiomp5.so        00002B2F2924D2F8  Unknown               Unknown  Unknown
libiomp5.so        00002B2F29247993  Unknown               Unknown  Unknown
libiomp5.so        00002B2F2924834D  Unknown               Unknown  Unknown
memtest_urbanjost  0000000000423A11  Unknown               Unknown  Unknown
memtest_urbanjost  00000000004043A6  Unknown               Unknown  Unknown
memtest_urbanjost  0000000000403FA2  Unknown               Unknown  Unknown
libc-2.17.so       00002B2F2975C555  __libc_start_main     Unknown  Unknown
memtest_urbanjost  0000000000403EA9  Unknown               Unknown  Unknown
Trying to allocate  960GB

This is weird: the loop that allocates 950GB array is successful, then the next causes segfault. I don’t think this caused by

i = i + 10
n = i * 1024_8**3 / 8

then something must happen between the loops.

I was suspicious that your process limits were less than your hardware limits, which would be easy for the Slurm scheduler to i nduce (“#SLURM --mem=0 --exclusive” is a nice way to avoid that, as it asks for all the memory a node has and makes sure no one else is on the machine) or that you were running multiple threads, each of which was doing an allocation, or that some of your nodes did not have 2T of memory, or that the backend nodes had an differen version of the compiler, … . but
the output shows none of those simple explanations appears to be the case.

Even though it did not solve an issues, just some additional minor tweeks to provide best practices since you are new to Fortran. In retrospect
I should have had you display the memory available in each iteration,
and add the status to the deallocate() as well. You almost always want
to add IMPLICIT NONE. So I included another version that is a little bit
more robust (but unlikely to help from what you reported.).

I would add a few additional compiler switches when debugging as well

ifort  -qopenmp -warn all -check all -O0 -g -traceback memtest_urbanjost.f90 -o memtest_urbanjost.exe

When debugging with ifort I would add at least “-O0 -g -traceback”
which will give you the line numbers of the source code lines.

But it looks like that test eliminated a few simple explanations but did not
fully explain what the failure is. If you have a recent GCC version on
the machine as well, it might be instructive to run “gfortran -fopenmp
…” as well. Perhaps using the Compiler Explorer site and other versions
of ifort and (better yet) ifx (which has superseded ifort) might show it
is a compiler or OpenMP bug in older versions of ifort. That is looking
more likely.

program memtest
use,intrinsic :: iso_fortran_env, only : int64, real64, compiler_version, compiler_options
use omp_lib, only : omp_get_thread_num
implicit none
integer(int64)            :: i, n
integer                   :: stat
real(real64), allocatable :: array(:)
character(len=255)        :: errmsg=''
   i=0
   print *, "Hello from process: ", omp_get_thread_num()
   print '(a)', 'This file was compiled by ', compiler_version(), &
                 'using the options ',        compiler_options()
   call execute_command_line('bash -c "free -h;ulimit -a"')
   do
      i = i + 10
      n = i * 1024_8**3 / 8
      write(*,"(A,I5,A)",advance='no') "Trying to allocate", i, "GB"
      if(allocated(array)) deallocate( array ,stat=stat,errmsg=errmsg)
      call checkstat()
      call execute_command_line('bash -c "free -h";echo')
      allocate( array(n), stat=stat,errmsg=errmsg)
      call checkstat()
   enddo
contains
subroutine checkstat()
   if(stat.eq.0)then
      write(*,"(A)") " --> OK"
   else
      write(*,"(A)") " --> FAILED:"//trim(errmsg)
      stop
   endif
end subroutine checkstat

end program

Thank you for the suggestions. The engineer at HPC solved it. It turns out ifort2018 can successfully allocate array to full memory:

 Hello from process:            0
This file was compiled by 
Intel(R) Fortran Intel(R) 64 Compiler for applications running on Intel(R) 64, Version 18.0.2.199 Build 20180210

using the options 
-qopenmp -warn all -check all -O0 -g -traceback -o memtest_urbanjost1.exe
              total        used        free      shared  buff/cache   available
Mem:           2.0T         23G        1.9T        9.8G         10G        1.9T
Swap:            0B          0B          0B
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 8205780
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 8192
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 4096
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
Trying to allocate   10GB --> OK
              total        used        free      shared  buff/cache   available
Mem:           2.0T         23G        1.9T        9.8G         10G        1.9T
Swap:            0B          0B          0B

 --> OK
Trying to allocate   20GB --> OK
              total        used        free      shared  buff/cache   available
Mem:           2.0T         23G        1.9T        9.8G         10G        1.9T
Swap:            0B          0B          0B

 --> OK
Trying to allocate   30GB --> OK
              total        used        free      shared  buff/cache   available
Mem:           2.0T         23G        1.9T        9.8G         10G        1.9T
Swap:            0B          0B          0B

 --> OK
Trying to allocate 1960GB --> OK
              total        used        free      shared  buff/cache   available
Mem:           2.0T         23G        1.9T        9.8G         10G        1.9T
Swap:            0B          0B          0B

 --> OK
Trying to allocate 1970GB --> OK
              total        used        free      shared  buff/cache   available
Mem:           2.0T         23G        1.9T        9.8G         10G        1.9T
Swap:            0B          0B          0B

 --> FAILED:insufficient virtual memory

And gfortran too:

 Hello from process:            0
This file was compiled by 
GCC version 12.2.0
using the options 
-mtune=generic -march=x86-64 -g -O0 -Wall -fopenmp -fcheck=all -fbacktrace
              total        used        free      shared  buff/cache   available
Mem:           2.0T         23G        1.9T        9.8G         10G        1.9T
Swap:            0B          0B          0B
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 8205780
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 8192
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 4096
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
Trying to allocate   10GB --> OK
              total        used        free      shared  buff/cache   available
Mem:           2.0T         23G        1.9T        9.8G         10G        1.9T
Swap:            0B          0B          0B

 --> OK
Trying to allocate   20GB --> OK
              total        used        free      shared  buff/cache   available
Mem:           2.0T         23G        1.9T        9.8G         10G        1.9T
Swap:            0B          0B          0B

 --> OK
Trying to allocate   30GB --> OK
              total        used        free      shared  buff/cache   available
Mem:           2.0T         23G        1.9T        9.8G         10G        1.9T
Swap:            0B          0B          0B

 --> OK
Trying to allocate 1960GB --> OK
              total        used        free      shared  buff/cache   available
Mem:           2.0T         23G        1.9T        9.8G         10G        1.9T
Swap:            0B          0B          0B

 --> OK
Trying to allocate 1970GB --> OK
              total        used        free      shared  buff/cache   available
Mem:           2.0T         23G        1.9T        9.8G         10G        1.9T
Swap:            0B          0B          0B

 --> FAILED:Attempt to allocate an allocated object

But not ifx:

 Hello from process:            0
This file was compiled by 
Intel(R) Fortran Compiler for applications running on IA-32, Version 2022.0.0 Build 20211123
using the options 
NYI
              total        used        free      shared  buff/cache   available
Mem:           2.0T         23G        1.9T        9.8G         10G        1.9T
Swap:            0B          0B          0B
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 8205780
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 8192
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 4096
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
Trying to allocate   10GB --> OK
              total        used        free      shared  buff/cache   available
Mem:           2.0T         23G        1.9T        9.8G         10G        1.9T
Swap:            0B          0B          0B

 --> OK
Trying to allocate   20GB --> OK
              total        used        free      shared  buff/cache   available
Mem:           2.0T         23G        1.9T        9.8G         10G        1.9T
Swap:            0B          0B          0B

 --> OK

Trying to allocate  950GB --> OK
              total        used        free      shared  buff/cache   available
Mem:           2.0T         23G        1.9T        9.8G         10G        1.9T
Swap:            0B          0B          0B

 --> OK
Trying to allocate  960GB --> OK
              total        used        free      shared  buff/cache   available
Mem:           2.0T         23G        1.9T        9.8G         10G        1.9T
Swap:            0B          0B          0B

forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source             
memtest_urbanjost  000000000040C31A  Unknown               Unknown  Unknown
libpthread-2.17.s  00002B34BA86F630  Unknown               Unknown  Unknown
libiomp5.so        00002B34BA591FEE  Unknown               Unknown  Unknown
libiomp5.so        00002B34BA58F7C6  Unknown               Unknown  Unknown
libiomp5.so        00002B34BA58F2F8  Unknown               Unknown  Unknown
libiomp5.so        00002B34BA589993  Unknown               Unknown  Unknown
libiomp5.so        00002B34BA58A34D  Unknown               Unknown  Unknown
memtest_urbanjost  0000000000425439  Unknown               Unknown  Unknown
memtest_urbanjost  0000000000405597  memtest                    21  memtest_urbanjost1.f90
memtest_urbanjost  0000000000404CA2  Unknown               Unknown  Unknown
libc-2.17.so       00002B34BACA2555  __libc_start_main     Unknown  Unknown
memtest_urbanjost  0000000000404BA9  Unknown               Unknown  Unknown

By the way, what does ifx mean by applications running on IA-32? My build and run environment are 64bit.

In summary, it seems that when ifort was the only intel Fortran compiler, it behaves correctly.