OpenMP and `do concurrent` loop = crash at runtime

Over the last couple of days, I’ve been playing with OpenMP (compiler = gfortran 12.2.0). While doing so on a rather simple piece of code, I realized that once a given array exceed a certain size, the program would fail at runtime.

Initially, I though the issue was caused by the $omp directives, but I finally realized that even if I removed all $omp directives, the problem persisted. Ultimately, I was able to pin down the problem to an unexpected (for me) clash between OpenMP and do concurrent loops.

In my computer (windows with gcc), the simple program below runs well for “any” array size if compiled without the flag -fopenmp. In contrast, with -fopenmp it fails for N>=360 raising the following cryptic message:

<ERROR> Execution failed for object " filename.exe "
<ERROR>*cmd_run*:stopping due to failed executions
STOP 1 
program test_do_concurrent_omp_bug
 use, intrinsic :: iso_fortran_env, only: real64
 implicit none

 integer, parameter :: rk = real64
 integer, parameter :: N = 370 ! raise me until it breaks!
 real(rk) :: x(N**2)

 print *, "Running..."
 call foo(x)
 print *, "Done!"

contains

 subroutine foo(u)
    real(rk), intent(out) :: u(:)
    real(rk) :: a(0:N, N)
    integer :: i, j
    
    a = 0._rk

    do concurrent(i=1:N, j=1:N)
       u((j - 1)*N + i) =  a(i, j)
    end do

 end subroutine foo

end program test_do_concurrent_omp_bug

Has anyone observed something similar? Is this a bug or a feature? If this is a general problem, it would appear that do concurrent loops effectively preclude using OpenMP…

1 Like

I can compile and run this piece of code with N=700 and gfortran 12.2, with -O3 -fopenmp:

With N=800 or more it fails, but probably because godbolt is limiting the amount of memory used by the programs.

Note also that with the declaration real(rk) :: a(0:N, N), the allocation is possibly on the stack, which is limited. Try replacing it with an allocatable

real(rk), allocatable :: a(:,:)
allocate( a(0:N, N) )

And indeed, I wouldn’t use do concurrent together with OpenMP

This may have nothing to do with the actual code problem, but there seems to be a dimension mismatch in the toy program that was posted.

Is this a problem with OpenMP, or with do concurrent, or what?

I don’t know, and maybe there’s no problem at all. It’s just that I don’t know what the compilers are supposed to do when a do concurrent construct has a !$OMP DO directive.

but there seems to be a dimension mismatch in the toy program that was posted.

There is no dimension mismatch. The array a intentionally starts at index zero and that should not represent any issue, because the indexes never go out of bounds. The shape of the this test derives from a real program where this pattern arises. I basically stripped out as much code as I could until I was left with the bare minimum that still displays the issue.

Note that the code has no $omp directive, so for certain no $omp is being incorrectly used. Furthermore, to the best of my knowledge, there is no rule that forbids using do concurrent in code where $omp directives are used. I could imagine that nesting $omp and do concurrent could lead to trouble, but again that is not the case here.

According to Offloading - GCC Wiki (gnu.org), it seems gcc 12 does some sort of offloading. Could that be what is breaking the do concurrent?

Do concurrent tells the compiler that the iterations within the loop construct are independent. That means they can be done in any order, including simultaneously. It is the programmer’s responsibility to ensure that that is the case (along with some other constraints, such a no loop exits, no return statements, and so on). Something would be very wrong with OpenMP if the programmer were not allowed to use that construct.

OpenMP parallel do and do concurrent have similar goals but nothing ensures that they use the same underlying model. As a matter of fact, only the very latest (5.2) version of OpenMP handles the do concurrent loops, are still with restrictions:

This OpenMP API specification refers to ISO/IEC 1539-1:2018 as Fortran 2018. While future versions of the OpenMP specification are expected to address the following features, currently their use may result in unspecified behavior.
– …
– Locality of variables in a DO CONCURRENT construct

EDIT: oops, this is the 5.1 spec ! Looks like there’s no more restriction in 5.2: https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5-2.pdf

I am a bit uncertain on how to proceed from here. do concurrent is a flag feature of modern fortran. OpenMP is also an important specification. It is somehow odd that they don’t get along in a robust manner…

What is the community advice on this?

  • Should I just forget about do concurrent, so that I am sure that my code can be used with OpenMP?
  • Should I report this bug? Where/how? I am not even sure this is a bug, because I can’t find any information on how gcc + do concurrent + -fopenmp is supposed to behave.

To me this looks like a memory/stack size problem. Already at N = 370, you get a warning about the potential problems with concurrent accesses (although it doesn’t break it as long as you execute serially):

$ gfortran -Wall test_do_concurrent_omp_bug.f90 
test_do_concurrent_omp_bug.f90:17:25:

   17 |     real(rk) :: a(0:N, N)
      |                         1
Warning: Array 'a' at (1) is larger than limit set by '-fmax-stack-var-size=', 
moved from stack to static storage. This makes the procedure unsafe when 
called recursively, or concurrently from multiple threads. Consider increasing 
the '-fmax-stack-var-size=' limit (or use '-frecursive', which implies unlimited 
'-fmax-stack-var-size') - or change the code to use an ALLOCATABLE array. 
If the variable is never accessed concurrently, this warning can be ignored, 
and the variable could also be declared with the SAVE attribute. [-Wsurprising]

(I have folded the message over multiple lines).

As you can see, the compiler warns you about concurrent access and just like @PierU has suggested, tells you to use dynamically allocated memory instead. The issue is also discussed here: fortran - Why Segmentation fault is happening in this openmp code? - Stack Overflow

Anyways, with the -fopenmp version, you can make the routine work by increasing the stack size. Here’s an example from my system:

$ ulimit -s
8192
$ gfortran -Wall -fcheck=all test_do_concurrent_omp_bug.f90 -fopenmp
$ ./a.out
 Running...
Segmentation fault: 11
$ ulimit -s 16384
$ ./a.out
 Running...
 Done!

If you look at the output produce compiling with -fdump-tree-original, you will notice one important change:

  1. serial, without -fopenmp
    __attribute__((fn spec (". w ")))
    void foo (struct array01_real(kind=8) & restrict u)
    {
      static logical(kind=4) is_recursive.4 = 0;
      static real(kind=8) a[640800];
      // ... truncated
    
  1. with -fopenmp

    __attribute__((fn spec (". w ")))
    void foo (struct array01_real(kind=8) & restrict u)
    {
      real(kind=8) a[640800];
      integer(kind=4) i;
    

The difference is the static attribute. By the looks of it, in the -fopenmp version, the array a is an automatic variable, since each thread might need it’s own copy (the procedure should be re-entrant).

For an explanation of the stack, I would suggest reading this post by @dwwork: Why stack is faster than heap and what exactly is stack? - #10 by dwwork

What do you mean by “handles”?

The 5.2 spec defines a Fortran canonical loop nest form which must match the grammar:

DO [ label ] var = lb , ub [ , incr ]
[intervening-code]
loop-body
[intervening-code]
[ label ] END DO

Such a loop-nest can then appear in a work-sharing loop construct:

!$omp do [clause[ [,] clause] ... ] 
   loop-nest 
[!$omp end do [nowait]]

It doesn’t appear possible to combine the two:

!$omp do               ! not allowed
do concurrent(i=1:n) 
   ! loop body
end do

The two concepts can be used independently, according to the following rules/restrictions

  • An OpenMP directive may not appear in a do concurrent construct
  • In an OpenMP data environment, do concurrent loop indices are private
  • The effect of an access to a threadprivate variable in a DO CONCURRENT construct is unspecified.
  • OpenMP runtime library routines may not be called in DO CONCURRENT constructs.

I mean that an !$OMP DO directive can be set upfront a do concurrent construct. You say that it’s not possible even in OpenMP 5.2, but I’m not sure I agree. While the OpenMP specification was clearly stating the “[the use to DO CONCURRENT] may result in unspecified behavior” until the 5.0 version, this warning has completely disappeared in the 5.2 version.

However, assuming it’s possible, it’s really not clear how they interact.

Anyway, I tend to not use do concurrent at all, as I find OpenMP extremely simple, with generally efficient implementations, and much more control on the locality of variables, on the scheduling, etc…

Thanks, the explanation and the links were very useful.

I feel somewhat embarrassed for not having seen the warning. I got into the habit of compiling the code with fpm build, which by default has the option --profile debug, which in turn includes the flag -Wall. So, I assumed I would see warnings if any existed… But actually, the warning message is only displayed with the additional option --verbose.

If it were possible, I would share the solution between you and @PierU. That not being the case, I’ll assign it to you, since you were the one who ultimately showed me how to diagnose the issue.