Over the last couple of days, I’ve been playing with OpenMP (compiler = gfortran 12.2.0). While doing so on a rather simple piece of code, I realized that once a given array exceed a certain size, the program would fail at runtime.
Initially, I though the issue was caused by the $omp directives, but I finally realized that even if I removed all $omp directives, the problem persisted. Ultimately, I was able to pin down the problem to an unexpected (for me) clash between OpenMP and do concurrent loops.
In my computer (windows with gcc), the simple program below runs well for “any” array size if compiled without the flag -fopenmp. In contrast, with-fopenmp it fails for N>=360 raising the following cryptic message:
<ERROR> Execution failed for object " filename.exe "
<ERROR>*cmd_run*:stopping due to failed executions
STOP 1
program test_do_concurrent_omp_bug
use, intrinsic :: iso_fortran_env, only: real64
implicit none
integer, parameter :: rk = real64
integer, parameter :: N = 370 ! raise me until it breaks!
real(rk) :: x(N**2)
print *, "Running..."
call foo(x)
print *, "Done!"
contains
subroutine foo(u)
real(rk), intent(out) :: u(:)
real(rk) :: a(0:N, N)
integer :: i, j
a = 0._rk
do concurrent(i=1:N, j=1:N)
u((j - 1)*N + i) = a(i, j)
end do
end subroutine foo
end program test_do_concurrent_omp_bug
Has anyone observed something similar? Is this a bug or a feature? If this is a general problem, it would appear that do concurrent loops effectively preclude using OpenMP…
I can compile and run this piece of code with N=700 and gfortran 12.2, with -O3 -fopenmp:
With N=800 or more it fails, but probably because godbolt is limiting the amount of memory used by the programs.
Note also that with the declaration real(rk) :: a(0:N, N), the allocation is possibly on the stack, which is limited. Try replacing it with an allocatable
I don’t know, and maybe there’s no problem at all. It’s just that I don’t know what the compilers are supposed to do when a do concurrent construct has a !$OMP DO directive.
but there seems to be a dimension mismatch in the toy program that was posted.
There is no dimension mismatch. The array a intentionally starts at index zero and that should not represent any issue, because the indexes never go out of bounds. The shape of the this test derives from a real program where this pattern arises. I basically stripped out as much code as I could until I was left with the bare minimum that still displays the issue.
Note that the code has no $omp directive, so for certain no $omp is being incorrectly used. Furthermore, to the best of my knowledge, there is no rule that forbids using do concurrent in code where $omp directives are used. I could imagine that nesting $omp and do concurrent could lead to trouble, but again that is not the case here.
According to Offloading - GCC Wiki (gnu.org), it seems gcc 12 does some sort of offloading. Could that be what is breaking the do concurrent?
Do concurrent tells the compiler that the iterations within the loop construct are independent. That means they can be done in any order, including simultaneously. It is the programmer’s responsibility to ensure that that is the case (along with some other constraints, such a no loop exits, no return statements, and so on). Something would be very wrong with OpenMP if the programmer were not allowed to use that construct.
OpenMP parallel do and do concurrent have similar goals but nothing ensures that they use the same underlying model. As a matter of fact, only the very latest (5.2) version of OpenMP handles the do concurrent loops, are still with restrictions:
This OpenMP API specification refers to ISO/IEC 1539-1:2018 as Fortran 2018. While future versions of the OpenMP specification are expected to address the following features, currently their use may result in unspecified behavior.
– …
– Locality of variables in a DO CONCURRENT construct
I am a bit uncertain on how to proceed from here. do concurrent is a flag feature of modern fortran. OpenMP is also an important specification. It is somehow odd that they don’t get along in a robust manner…
What is the community advice on this?
Should I just forget about do concurrent, so that I am sure that my code can be used with OpenMP?
Should I report this bug? Where/how? I am not even sure this is a bug, because I can’t find any information on how gcc + do concurrent + -fopenmp is supposed to behave.
To me this looks like a memory/stack size problem. Already at N = 370, you get a warning about the potential problems with concurrent accesses (although it doesn’t break it as long as you execute serially):
$ gfortran -Wall test_do_concurrent_omp_bug.f90
test_do_concurrent_omp_bug.f90:17:25:
17 | real(rk) :: a(0:N, N)
| 1
Warning: Array 'a' at (1) is larger than limit set by '-fmax-stack-var-size=',
moved from stack to static storage. This makes the procedure unsafe when
called recursively, or concurrently from multiple threads. Consider increasing
the '-fmax-stack-var-size=' limit (or use '-frecursive', which implies unlimited
'-fmax-stack-var-size') - or change the code to use an ALLOCATABLE array.
If the variable is never accessed concurrently, this warning can be ignored,
and the variable could also be declared with the SAVE attribute. [-Wsurprising]
The difference is the static attribute. By the looks of it, in the -fopenmp version, the array a is an automatic variable, since each thread might need it’s own copy (the procedure should be re-entrant).
I mean that an !$OMP DO directive can be set upfront a do concurrent construct. You say that it’s not possible even in OpenMP 5.2, but I’m not sure I agree. While the OpenMP specification was clearly stating the “[the use to DO CONCURRENT] may result in unspecified behavior” until the 5.0 version, this warning has completely disappeared in the 5.2 version.
However, assuming it’s possible, it’s really not clear how they interact.
Anyway, I tend to not use do concurrent at all, as I find OpenMP extremely simple, with generally efficient implementations, and much more control on the locality of variables, on the scheduling, etc…
Thanks, the explanation and the links were very useful.
I feel somewhat embarrassed for not having seen the warning. I got into the habit of compiling the code with fpm build, which by default has the option --profile debug, which in turn includes the flag -Wall. So, I assumed I would see warnings if any existed… But actually, the warning message is only displayed with the additional option --verbose.
If it were possible, I would share the solution between you and @PierU. That not being the case, I’ll assign it to you, since you were the one who ultimately showed me how to diagnose the issue.