To me this looks like a memory/stack size problem. Already at N = 370, you get a warning about the potential problems with concurrent accesses (although it doesn’t break it as long as you execute serially):
$ gfortran -Wall test_do_concurrent_omp_bug.f90
test_do_concurrent_omp_bug.f90:17:25:
17 | real(rk) :: a(0:N, N)
| 1
Warning: Array 'a' at (1) is larger than limit set by '-fmax-stack-var-size=',
moved from stack to static storage. This makes the procedure unsafe when
called recursively, or concurrently from multiple threads. Consider increasing
the '-fmax-stack-var-size=' limit (or use '-frecursive', which implies unlimited
'-fmax-stack-var-size') - or change the code to use an ALLOCATABLE array.
If the variable is never accessed concurrently, this warning can be ignored,
and the variable could also be declared with the SAVE attribute. [-Wsurprising]
(I have folded the message over multiple lines).
As you can see, the compiler warns you about concurrent access and just like @PierU has suggested, tells you to use dynamically allocated memory instead. The issue is also discussed here: fortran - Why Segmentation fault is happening in this openmp code? - Stack Overflow
Anyways, with the -fopenmp
version, you can make the routine work by increasing the stack size. Here’s an example from my system:
$ ulimit -s
8192
$ gfortran -Wall -fcheck=all test_do_concurrent_omp_bug.f90 -fopenmp
$ ./a.out
Running...
Segmentation fault: 11
$ ulimit -s 16384
$ ./a.out
Running...
Done!
If you look at the output produce compiling with -fdump-tree-original
, you will notice one important change:
- serial, without
-fopenmp
__attribute__((fn spec (". w "))) void foo (struct array01_real(kind=8) & restrict u) { static logical(kind=4) is_recursive.4 = 0; static real(kind=8) a[640800]; // ... truncated
-
with
-fopenmp
__attribute__((fn spec (". w "))) void foo (struct array01_real(kind=8) & restrict u) { real(kind=8) a[640800]; integer(kind=4) i;
The difference is the static
attribute. By the looks of it, in the -fopenmp
version, the array a
is an automatic variable, since each thread might need it’s own copy (the procedure should be re-entrant).
For an explanation of the stack, I would suggest reading this post by @dwwork: Why stack is faster than heap and what exactly is stack? - #10 by dwwork