OpenMP and `do concurrent` loop = crash at runtime

To me this looks like a memory/stack size problem. Already at N = 370, you get a warning about the potential problems with concurrent accesses (although it doesn’t break it as long as you execute serially):

$ gfortran -Wall test_do_concurrent_omp_bug.f90 
test_do_concurrent_omp_bug.f90:17:25:

   17 |     real(rk) :: a(0:N, N)
      |                         1
Warning: Array 'a' at (1) is larger than limit set by '-fmax-stack-var-size=', 
moved from stack to static storage. This makes the procedure unsafe when 
called recursively, or concurrently from multiple threads. Consider increasing 
the '-fmax-stack-var-size=' limit (or use '-frecursive', which implies unlimited 
'-fmax-stack-var-size') - or change the code to use an ALLOCATABLE array. 
If the variable is never accessed concurrently, this warning can be ignored, 
and the variable could also be declared with the SAVE attribute. [-Wsurprising]

(I have folded the message over multiple lines).

As you can see, the compiler warns you about concurrent access and just like @PierU has suggested, tells you to use dynamically allocated memory instead. The issue is also discussed here: fortran - Why Segmentation fault is happening in this openmp code? - Stack Overflow

Anyways, with the -fopenmp version, you can make the routine work by increasing the stack size. Here’s an example from my system:

$ ulimit -s
8192
$ gfortran -Wall -fcheck=all test_do_concurrent_omp_bug.f90 -fopenmp
$ ./a.out
 Running...
Segmentation fault: 11
$ ulimit -s 16384
$ ./a.out
 Running...
 Done!

If you look at the output produce compiling with -fdump-tree-original, you will notice one important change:

  1. serial, without -fopenmp
    __attribute__((fn spec (". w ")))
    void foo (struct array01_real(kind=8) & restrict u)
    {
      static logical(kind=4) is_recursive.4 = 0;
      static real(kind=8) a[640800];
      // ... truncated
    
  1. with -fopenmp

    __attribute__((fn spec (". w ")))
    void foo (struct array01_real(kind=8) & restrict u)
    {
      real(kind=8) a[640800];
      integer(kind=4) i;
    

The difference is the static attribute. By the looks of it, in the -fopenmp version, the array a is an automatic variable, since each thread might need it’s own copy (the procedure should be re-entrant).

For an explanation of the stack, I would suggest reading this post by @dwwork: Why stack is faster than heap and what exactly is stack? - #10 by dwwork