DO CONCURRENT: compiler flags to enable parallelization

If I understand correctly, a do concurrent construct does not guarantee that the code inside the block will run in parallel, because (for instance) the compiler might estimate that the amount of work does not justify the overhead of parallelization.
However, I am unsure about what must be done so that the compiler will even consider parallelizing the loop. More specifically, my questions are:

  1. Is it correct that parallelization of do and do concurrent loops is disabled by default unless a specific compiler flag is used?
  2. With ifort, according to this page, it seems that parallelization of do concurrent requires compiling with -parallel or -qopenmp. In that case, if the compute work justifies it, the iterations will be automatically distributed among the threads available at runtime. Is this correct?
  3. With gfortran, according to this paper, parallelization of do concurrent requires compiling with -ftree-parallelize-loops=N, which means that the number of threads N used at runtime is fixed by the value chosen at compile time. Is this correct?

What is your opinion and experience regarding this matter?
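For concreteness, here is a minimal sketch of the kind of loop I have in mind. The compile commands in the comments are only my understanding of the flags mentioned above and may need adjusting for your compiler version:

```fortran
! Minimal sketch: a do concurrent loop I would like the compiler to parallelize.
! My (untested) understanding of the relevant flags, taken from the links above:
!   ifort:    ifort -O2 -parallel saxpy.f90                       (auto-parallelization)
!   gfortran: gfortran -O2 -ftree-parallelize-loops=4 saxpy.f90   (4 threads fixed at compile time)
program saxpy
  implicit none
  integer, parameter :: n = 10**7
  real, allocatable :: x(:), y(:)
  real :: a
  integer :: i

  allocate(x(n), y(n))
  call random_number(x)
  call random_number(y)
  a = 2.0

  ! Each iteration is independent, so the compiler is free to run them in parallel.
  do concurrent (i = 1:n)
    y(i) = y(i) + a*x(i)
  end do

  print *, sum(y)
end program saxpy
```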


nvfortran also supports CPU parallelization and GPU offloading of DO CONCURRENT (they have even implemented the reduce clause from the upcoming Fortran 2023 standard): Using Fortran Standard Parallel Programming for GPU Acceleration | NVIDIA Technical Blog.
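For anyone curious, a minimal sketch of what that looks like. The -stdpar values in the comments follow my reading of the linked blog post, and the reduce clause requires a recent nvfortran release:

```fortran
! Sketch: a DO CONCURRENT reduction, as supported by recent nvfortran releases.
! Compile commands (as I understand them from the NVIDIA blog; adjust as needed):
!   nvfortran -stdpar=multicore dotprod.f90   ! parallelize across CPU cores
!   nvfortran -stdpar=gpu dotprod.f90         ! offload to the GPU
program dotprod
  implicit none
  integer, parameter :: n = 10**7
  real, allocatable :: x(:), y(:)
  real :: s
  integer :: i

  allocate(x(n), y(n))
  call random_number(x)
  call random_number(y)

  s = 0.0
  ! The reduce locality specifier (Fortran 2023) makes the reduction explicit.
  do concurrent (i = 1:n) reduce(+:s)
    s = s + x(i)*y(i)
  end do

  print *, s
end program dotprod
```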


Thanks for the hint. Yes, the paper that I cited in my first post has a detailed comparison of ifort, nvfortran and gfortran, and NVIDIA’s compiler does indeed do a good job at parallelizing do concurrent constructs. By default, I use gfortran, so (more or less implicitly) I am looking for the appropriate flags for this compiler.


Thanks, very helpful suggestion. I have just started playing with the -fopt-info flag. 🙂
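In case it helps others reading along, this is roughly how I am invoking it on the small example from my first post. The flag combinations are my own guesses at something useful, and the wording of the reports differs between GCC versions:

```fortran
! Sketch: asking gfortran which loops it actually parallelized/optimized.
!
!   gfortran -O2 -ftree-parallelize-loops=4 -fopt-info-loop-optimized saxpy.f90
!   gfortran -O2 -ftree-parallelize-loops=4 -fopt-info-missed=missed.txt saxpy.f90
!
! The first form prints notes about optimized loops to stderr; the second writes
! the missed-optimization report to missed.txt for later inspection.
```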
