DO CONCURRENT: compiler flags to enable parallelization

If I understand correctly, a do concurrent construct does not necessarily imply that the code inside the block will run in parallel, because (for instance) the compiler might estimate that the compute task does not justify the overhead of parallelization.
On the other hand, I have doubts about what must be done to allow the compiler to consider a possible parallelization. More specifically, my questions are:

  1. Is it correct that parallelization of do and do concurrent loops is deactivated by default unless a specific compiler flag is used?
  2. With ifort, according to this page, it seems that parallelization of do concurrent requires compilation with -parallel or -qopenmp. In this manner, if the compute work justifies it, it will be (automatically) distributed among the number of available threads at runtime. Is this correct?
  3. With gfortran, according to this paper, parallelization of do concurrent requires compilation with -ftree-parallelize-loops=N, meaning that N at runtime is fixed by the value chosen at compile time. Is this correct?

What is your opinion and experience regarding this matter?


nvfortran also supports CPU parallelization and GPU offloading using DO CONCURRENT (they even implemented the reduction clauses from the upcoming 2023 standard): Using Fortran Standard Parallel Programming for GPU Acceleration | NVIDIA Technical Blog.


Thanks for the hint. Yes, the paper that I cited in my first post has a detailed comparison of ifort, nvfortran and gfortran, and NVIDIA’s compiler does indeed do a good job at parallelizing do concurrent constructs. By default, I use gfortran, so (more or less implicitly) I am looking for the appropriate flags for this compiler.


Thanks, very helpful suggestion. I have just started playing with the -fopt-info flag.


I summarized some do concurrent related information in this thread and thought it was worth reposting here:

Multi-threaded do concurrent (CPU)

| Compiler | Parallel flag | Report flag(s) | Number of threads | Underlying implementation |
|---|---|---|---|---|
| gfortran | -ftree-parallelize-loops=n | -fopt-info-loop | set by the parallel flag | OpenMP/pthreads |
| nvfortran | -stdpar=multicore | -Minfo=stdpar,accel | ACC_NUM_CORES | OpenACC |
| ifort (deprecated) | -parallel | -qopt-report -qopt-report-phase=par | OMP_NUM_THREADS, -par-num-threads=n | OpenMP |
| ifx | -qopenmp | -qopt-report | OMP_NUM_THREADS | OpenMP |
| CCE ftn (Cray/HPE) | -h thread_do_concurrent | ? | ? | ? |
| AMD flang | -fopenmp | ? | OMP_NUM_THREADS | OpenMP |

The OpenMP environment variables can also be used to control processor affinity. This is also the case for nvfortran, which responds to OMP_PROC_BIND and OMP_PLACES, because OpenACC doesn’t have variables for thread-to-core binding.
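To try the flags from the table above, a small test program along these lines may be useful. This is only a sketch: the file name dc_scale.f90, the array size, the optimization level, and the thread count of 4 are arbitrary assumptions, and whether the loop is actually threaded is left to each compiler, which is what the report flags are for.

```fortran
! Hypothetical file name: dc_scale.f90
! Possible compile lines, using the flags listed in the table:
!   gfortran  -O2 -ftree-parallelize-loops=4 -fopt-info-loop dc_scale.f90
!   nvfortran -stdpar=multicore -Minfo=stdpar dc_scale.f90
!   ifx       -O2 -qopenmp -qopt-report dc_scale.f90
program dc_scale
  implicit none
  integer, parameter :: n = 10000000   ! arbitrary size, chosen to give the loop some work
  real, allocatable :: x(:), y(:)
  integer :: i

  allocate(x(n), y(n))
  x = 1.0

  ! Each iteration reads x(i) and writes y(i) only, so iterations are independent.
  do concurrent (i = 1:n)
    y(i) = 2.0 * x(i) + 1.0
  end do

  ! Self-check: every element should be 2*1 + 1 = 3.
  if (any(y /= 3.0)) error stop 'unexpected result'
  print *, 'max(y) =', maxval(y)
end program dc_scale
```

The result should be the same with or without the parallel flags; only the compiler report and the run time should differ.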


PS: I’ve made this a Wiki post, so feel free to add missing information.


1. For Intel, do concurrent has nothing to do with the parallel flags.

It uses OpenMP under the hood and the compiler flag is always


for Windows for both ifort and ifx.

2. I think it is the same for gfortran, where the -fopenmp compiler flag is used for do concurrent.

3. A few months ago I asked the same question in the Intel community. The moderator's reply was:

Why are you using /Qparallel? That turns on the auto-parallelizer. I'm not sure what that does if anything with DO CONCURRENT.

As I just posted on another thread the DO CONCURRENT / openmp combination uses OMP SIMD. 

Concerning the Intel Fortran Compiler Classic (ifort), this Intel thread from 2015 stated:

DO CONCURRENT allows the compiler to ignore any potential dependencies between iterations and to execute the loop in parallel. This can mean either SIMD parallelism (vectorization), which is enabled by default, or thread parallelism (auto-parallelization), which is enabled only by /Qparallel. This is independent of /Qopenmp, which does not enable auto-parallelization, it only enables parallelism through OpenMP directives. However, auto-parallelization with /Qparallel uses the same underlying OpenMP runtime library as /Qopenmp. The overhead for setting up and entering a parallel region is typically thousands of clock cycles, so auto-parallelization is usually worthwhile only for loops with a sufficiently large amount of work to amortize this overhead.

And in this Intel thread, @sblionel stated:

DO CONCURRENT does not “demand parallel” - it allows/requests it. As others have said, the semantics of DO CONCURRENT make it more likely that the loop can be parallelized correctly. If you’re not enabling auto-parallel, there is no benefit to DO CONCURRENT.

With the new Intel LLVM compiler (ifx), this has changed, again quoting @sblionel:

Just as a followup to my March 2022 reply, Intel’s LLVM-based ifx compiler does not support -parallel at all. It will (attempt to) parallelize DO CONCURRENT if you enable OpenMP, even if you don’t use OpenMP otherwise.

With gfortran, I have verified that the -fopenmp flag is not needed, and I inspected the compiler reports to confirm that parallelization occurs. The executable produced on Linux has a dependency on OpenMP (GOMP) and pthreads, as stated in the GCC documentation for -ftree-parallelize-loops:

This option implies -pthread, and thus is only supported on targets that have support for -pthread.

I’m guessing they were referring to the new Intel LLVM compiler, as ifort was “end-of-life” already.

What is worth noting is that in both ifort and gfortran, the respective parallel flags also work on regular do loops, if the compiler heuristic determines this would be profitable. Using do concurrent instead of do is about intent, and letting the compiler know the loop can be executed concurrently, meaning there are no data dependencies, and it can be safely parallelized.

The flang documentation captured this well when it says,

The best option seems to be the one that assumes that users who write DO CONCURRENT constructs are doing so with the intent to write parallel code.

on the topic of “how to convey to a compiler that a loop is safely parallelizable”
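As a rough illustration of what "intent" means here, the following sketch contrasts a loop whose iterations are independent with one that carries a dependence between iterations. The program and variable names are made up for this example.

```fortran
program intent_demo
  implicit none
  integer, parameter :: n = 8
  integer :: i
  real :: a(n), b(n), s(n)

  a = [(real(i), i = 1, n)]

  ! Independent iterations: b(i) depends only on a(i), so the programmer
  ! can legitimately assert concurrency with do concurrent.
  do concurrent (i = 1:n)
    b(i) = 2.0 * a(i)
  end do

  ! Loop-carried dependence: s(i) needs s(i-1) from the previous iteration,
  ! so this running sum has to stay an ordinary do loop.
  s(1) = a(1)
  do i = 2, n
    s(i) = s(i-1) + a(i)
  end do

  print *, b
  print *, s
end program intent_demo
```

Writing the second loop as do concurrent would be a programming error rather than something the compiler is expected to detect, since the construct is a promise made by the programmer.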


This is very interesting.

I will check my codes again and possibly get back.
