DO CONCURRENT: compiler flags to enable parallelization

HugoMVale · September 12, 2022, 9:09pm

If I understand correctly, a do concurrent construct does not necessarily imply that the code inside the block will run in parallel, because (for instance) the compiler might estimate that the compute task does not justify the overhead of parallelization.
However, I am unsure what must be done to allow the compiler to consider parallelizing such a loop. More specifically, my questions are:

Is it correct that parallelization of do and do concurrent loops is deactivated by default unless a specific compiler flag is used?
With ifort, according to this page, it seems that parallelizing do concurrent requires compiling with `-parallel` or `-qopenmp`. If the compute work justifies it, the loop will then be (automatically) distributed among the threads available at runtime. Is this correct?
With gfortran, according to this paper, parallelizing do concurrent requires compiling with `-ftree-parallelize-loops=N`, meaning that the number of threads used at runtime is fixed by the value N chosen at compile time. Is this correct?

What is your opinion and experience regarding this matter?

pcosta · September 12, 2022, 9:20pm

nvfortran also supports parallelization on CPUs and GPU offloading using DO CONCURRENT (they have even implemented reduction clauses from the upcoming 2023 standard): Using Fortran Standard Parallel Programming for GPU Acceleration | NVIDIA Technical Blog.

HugoMVale · September 13, 2022, 7:40pm

Thanks for the hint. Yes, the paper that I cited in my first post has a detailed comparison of ifort, nvfortran and gfortran, and NVIDIA’s compiler does indeed do a good job at parallelizing do concurrent constructs. By default, I use gfortran, so (more or less implicitly) I am looking for the appropriate flags for this compiler.

HugoMVale · September 13, 2022, 8:02pm

Thanks, very helpful suggestion. I have just started playing with the `-fopt-info` flag.

ivanpribec · January 3, 2024, 12:53am

I summarized some do concurrent-related information in this thread and thought it was worth reposting here:

Multi-threaded do concurrent (CPU)

| Compiler | Parallel flag | Report flag | Number of threads | Underlying implementation |
|---|---|---|---|---|
| `gfortran` | `-ftree-parallelize-loops=n` | `-fopt-info-loop` | `n` from the parallel flag | OpenMP/pthreads |
| `nvfortran` | `-stdpar=multicore` | `-Minfo=stdpar,accel` | `ACC_NUM_CORES` | OpenACC |
| `ifort` (deprecated) | `-parallel` | `-qopt-report -qopt-report-phase=par` | `OMP_NUM_THREADS`, `-par-num-threads=n` | OpenMP |
| `ifx` | `-qopenmp` | `-qopt-report` | `OMP_NUM_THREADS` | OpenMP |
| CCE `ftn` (Cray/HPE) | `-h thread_do_concurrent` | ? | ? | ? |
| AMD `flang` | `-fopenmp` | ? | `OMP_NUM_THREADS` | OpenMP |

The OpenMP environment variables can also be used to control processor affinity. This applies to nvfortran as well: it responds to `OMP_PROC_BIND` and `OMP_PLACES`, because OpenACC has no environment variables for thread-to-core binding.
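To make the table concrete, here is a minimal sketch of a do concurrent loop that these compilers may run on multiple threads when built with the flags listed above. The module name `dc_demo`, the subroutine `saxpy`, the file name `dc_demo.f90` and the thread count `4` are hypothetical. Whether the loop is actually threaded is left to each compiler's heuristics, which is exactly what the report flags let you check.

```fortran
! Hypothetical example module. Possible build lines, using the flags
! from the table (whether the loop gets threaded is up to the compiler):
!   gfortran -O2 -ftree-parallelize-loops=4 -fopt-info-loop -c dc_demo.f90
!   nvfortran -O2 -stdpar=multicore -Minfo=stdpar,accel -c dc_demo.f90
!   ifx -O2 -qopenmp -qopt-report -c dc_demo.f90
module dc_demo
  implicit none
  private
  public :: saxpy
contains
  ! y(i) = a*x(i) + y(i). Each iteration reads and writes only element i,
  ! so there are no dependencies between iterations.
  ! Assumes size(y) >= size(x).
  subroutine saxpy(a, x, y)
    real, intent(in)    :: a
    real, intent(in)    :: x(:)
    real, intent(inout) :: y(:)
    integer :: i
    do concurrent (i = 1:size(x))
      y(i) = a*x(i) + y(i)
    end do
  end subroutine saxpy
end module dc_demo
```

Calling `saxpy(3.0, x, y)` with every element of `x` equal to `1.0` and of `y` equal to `2.0` leaves every element of `y` equal to `5.0`, whether or not the loop ran in parallel.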

Resources

PS: I’ve made this a Wiki post, so feel free to add missing information.

Shahid · January 3, 2024, 6:53am

1. Do concurrent has nothing to do with the parallel flags for Intel. It uses OpenMP under the hood, and the compiler flag is always `/Qopenmp` on Windows, for both ifort and ifx.

2. I think it is the same for gfortran, i.e., the `-fopenmp` compiler flag is used for do concurrent.

3. A few months ago I asked the same question in the Intel community. The moderator replied:

Why are you using /Qparallel? That turns on the auto-parallelizer. I'm not sure what that does if anything with DO CONCURRENT.

As I just posted on another thread the DO CONCURRENT / openmp combination uses OMP SIMD.

ivanpribec · January 3, 2024, 8:30am

Concerning the Intel Fortran Compiler Classic (ifort), this Intel thread from 2015 stated:

DO CONCURRENT allows the compiler to ignore any potential dependencies between iterations and to execute the loop in parallel. This can mean either SIMD parallelism (vectorization), which is enabled by default, or thread parallelism (auto-parallelization), which is enabled only by /Qparallel. This is independent of /Qopenmp, which does not enable auto-parallelization, it only enables parallelism through OpenMP directives. However, auto-parallelization with /Qparallel uses the same underlying OpenMP runtime library as /Qopenmp. The overhead for setting up and entering a parallel region is typically thousands of clock cycles, so auto-parallelization is usually worthwhile only for loops with a sufficiently large amount of work to amortize this overhead.

And in this Intel thread, @sblionel stated:

DO CONCURRENT does not “demand parallel” - it allows/requests it. As others have said, the semantics of DO CONCURRENT make it more likely that the loop can be parallelized correctly. If you’re not enabling auto-parallel, there is no benefit to DO CONCURRENT.

With the new Intel LLVM compiler (ifx), this has changed, again quoting @sblionel:

Just as a followup to my March 2022 reply, Intel’s LLVM-based ifx compiler does not support -parallel at all. It will (attempt to) parallelize DO CONCURRENT if you enable OpenMP, even if you don’t use OpenMP otherwise.

For gfortran, I have verified that the `-fopenmp` flag is not needed, and I inspected the compiler reports to confirm that parallelization occurs. The executable produced on Linux depends on OpenMP (GOMP) and pthreads, as stated in the GCC documentation for `-ftree-parallelize-loops`:

This option implies -pthread, and thus is only supported on targets that have support for -pthread.

Regarding the moderator's reply, I’m guessing they were referring to the new Intel LLVM compiler (ifx), as ifort was already “end-of-life”.

It is worth noting that in both ifort and gfortran, the respective parallel flags also work on regular do loops if the compiler's heuristics determine that parallelization would be profitable. Using do concurrent instead of do is about intent: it tells the compiler that the loop can be executed concurrently, meaning there are no data dependencies between iterations and it can be safely parallelized.
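A minimal sketch of that distinction, using hypothetical names (`dep_demo`, `scale_all`, `prefix_sum`): the first loop has independent iterations and can state that intent with do concurrent, while the second has a loop-carried dependency (iteration i reads the element written by iteration i-1), so it must stay an ordinary do loop. The compiler is generally not required to detect a do concurrent loop that breaks this rule; the responsibility lies with the programmer.

```fortran
! Hypothetical module contrasting an independent loop with a dependent one.
module dep_demo
  implicit none
  private
  public :: scale_all, prefix_sum
contains
  ! Independent iterations: each i touches only a(i), so DO CONCURRENT
  ! correctly tells the compiler the loop may be run in parallel.
  subroutine scale_all(c, a)
    real, intent(in)    :: c
    real, intent(inout) :: a(:)
    integer :: i
    do concurrent (i = 1:size(a))
      a(i) = c*a(i)
    end do
  end subroutine scale_all

  ! Loop-carried dependency: iteration i reads a(i-1), written by the
  ! previous iteration. Writing this as DO CONCURRENT would be
  ! non-conforming and could give wrong results if run in parallel.
  subroutine prefix_sum(a)
    real, intent(inout) :: a(:)
    integer :: i
    do i = 2, size(a)
      a(i) = a(i) + a(i-1)
    end do
  end subroutine prefix_sum
end module dep_demo
```

Starting from `a = [1.0, 2.0, 3.0, 4.0]`, `call scale_all(2.0, a)` gives `[2.0, 4.0, 6.0, 8.0]`, and a subsequent `call prefix_sum(a)` gives `[2.0, 6.0, 12.0, 20.0]`; the second result depends on the iterations running in order.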

On the topic of “how to convey to a compiler that a loop is safely parallelizable”, the flang documentation captures this well:

The best option seems to be the one that assumes that users who write DO CONCURRENT constructs are doing so with the intent to write parallel code.

Shahid · January 3, 2024, 2:46pm

This is very interesting.

I will check my codes again and possibly get back.
