“GPU” appears nowhere in the Fortran standard and I’m pretty sure that’s intentional. The late Dan Nagle, who chaired the US arm of the Fortran committee, said to me, “The philosophy of Fortran is to give the programmer the ability to communicate properties of their code rather than to mandate what the compiler do to exploit those properties.” I wasn’t on the committee when do concurrent was developed, but my understanding is that it was designed with GPUs in mind – though I doubt the point was “baking GPU stuff directly into Fortran.”
A do loop is inherently a sequential construct: it explicitly tells the compiler “do these iterations in this order.” That ordering is essential for things like time advancement, where the calculations must respect causality to be correct. But because parallel programming was necessary long before parallel programming languages went mainstream, and developers understandably couldn’t wait, we developed a pattern of first telling the compiler explicitly to do something sequentially and then undoing that sequential ordering with directives. One of the worst outcomes of this pattern is that we sometimes end up with more directives than program statements – all in the name of undoing what we did! It seems much clearer to me to just tell the compiler what we mean: these iterations can be done in any order you choose. That’s the purpose of do concurrent, and fortunately there are at least four compilers that can now parallelize do concurrent on CPUs or GPUs: compilers from NVIDIA, Intel, HPE (Cray), and LLVM, listed in approximately chronological order of how long each compiler has had this capability. For an example of do concurrent achieving essentially the same performance as OpenMP when compiling with LLVM Flang and running on a CPU, see the slides from my “Just Write Fortran” talk at the 2024 Parallel Applications Workshop – Alternatives to MPI+X. That work is based on AMD’s ROCm fork of LLVM Flang, where I believe there is also already a branch that offers experimental support for offloading do concurrent to a GPU.
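To make the contrast concrete, here is a minimal sketch of a saxpy-style kernel written both ways. The directive version (shown in comments) first asserts a sequential ordering with a do loop and then undoes it with an OpenMP annotation; the do concurrent version states the order-independence directly in the language. The program itself is my own illustrative example, not taken from the talk mentioned above.

```fortran
program saxpy_demo
  implicit none
  integer, parameter :: n = 1000
  integer :: i
  real :: a, x(n), y(n)

  a = 2.0
  x = 1.0
  y = 3.0

  ! Directive style: a sequential loop, plus a directive undoing the ordering.
  ! !$omp parallel do
  ! do i = 1, n
  !   y(i) = y(i) + a*x(i)
  ! end do

  ! do concurrent style: the iterations are declared order-independent,
  ! so the compiler is free to run them in parallel on a CPU or GPU.
  do concurrent (i = 1:n)
    y(i) = y(i) + a*x(i)
  end do

  ! Sanity check: each element should now be 3 + 2*1 = 5.
  if (abs(y(1) - 5.0) > 1.0e-6) error stop "unexpected result"
  print *, "y(1) =", y(1)
end program saxpy_demo
```

Note that do concurrent is standard Fortran (since Fortran 2008), so the same source compiles unchanged whether or not the compiler chooses to parallelize it.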
I’m old enough to remember floating-point co-processors in the 1990s. These days, when I mention floating-point co-processors to anyone under 40, they usually haven’t even heard of them because those devices eventually got absorbed into the CPU. I suspect we’re already seeing the early stages of a similar trend with GPUs, which may be why the committee never intended to explicitly address GPUs in the language. I often wonder whether young developers in future decades will even know the term GPU and will be debating whether to bake some new form of accelerator into the language.