Please, No More Loops (Than Necessary): New Patterns in Fortran 2023
1:00 pm - 2:00 pm EDT
Wednesday, January 21, 2026
Presenter: Damian Rouson (Berkeley Lab)
Description:
Loops are seemingly ubiquitous in programming, and yet writing loops is one example of a common practice stuck in a pattern as old as high-level programming languages themselves. This webinar will provide an overview of the features introduced in Fortran standards from Fortran 90 to 2023. We will venture into often-unvisited nooks and crannies and traverse equally unvisited expansive pastures. Weaving feature groups together by the approaches they enable, the talk will emphasize array, object-oriented, parallel, modular, and functional programming patterns and paradigms. The talk will demonstrate the utility of the described features in open-source packages developed by Berkeley Lab’s Computer Languages and System Software (CLaSS) Group and our collaborators. The presentation will emphasize expressiveness and conciseness, showing how our Julienne correctness-checking framework supports writing assertions and unit tests using natural-language idioms; how we write textbook-form partial differential equations (PDEs) in the Matcha T-cell motility simulator; and how we concisely capture advanced algorithms for training neural networks in the Fiats deep learning library. The talk will include a brief update on the status of compiler and runtime-library support for these features in the open-source LLVM flang compiler and the Caffeine parallel runtime library developed by CLaSS and our collaborators. The talk will conclude with a description of the planned Fortran 2028 support for generic programming via type-safe templates and the powerful ramifications of this technology in our development of a formally verifiable, domain-specific language embedded in Fortran 2028 via a type system being developed for the MOLE PDE solver library. One recurring theme will be the ability to write thousands of lines of code manipulating large collections of data with few or no loops.
I look forward to the talk, but one of Fortran’s strengths is that you can use array operations to concisely express some algorithms (as with NumPy) and loops when needed (as with C or C++). By contrast, Python loops are slow, and C++ does not have array operations with the same flexibility as Fortran, where code written with array operations and code written with loops are comparably fast. But maybe I will learn about algorithms that I thought required loops but do not. Fortran does have do concurrent.
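For example, gathering the positive elements of an array feels loop-shaped but doesn’t need a loop, thanks to the pack intrinsic. A toy sketch of my own (not from the talk):

```fortran
program pack_demo
  implicit none
  real :: x(8), positives(8)
  real, allocatable :: p(:)
  integer :: i, n

  x = [1., -2., 3., -4., 5., -6., 7., -8.]

  ! Loop version: gather the positive elements by hand.
  n = 0
  do i = 1, size(x)
    if (x(i) > 0.) then
      n = n + 1
      positives(n) = x(i)
    end if
  end do

  ! Loop-free version: one intrinsic call does the same gather.
  p = pack(x, x > 0.)

  print *, positives(1:n)
  print *, p
end program
```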
I think the trick Damian uses is declaring most things as elemental so that you get an “implied” loop from the elemental procedures, which I believe compilers map to something loop-like. And do concurrent around an elemental function/subroutine can be offloaded to a GPU.
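Something like this minimal sketch of my own is what I have in mind (sigmoid is just a stand-in name):

```fortran
! The function is written for scalars, but because it is elemental it can
! be applied to whole arrays; the compiler generates the "implied" loop.
module activation_m
  implicit none
contains
  elemental function sigmoid(x) result(y)
    real, intent(in) :: x
    real :: y
    y = 1.0/(1.0 + exp(-x))
  end function
end module

program demo
  use activation_m, only: sigmoid
  implicit none
  real :: z(1000), a(1000)
  call random_number(z)
  a = sigmoid(z)  ! no explicit loop: applied element by element
end program
```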
Interesting. But do concurrent does not require elemental subprograms; they need only be pure. Moreover, my understanding is that do concurrent does not eliminate any loop: it just tells the compiler that the iterations are independent and can therefore be parallelized (similar to the now-obsolescent forall).
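For instance, this sketch of my own uses a pure function that could not be elemental (it takes a whole-array argument), and do concurrent accepts it just fine; the construct merely asserts that the iterations are independent:

```fortran
module smooth_m
  implicit none
contains
  ! Pure but not elemental: u is an array dummy argument.
  pure function smoothed(u, i) result(s)
    real, intent(in) :: u(:)
    integer, intent(in) :: i
    real :: s
    s = 0.5*(u(max(i-1, 1)) + u(min(i+1, size(u))))
  end function
end module

program demo
  use smooth_m, only: smoothed
  implicit none
  integer :: i
  real :: u(100), v(100)
  call random_number(u)
  ! Each iteration reads u and writes only v(i), so the iterations
  ! are independent, which is exactly what do concurrent promises.
  do concurrent (i = 1:size(u))
    v(i) = smoothed(u, i)
  end do
end program
```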
While I like using elemental procedures (the only case where I prefer a function over a subroutine), the issue is that, as of now, one cannot rely on that simple fact to create proper and performant libraries: if the procedure is placed in a module in a file my_mod.f90 and used somewhere else (some_other_file.f90 > use my_mod, only: my_nice_elemental_proc), the procedure won’t be inlined (unless using -ipo/-flto, which come with their own baggage of build-time complexity).
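Concretely, the situation I mean looks like this (the procedure body is a made-up placeholder):

```fortran
! my_mod.f90
module my_mod
  implicit none
contains
  elemental function my_nice_elemental_proc(x) result(y)
    real, intent(in) :: x
    real :: y
    y = 2.0*x + 1.0
  end function
end module

! some_other_file.f90
program main
  use my_mod, only: my_nice_elemental_proc
  implicit none
  real :: a(10000)
  call random_number(a)
  ! Compiled separately from my_mod.f90, this call typically will
  ! not be inlined unless link-time optimization (-ipo/-flto) is on.
  a = my_nice_elemental_proc(a)
end program
```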
I like using FORALL because it often requires fewer lines than the alternatives, so a block or subprogram may be entirely visible in one window on my screen. Of course it may be slower than the alternatives, but it’s a pity that it was made obsolescent.
My understanding of FORALL is that it is not, and was never intended to be, a “loop construct,” even though the syntax might imply it. I believe it was officially an “indexed parallel array assignment” (the definition used in the “Fortran 95 Handbook”). Granted, the differences are probably just a matter of semantics, but the fact that compiler developers struggled to get FORALL to deliver the performance of loops sort of backs up the notion that they are not “loops.” I think one of the reasons for DO CONCURRENT is to overcome the confusion about just what FORALL actually does.
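A minimal sketch of my own that shows why the distinction matters: FORALL evaluates every right-hand side before performing any assignment, whereas a DO loop with the same body reads values it has already overwritten:

```fortran
program forall_semantics
  implicit none
  real :: a(5), b(5)
  integer :: i

  a = [1., 2., 3., 4., 5.]
  forall (i = 2:5) a(i) = a(i-1)  ! array-assignment semantics: [1,1,2,3,4]

  b = [1., 2., 3., 4., 5.]
  do i = 2, 5
    b(i) = b(i-1)                 ! sequential semantics: [1,1,1,1,1]
  end do

  print *, a
  print *, b
end program
```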
One of the talk’s main goals was to break away from the narrative that Fortran is old, behind the times, and a legacy language. Of course, Fortran is old, but thanks to evolving standards, it’s also new. And every language is ahead of or behind the others depending on what features one examines.
I hoped to show that every time developers write a loop in any language, they are using a feature that is 70 years old because it was in Fortran in 1956. Every time someone writes a loop in a language that doesn’t have something comparable to
array statements (Fortran 90),
elemental procedures or where constructs (Fortran 95), or
do concurrent (Fortran 2008),
their code is 36, 31, or 18 years behind Fortran, depending on which alternative one chooses (each alternative is sketched below).
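Here is an illustrative sketch of my own (not from the talk) showing the same masked scaling written with each alternative:

```fortran
program no_more_loops
  implicit none
  integer :: i
  real :: x(1000), y(1000)
  call random_number(x)

  ! Fortran 77 style: the explicit loop everyone still writes
  do i = 1, size(x)
    if (x(i) > 0.5) then
      y(i) = 2.0*x(i)
    else
      y(i) = 0.0
    end if
  end do

  ! Fortran 90: array statement with a masked merge
  y = merge(2.0*x, 0.0, x > 0.5)

  ! Fortran 90/95: where construct (masked assignment)
  where (x > 0.5)
    y = 2.0*x
  elsewhere
    y = 0.0
  end where

  ! Fortran 2008: do concurrent
  do concurrent (i = 1:size(x))
    y(i) = merge(2.0*x(i), 0.0, x(i) > 0.5)
  end do
end program
```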
A second goal (possibly the more achievable and impactful one) was to highlight lesser-known but very useful features dating as far back as Fortran 90.
Now, thanks to an invitation from @milancurcic, I have to pivot and give a talk on deep learning for atmospheric sciences in less than 2 weeks (yikes! As they say, I guess I’ll sleep when I’m dead). One aim of that talk will be to show that the grand old language still has legs and can do all the heavy lifting required for this new era of AI. Berkeley Lab’s Fiats deep learning library (which exists because of @milancurcic’s pioneering work with neural-fortran and because of his early guidance of my work) is all Fortran all the way down.
I am a bit jealous of you. I have used many of the features you showed myself (and I am particularly proud of some of the experiments I put in my 2012 book), but in my daily work I have little opportunity for this, especially on the scale that you have shown.
Well, I might take up that offer. One thing I am currently puzzled about is how to use OpenMP offloading successfully. My toy program works when the grid is small (128x128) but fails miserably when the grid gets larger, without any indication of what might actually be wrong. I have tried to find resources that explain how to use the OpenMP directives, but so far the information has been high-level, and the examples (C or Fortran) are even more toy-like than my program. I will post it with a more coherent description later, but on the Intel forum I have not seen much reaction.
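For what it’s worth, the directive pattern I would try first for a 2D grid update looks like the sketch below; all names and sizes are made up rather than taken from your program, and I can’t diagnose the large-grid failure from here, but explicit map clauses and collapse are the pieces most toy examples gloss over:

```fortran
program offload_demo
  implicit none
  integer, parameter :: n = 1024
  real, allocatable :: grid(:,:), new_grid(:,:)
  integer :: i, j

  allocate(grid(n,n), new_grid(n,n))
  call random_number(grid)
  new_grid = 0.0

  ! Collapse both loops into one parallel iteration space and state
  ! the data movement explicitly rather than relying on defaults.
  !$omp target teams distribute parallel do collapse(2) &
  !$omp&  map(to: grid) map(tofrom: new_grid)
  do j = 2, n-1
    do i = 2, n-1
      new_grid(i,j) = 0.25*(grid(i-1,j) + grid(i+1,j) &
                          + grid(i,j-1) + grid(i,j+1))
    end do
  end do

  print *, new_grid(n/2, n/2)
end program
```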
@Arjen my first recommendation is to try do concurrent and let the compiler generate the OpenMP for you, which also means it might instead generate OpenACC or another alternative someday without you having to learn any of those alternatives. Current options for offloading do concurrent (a minimal sketch follows the list):
NVIDIA: nvfortran
Intel: ifx
HPE Cray: ftn
Coming soon: LLVM flang
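A minimal do concurrent kernel for offloading might look like this sketch of mine; with nvfortran the offload is requested via -stdpar=gpu (check the ifx and Cray docs for their equivalent flags):

```fortran
! Build with, e.g.: nvfortran -stdpar=gpu demo.f90
program saxpy_dc
  implicit none
  integer, parameter :: n = 10**6
  real, allocatable :: x(:), y(:)
  integer :: i

  allocate(x(n), y(n))
  call random_number(x)
  call random_number(y)

  ! Each iteration writes only y(i), so the compiler is free to run
  ! the iterations in parallel on the device.
  do concurrent (i = 1:n)
    y(i) = 2.0*x(i) + y(i)
  end do

  print *, y(1)
end program
```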
If any of these are options, I can connect you with compiler developers who might be able to help. My closest contacts on this topic are AMD compiler engineers working on flang. I think some of the early commits have been pushed to the main branch of llvm-project and will appear in the imminent LLVM 22 release (a release candidate was minted recently). If you’re willing to build flang from source (I can help with that), then there might be a public branch that’s far enough along to be useful to you, probably on AMD’s fork.