High Performance Fortran (HPF) history and lessons

@sblionel for me “fixing Fortran” doesn’t just mean the “Fortran standard”, but rather the whole ecosystem of tools, libraries and compilers. It might be that we do not need to modify the language itself too much, but we need to improve the tools and other things so that Fortran users are not forced to migrate away.


@certik then we need to identify what needs improving. It doesn’t help to just hand-wave about this. We need specifics.

I agree this is the way to go, as opposed to trying to define anything in the language or standard a priori. I also think this is quite specific to start.

Currently, some compilers support compiling Fortran code for some GPUs. We need an open source compiler that will compile for most GPUs.

I think there’s an implied question here: Why can’t we build on the effort of gfortran which is already a mature compiler and why do we need a new open source compiler?

As I understand it, one argument is for building with LLVM, which provides mature tooling for writing compilers.

Another is that, as many of us here agree, Fortran needs an interpreter/REPL to survive and grow in the long run. I don’t know this myself, but I hear from others that this is considerably less difficult to do with LLVM than with gcc.

What else? Why can’t we do this with gfortran?

Though I cannot explain it very well (because of my lack of CS knowledge), I strongly feel that the basic difference between Fortran and C++ (or other languages including Python) is that Fortran (or the committee) disregards the importance of irregular data structures, and only focuses on regularly structured data. Indeed, Fortran still has only an “array” as a data structure, and nothing else. As a result, we (the users) need to express everything in terms of that “array”, which makes the coding very verbose and tiring. Those “arrays” do not even have a push() or append() method in a type-agnostic manner yet (here, arr = [arr, elem] is not performant, I think). For applications with regular grid structures, this may be very advantageous by making the programming easier, but for other applications, Fortran does not become an option because of the lack of various data structures (which is my current “impression” or assumption). <-- My opinion in this part is a bit different from Ondrej and other people, I guess… In other words, Fortran lacks “smart objects” in the following quote (but my interpretation of the quote may be wrong; sorry in that case…)

Rob Pike’s 5 Rules of Programming (The corresponding HackerNews here)

Rule 5 is often shortened to “write stupid code that uses smart objects”.

Another concern of mine: does WG5 also include experts in fields that focus more on irregular data structures (including machine learning)? Might the membership be dominated by the CFD communities? I am afraid there might be some selection bias (on both the committee and user sides) in designing/choosing/requesting language features.
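To make the append remark above concrete, here is a minimal sketch contrasting the `arr = [arr, elem]` idiom with a hand-rolled growable buffer. The names `buf`, `tmp` and `n` are made up for illustration, and the doubling strategy is just one common choice, not something the standard prescribes. Semantically the first form builds a new array one element longer on every append, while the second copies only when capacity runs out; how much that matters in practice depends on the compiler.

```fortran
program append_demo
  implicit none
  integer, allocatable :: arr(:), buf(:), tmp(:)
  integer :: i, n

  ! Idiom from the post: the array constructor builds a new array of
  ! size(arr)+1 and the assignment reallocates arr on every append.
  allocate(arr(0))
  do i = 1, 5
    arr = [arr, i]
  end do
  print *, arr

  ! Hypothetical alternative: keep spare capacity, track the used length n,
  ! and double the capacity only when the buffer is full.
  allocate(buf(1))
  n = 0
  do i = 1, 5
    if (n == size(buf)) then
      allocate(tmp(2*size(buf)))
      tmp(1:n) = buf(1:n)
      call move_alloc(tmp, buf)  ! buf takes over tmp's storage; tmp is deallocated
    end if
    n = n + 1
    buf(n) = i
  end do
  print *, buf(1:n)
end program append_demo
```

Note that the second version has to be rewritten for every element type, which is the “type-agnostic” gap the post is pointing at.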


@sblionel and any and all readers working on orgs such as WG5 or in the Community toward an official / unofficial charter to advance Fortran as a language, please pay attention to this note and read carefully.

First, Fortran language bearers need a vision to make Fortran the lingua franca of scientific and technical computing. Toward this, as mentioned by @certik and others here and elsewhere, not only processors but the entire tooling options and ecosystem are critically important. Toward this, everyone should ask themselves: how inconvenient or difficult is it to develop such tooling using the Fortran language itself? For example, can one productively bootstrap an entire Fortran compiler and continue to enhance and maintain it using a base language set e.g., Fortran 2018? If not (and the answer in year 2020 is indeed that it is not), that has to prompt the realization that there are serious deficiencies, and those gaps need to be addressed. The vision has to be a language such that everyone, at least in principle if not always in practice, can do all the needed tooling using the language itself.
Another crucial aspect of the language vision is to consider library development. Libraries are of utmost importance to all domains, but especially so in scientific and technical computing. It gets to the core of how science, especially the third leg now which is computational science to complement theory and experiments, advances - brick-by-brick and upon the shoulders of others. How easy and convenient is it to develop libraries in Fortran that can be used effectively and productively by the broader scientific and technical community? In year 2020, the reality is it is extremely difficult, impossible even in some situations. This extends to both standard libraries as well as other community / domain-specific libraries. The issues here are another indicator of where the language needs to improve.

With the above vision in mind, here’re the specifics for the base language itself to improve library and toolset development:

  1. Unsigned integers or bitset type,
  2. Proper enumeration type along the lines of Swift (preferably), or at least C++ / D,
  3. Improved support for OO paradigm including easier static casting and to be able to SEAL classes (mark them as final)
  4. Improved Generics and template programming including utilities that enable robust new containers and algorithms. This includes facilities such as iterators and being able to overload “operators” such as N:M for array sections.
  5. Anonymous functions (lambda operations) and closures,
  6. Exception handling

The above list is not too long, it can be achieved with some focus and attention, especially toward the vision.

If this is achieved, it will truly open up Fortran to a new world of libraries (a la Kokkos in C++ for GPUs) and it will enable Fortran to remain agnostic to hardware and other combined hardware-software trends.
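As an illustration of item 2 in the list above, here is a minimal sketch of what the Fortran 2018 enumeration facility looks like. The names `red`, `green`, `blue` and `colour` are made up for the example. `ENUM, BIND(C)` only introduces named integer constants for C interoperability; it does not create a distinct type, so nothing stops an out-of-range value from being assigned.

```fortran
program enum_demo
  use, intrinsic :: iso_c_binding, only: c_int
  implicit none

  ! Interoperable enumerators: these are plain integer named constants.
  enum, bind(c)
    enumerator :: red = 1, green, blue
  end enum

  integer(c_int) :: colour

  colour = green
  print *, colour

  ! Accepted by the compiler: colour is an ordinary integer, so there is
  ! no type-level check that the value is one of the enumerators.
  colour = 42
  print *, colour
end program enum_demo
```

A “proper” enumeration in the sense of the list would presumably reject the second assignment at compile time.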


I do not agree that one should be able to bootstrap a Fortran compiler in Fortran. Fortran never was intended to be the “one language for all things”. Fortran is good for some things (scientific programming) and less good for other things (low-level system programming). I am not in favor of diluting Fortran’s strengths in an attempt to draw in uses more appropriate to other languages.

I am in favor of making Fortran programmers more productive in their development of applications where Fortran’s strengths lie. I especially encourage ways to make Fortran libraries more useful, though there is a historical obstacle that such libraries must be built with a specific Fortran compiler in mind. (This is not the fault of the language.)



My point with bootstrapping a compiler and library development is they are very useful theoretical tests and checks on how convenient and functional the language is to support itself.

The above set of 6 specific items makes it conceivable - at least at some practical level in year 2020 - to develop a compiler for Fortran in Fortran itself and, more importantly, to have a broad set of libraries for Fortran written in Fortran itself. Sure, one can achieve certain limited success with libraries with current Fortran; however, it’s nothing close to the expectations and demands of the modern computing space.

The unviability of even conceiving such things with Fortran in the modern environment shows the significant gaps in the language and its ecosystem. This sends strong, negative signals to influencers and powers-that-be across the board, globally.

Solutions such as Kokkos, which C++-based application development can employ so readily, will remain few and far between for Fortran.

Continued existence of gaps and deficiencies in the base language itself will have tremendous adverse impact on Fortran.

This is only meant as an eye-opener, it’s NOT a clarion call for Fortran processors to be written in Fortran.

Please do not misunderstand me and don’t use what I’m trying to convey as theoretical checks to distract or deflect from the attention of the 6 specific items listed above.

Also, the 6 specific items shown above are a highly filtered list for scientific and technical computing development of applications of all scales using Fortran. The list is targeted toward better and faster library development using Fortran.

It’s my very cautious attempt to distill the needs down to half-a-dozen or so items whilst trying to ensure the attempt is NOT to turn Fortran into another PL/I or another C++, two examples of languages that are “for all things”.

It’s specifically for massive computing advancement now and in the very near future toward scientific and technical problems of critical importance to all humankind - whether it be climate change or public health (e.g., the role of modeling and simulation in current and future pandemics) and food and resource allocation and distribution globally - that there is the need to work with all forms of information and data.

To try to always organize and force-fit all such information and data into the Fortran type system of limited intrinsics and only-one-container system of “rectangular” arrays is way too limiting. The world is not going to stand for this. Which is why libraries that are up-to-date and well-featured such as Kokkos do not come up frequently enough (or at all) in Fortran.

And that impacts everything including high-performance computing - the topic of this thread - using Fortran.

Speaking of building a compiler in Fortran, there is a nice book “FORTRAN Tools for VAX/VMS and MS-DOS” where the authors develop a compiler for a subset of Fortran - in FORTRAN 77. It also contains chapters on hash tables and lexical analyzers. I believe with today’s language features, the task would be somewhat easier. But I totally agree on gaps in the ecosystem. For example for string handling, in the book I just mentioned, the authors write:

“The main shortcoming of Fortran for string handling is the lack of a standard library of routines for often-needed functions. As Fortran programmers we are faced with a choice: we either invest the up-front effort required to create our own standard library or we live with the continuing effort of hacking together a solution each time we are presented with similar problems.”

True in 1988 and still feels more or less true today.


A couple of notes and comments on this thread:

  1. Early Fortran compilers from IBM were written in Fortran (there was no other language besides assembly at the time). Performance of the compiler binary was improved by compiling the sources with optimization turned on.

  2. An early extension of Fortran - LRLtran - was used to write OS code. The main extension was the addition of a pointer type, which is fundamental in C and really necessary for writing an OS that spends a lot of time manipulating memory addresses. The original machines where this was done were from Cray, and those pointers became known as “Cray Pointers” even though they were not a Cray invention. LRL (now LLNL) deserves the credit/blame for them. With the advent of C and its derivatives people no longer tried to shoehorn Fortran into being a systems language.

  3. The DO CONCURRENT construct contains sufficient information to do shared-memory threading. Most compilers will do OpenMP-style threading across processor cores with a DO CONCURRENT loop. I asked about GPU threading and got the reply that NO ONE had asked for it in an RFP. Technically it could be done. But there seems to be no actual demand.
  4. I agree with Steve’s comments about not wanting to morph Fortran into some other popular language. There have been many proposals in the past to incorporate most of Ada into Fortran, usually rejected. PL/1 was a pasting together of Fortran, COBOL, and Algol. It excelled in incomprehensibility. Fortran has thrived for many reasons. One is the stability of developing a code over decades with the same language. Another is keeping up with hardware trends. The idea of distributed memory parallelism is here to stay and Fortran has addressed that. How shared memory parallelism evolves is yet to be settled. Probably something conceptually like OpenMP, now with GPU offload support. But the Fujitsu ARM chip with SVE, particularly the next generation one, is an interesting alternative to the GPU idea. And now AMD is buying the major FPGA maker. Something general like DO CONCURRENT that gives the compiler a lot of flexibility is the best option for now.
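For readers who have not used the construct discussed in the notes above, here is a minimal sketch of a DO CONCURRENT loop. The program and variable names are made up for illustration. The construct is the programmer's assertion that the iterations may be executed in any order; whether a given compiler actually runs them on multiple threads or on a GPU depends on the compiler and the options used.

```fortran
program dc_demo
  implicit none
  integer, parameter :: n = 8
  real :: x(n), y(n), a
  integer :: i

  a = 2.0
  x = [(real(i), i = 1, n)]
  y = 1.0

  ! Each iteration reads x(i) and updates only its own y(i), so no
  ! iteration depends on data defined by another iteration.
  do concurrent (i = 1:n)
    y(i) = a*x(i) + y(i)
  end do

  print *, y
end program dc_demo
```

Compiled without any parallelization options, this behaves like an ordinary DO loop.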

@sblionel you were asking what languages features were added to C++ to help with parallel programming. In C++17 they added parallel algorithms:


NVIDIA then provides a GPU implementation of those:


And here is an example of how it can work in practice:


The general idea is that C++ now has parallel building blocks, that are standardized, so when people use them, their code can run pretty well on modern hardware in a multiplatform manner.

I am well aware that many Fortran compilers automatically parallelize many Fortran constructs also. But the end user experience is just not the same.

@certik, thanks for the pointers to the parallel STL procedures, but these aren’t language features to my mind. Still, the idea of parallelized building blocks is a useful one and I’d encourage those of you working on a Fortran STL to keep this in mind. One might take a look at Intel’s Threading Building Blocks (TBB) for Intel C++ - this was not entirely successful for Intel, but it was an interesting approach. An open-source version is available.

@pmk, there have been no issues with DO CONCURRENT acknowledged. I am aware that you have a dissenting opinion on this topic. The compiler developers who already successfully parallelize DO CONCURRENT don’t seem to agree with you that there is a problem.

Necessary fixes and features to keep standard Fortran relevant in HPC have been specified by the community and then been dismissed or ignored.

Evidence and specifics, please. I am also getting a bit weary of references to “the community”, which often seems to be a reference to the same handful of people or a particular web site.

[Edit: I meant Intel TBB, not IPP.]

@sblionel you are correct that it is in the C++ standard library, so that would correspond to Fortran’s stdlib efforts. In C++ the standard library is part of the standard itself, which is something we could also consider in a few years.

I presume the issue re: DO CONCURRENT and locality mentioned by @pmk is the one discussed here: https://mailman.j3-fortran.org/pipermail/j3/2020-July/012244.html

I admit I too was totally shocked by the response(s) on the J3 mailing list, for there is nothing in the current Fortran standard (Fortran 2018) that permits, “Someone who prefers the OpenMP “pedal-to-the-metal and no brakes” approach can just add DEFAULT(SHARED) to the DO CONCURRENT statement”. And there was no one who should be in the know (when it comes to HPC and parallelism and DO CONCURRENT, I don’t know enough) who questioned it or challenged it or followed up on it.

It’s only the Fortran language and its practitioners who suffer due to this.
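For context on the DEFAULT(SHARED) remark above: the locality specifiers that Fortran 2018 does define for DO CONCURRENT are LOCAL, LOCAL_INIT, SHARED and DEFAULT(NONE). The following is a minimal sketch of how they are written, with made-up variable names. Compiler support for locality specifiers varies between compilers and versions, so treat this as an illustration of the syntax rather than something every compiler will accept.

```fortran
program locality_demo
  implicit none
  integer, parameter :: n = 4
  real :: x(n), y(n), t
  integer :: i

  x = [(real(i), i = 1, n)]
  y = 0.0

  ! DEFAULT(NONE) requires every variable used in the loop body to have an
  ! explicit locality. t is a per-iteration temporary (LOCAL); x and y are
  ! the arrays seen by all iterations (SHARED).
  do concurrent (i = 1:n) default(none) local(t) shared(x, y)
    t = 2.0*x(i)
    y(i) = t + 1.0
  end do

  print *, y
end program locality_demo
```

Here the index variable `i` and the named constant `n` need no locality specifier.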

@pmk and anyone interested in this,

By examples, do you mean the one in the paper - https://j3-fortran.org/doc/year/19/19-134.txt - which appears to be:

    T(K(J)) = A(J)
    B(J) = T(L(J))

and a similar one posted at this comp.lang.fortran thread?

  subroutine foo(a,b,c,ix,iy,n)
    integer, intent(in) :: n, ix(*), iy(*)
    real, intent(inout) :: a(*), b(*)
    real, intent(in) :: c(*)
    do concurrent (j=1:n)
      b(ix(j)) = c(j)
      a(j) = b(iy(j))
    end do
  end subroutine

  program main
    real :: a(2), b(1) = [1.0], c(2) = [2.0, 3.0]
    integer :: ix(2) = [1, 1], iy(2) = [1, 1]
    call foo(a, b, c, ix, iy, 2)
    print *, sum(a)
  end program

Just adding my thoughts after reading this excellent discussion.

  1. Highly agree with this. And just as a side note, quite honestly I find the idea of discrete GPUs with separate memory and a separate instruction set very unappealing. I’m rooting for Fujitsu’s ARM chips or heterogeneous multicores like Apple’s M1. However, I would say that it’s important that Fortran have constructs that cleanly abstract the parallelism inherent in present and foreseeable hardware, so that compiler developers can have a reasonably easy time writing the optimizer. This is not as hard as it sounds and Fortran does this to a great degree already. Array operations already abstract CPU SIMD architectures, coarrays already abstract distributed architectures, and DO CONCURRENT has great potential in abstracting the parallelism inherent in more complex SIMD architectures such as GPUs.

  2. I definitely understand @certik’s point that C++ is a language that allows building things like Kokkos and for practitioners that can make a big difference. But I can’t agree with the suggestion that it be practical to write a usable Fortran compiler using Fortran alone. As a C++ user I often see that many of C++’s weaknesses with respect to high performance scientific computing (lack of restriction, lack of universal domain-specific features, need to fall back on SIMD intrinsics or write explicit CUDA, horrific metaprogramming, forced necessity of compile-time computing) come from the fact that it is so general-purpose. And many of the features that allow C++ to build a library like Kokkos come from the fact that a lot of APIs (such as OpenCL or CUDA) were designed explicitly to interface with C or C++. It’s not strictly an advantage of the language itself, but of what vendors have chosen to interface with. If C++ has any advantage at all it’s in the generics, but that’s of course another discussion.

  3. DO CONCURRENT should, in my opinion, be a forced parallelization construct that displaces most of the functionality of OpenACC.


@edsterjo thank you for your comments. I think I agree with your points, they are generally what most people agree upon in the above discussion. Just to clarify your second point, I advocate that Fortran should be the best at high performance numerical computing, not for writing compilers.


Sorry, I should have linked to the comment that made the point about the conceivability of compiler bootstrap


I’ve tried to make clear upfront, in that comment and the subsequent one (here), that the point about bootstrapping and/or trying to author much-needed tooling for the developer ecosystem in Fortran only is meant purely as a “theoretical” test, for any such attempt will also illustrate the monumental challenges with developing modern libraries in Fortran, be it in any aspect of high-performance computing.