High Performance Fortran (HPF) history and lessons

@sblionel,

My point with bootstrapping a compiler and library development is that they are very useful theoretical tests and checks on how convenient and functional the language is to support itself.

The above set of 6 specific items makes it conceivable - at least at some practical level in year 2020 - to develop a compiler for Fortran in Fortran itself and, more importantly, to have a broad set of libraries for Fortran written in Fortran itself. Sure, one can achieve certain limited success with libraries in current Fortran; however, it’s nothing close to the expectations and demands of the modern computing space.

The unviability of even conceiving such things with Fortran in the modern environment shows the significant gaps in the language and its ecosystem. This sends strong, negative signals to influencers and powers-that-be across the board, globally.

Solutions such as Kokkos, which C++-based application development can employ so readily, will remain few and far between for Fortran.

Continued existence of gaps and deficiencies in the base language itself will have tremendous adverse impact on Fortran.

This is only meant as an eye-opener; it’s NOT a clarion call for Fortran processors to be written in Fortran.

Please do not misunderstand me, and don’t use what I’m trying to convey as theoretical checks to distract or deflect attention from the 6 specific items listed above.

Also, the 6 specific items shown above are a highly filtered list for scientific and technical computing development of applications of all scales using Fortran. The list is targeted toward better and faster library development using Fortran.

It’s my very cautious attempt to distill the needs down to half-a-dozen or so items whilst trying to ensure the attempt is NOT to turn Fortran into another PL/I or another C++, two examples of languages that are “for all things”.

It’s specifically for massive computing advancement now and in the very near future toward scientific and technical problems of critical importance to all humankind - whether it be climate change or public health (e.g., the role of modeling and simulation in current and future pandemics) and food and resource allocation and distribution globally - that there is the need to work with all forms of information and data.

To try to always organize and force-fit all such information and data into the Fortran type system of limited intrinsics and only-one-container system of “rectangular” arrays is way too limiting. The world is not going to stand for this. This is why up-to-date, well-featured libraries such as Kokkos do not come up frequently enough (or at all) in Fortran.

And that impacts everything including high-performance computing - the topic of this thread - using Fortran.

Speaking of building a compiler in Fortran, there is a nice book “FORTRAN Tools for VAX/VMS and MS-DOS” where the authors develop a compiler for a subset of Fortran - in Fortran 77. It also contains chapters on hash tables and lexical analyzers. I believe with today’s language features, the task would be somewhat easier. But I totally agree on gaps in the ecosystem. For example, for string handling, in the book I just mentioned, the authors write:

“The main shortcoming of Fortran for string handling is the lack of a standard library of routines for often-needed functions. As Fortran programmers we are faced with a choice: we either invest the up-front effort required to create our own standard library or we live with the continuing effort of hacking together a solution each time we are presented with similar problems.”

True in 1988 and still feels more or less true today.


A couple of notes and comments on this thread:

  1. Early Fortran compilers from IBM were written in Fortran (there was no other language besides assembly at the time). Performance of the compiler binary was improved by compiling the sources with optimization turned on.

  2. An early extension of Fortran - LRLtran - was used to write OS code. The main extension was the addition of a pointer type, which is fundamental in C and really necessary for writing an OS that spends a lot of time manipulating memory addresses. The original machines where this was done were from Cray, and those pointers became known as “Cray Pointers” even though they were not a Cray invention. LRL (now LLNL) deserves the credit/blame for them. With the advent of C and its derivatives people no longer tried to shoehorn Fortran into being a systems language.

  3. The DO CONCURRENT construct contains sufficient information to do shared-memory threading. Most compilers will do OpenMP-style threading across processor cores with a DO CONCURRENT loop. I asked about GPU threading and got the reply that NO ONE had asked for it in an RFP. Technically it could be done, but there seems to be no actual demand.
  4. I agree with Steve’s comments about not wanting to morph Fortran into some other popular language. There have been many proposals in the past to incorporate most of Ada into Fortran, usually rejected. PL/1 was a pasting together of Fortran, COBOL, and Algol. It excelled in incomprehensibility. Fortran has thrived for many reasons. One is the stability of developing a code over decades with the same language. Another is keeping up with hardware trends. The idea of distributed memory parallelism is here to stay and Fortran has addressed that. How shared memory parallelism evolves is yet to be settled. Probably something conceptually like OpenMP, now with GPU offload support. But the Fujitsu ARM chip with SVE, particularly the next generation one, is an interesting alternative to the GPU idea. And now AMD is buying the major FPGA maker. Something general like DO CONCURRENT that gives the compiler a lot of flexibility is the best option for now.
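
For anyone who has not used the construct, here is a minimal sketch of the kind of loop this is about. The module and routine names (dc_saxpy, saxpy) are made up for illustration, and the routine assumes y is at least as long as x. Each iteration touches only element i, so the compiler is free to run the iterations in any order; whether it actually threads, vectorizes, or offloads the loop depends on the compiler and its options.

```fortran
module dc_saxpy
  implicit none
contains
  ! Hypothetical helper: y = alpha*x + y, element by element.
  ! Assumes size(y) >= size(x). No iteration reads or writes data
  ! that another iteration writes, so any execution order is valid.
  subroutine saxpy(alpha, x, y)
    real, intent(in)    :: alpha, x(:)
    real, intent(inout) :: y(:)
    integer :: i
    do concurrent (i = 1:size(x))
      y(i) = alpha*x(i) + y(i)
    end do
  end subroutine saxpy
end module dc_saxpy
```

A caller only needs `use dc_saxpy` and `call saxpy(2.0, x, y)`; nothing in the source says how the loop should be parallelized, which is the flexibility referred to above.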

@sblionel you were asking which language features were added to C++ to help with parallel programming. In C++17 they added parallel algorithms:

https://cukic.co/2018/03/29/cxx17-and-parallel-algorithms-in-stl/
https://devblogs.microsoft.com/cppblog/using-c17-parallel-algorithms-for-better-performance/

NVIDIA then provides a GPU implementation of those:

https://developer.nvidia.com/blog/accelerating-standard-c-with-gpus-using-stdpar/

And here is an example of how it can work in practice:

https://twitter.com/blelbach/status/1321544029415170048

The general idea is that C++ now has standardized parallel building blocks, so when people use them, their code can run pretty well on modern hardware across multiple platforms.

I am well aware that many Fortran compilers automatically parallelize many Fortran constructs also. But the end user experience is just not the same.

@certik, thanks for the pointers to the parallel STL procedures, but these aren’t language features to my mind. Still, the idea of parallelized building blocks is a useful one and I’d encourage those of you working on a Fortran STL to keep this in mind. One might take a look at Intel’s Threading Building Blocks (TBB) for Intel C++ - this was not entirely successful for Intel, but it was an interesting approach. An open-source version is available.

@pmk, there have been no issues with DO CONCURRENT acknowledged. I am aware that you have a dissenting opinion on this topic. The compiler developers who already successfully parallelize DO CONCURRENT don’t seem to agree with you that there is a problem.

Necessary fixes and features to keep standard Fortran relevant in HPC have been specified by the community and then been dismissed or ignored.

Evidence and specifics, please. I am also getting a bit weary of references to “the community”, which often seems to be a reference to the same handful of people or a particular web site.

[Edit: I meant Intel TBB, not IPP.]

@sblionel you are correct that it is in the C++ standard library, so that would correspond to Fortran’s stdlib efforts. In C++ the standard library is part of the standard itself, which is something we could also consider in a few years.

I presume the issue re: DO CONCURRENT and locality mentioned by @pmk is the one discussed here: [J3] [EXTERNAL] Questions about DO CONCURRENT and locality

I admit I too was totally shocked by the response(s) on the J3 mailing list, for there is nothing in the current Fortran standard (Fortran 2018) that permits, “Someone who prefers the OpenMP “pedal-to-the-metal and no brakes” approach can just add DEFAULT(SHARED) to the DO CONCURRENT statement”. And no one who should be in the know (when it comes to HPC, parallelism, and DO CONCURRENT, I don’t know enough) questioned it, challenged it, or followed up on it.

It’s only the Fortran language and its practitioners who suffer due to this.

@pmk and anyone interested in this,

By examples, do you mean the one in the paper - https://j3-fortran.org/doc/year/19/19-134.txt - which appears to be:

SUBROUTINE FOO(N, A, B, T, K, L)
  IMPLICIT NONE
  INTEGER, INTENT(IN) :: N, K(N), L(N)
  REAL, INTENT(IN) :: A(N)
  REAL, INTENT(OUT) :: B(N)
  REAL, INTENT(INOUT) :: T(N)
  INTEGER :: J
  DO CONCURRENT (J=1:N)
    T(K(J)) = A(J)
    B(J) = T(L(J))
  END DO
END SUBROUTINE FOO

and a similar one posted at this comp.lang.fortran thread?

  subroutine foo(a,b,c,ix,iy,n)
    integer, intent(in) :: n, ix(*), iy(*)
    real, intent(inout) :: a(*), b(*)
    real, intent(in) :: c(*)
    do concurrent (j=1:n)
      b(ix(j)) = c(j)
      a(j) = b(iy(j))
    end do
  end subroutine

  program main
    real :: a(2), b(1) = [1.0], c(2) = [2.0, 3.0]
    integer :: ix(2) = [1, 1], iy(2) = [1, 1]
    call foo(a, b, c, ix, iy, 2)
    print *, sum(a)
  end program
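
As a rough sketch of why this second example is interesting (this is my reading of it, not something taken from 19-134): with the inputs in main, both iterations store into b(1) and then read it back. The module below replays the loop body with ordinary DO loops under two schedules. The names dc_schedule_demo, run_serial, and run_stores_then_loads are hypothetical, and the "all stores, then all loads" schedule is only an assumed stand-in for what a naive vectorized or threaded execution with a shared b might do.

```fortran
module dc_schedule_demo
  implicit none
contains
  ! Execute the loop body one whole iteration at a time,
  ! visiting the iterations in the order listed in `order`.
  subroutine run_serial(order, a, b, c, ix, iy)
    integer, intent(in) :: order(:), ix(:), iy(:)
    real, intent(inout) :: a(:), b(:)
    real, intent(in)    :: c(:)
    integer :: i, j
    do i = 1, size(order)
      j = order(i)
      b(ix(j)) = c(j)
      a(j) = b(iy(j))
    end do
  end subroutine run_serial

  ! Assumed "vectorized" schedule: perform every store to b first,
  ! then every load from b, with b shared between iterations.
  subroutine run_stores_then_loads(a, b, c, ix, iy)
    integer, intent(in) :: ix(:), iy(:)
    real, intent(inout) :: a(:), b(:)
    real, intent(in)    :: c(:)
    integer :: j
    do j = 1, size(ix)
      b(ix(j)) = c(j)
    end do
    do j = 1, size(iy)
      a(j) = b(iy(j))
    end do
  end subroutine run_stores_then_loads
end module dc_schedule_demo
```

With c = [2.0, 3.0] and ix = iy = [1, 1], run_serial leaves a = [2.0, 3.0] for either order, [1, 2] or [2, 1], because each iteration reads back the value it just stored. run_stores_then_loads leaves a = [3.0, 3.0], since the second store overwrites the first before any load happens. So a loop can be insensitive to the serial iteration order and still give a different answer once the iterations are interleaved.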

Just adding my thoughts after reading this excellent discussion.

  1. Highly agree with this. And just as a side note, quite honestly I find the idea of discrete GPUs with separate memory and a separate instruction set very unappealing. I’m rooting for Fujitsu’s ARM chips or heterogeneous multicores like Apple’s M1. However, I would say that it’s important that Fortran have constructs that cleanly abstract the parallelism inherent in present and foreseeable hardware, so that compiler developers can have a reasonably easy time writing the optimizer. This is not as hard as it sounds and Fortran does this to a great degree already. Array operations already abstract CPU SIMD architectures, coarrays already abstract distributed architectures, and DO CONCURRENT has great potential in abstracting the parallelism inherent in more complex SIMD architectures such as GPUs.

  2. I definitely understand @certik’s point that C++ is a language that allows building things like Kokkos, and for practitioners that can make a big difference. But I can’t agree with the suggestion that it be practical to write a usable Fortran compiler using Fortran alone. As a C++ user I often see that many of C++'s weaknesses with respect to high performance scientific computing (lack of restriction, lack of universal domain-specific features, need to fall back on SIMD intrinsics or write explicit CUDA, horrific metaprogramming, forced necessity of compile time computing) come from the fact that it is so general-purpose. And many of the features that allow C++ to build a library like Kokkos come from the fact that a lot of APIs (such as OpenCL or CUDA) were designed explicitly to interface with C or C++. It’s not strictly a language advantage itself, but a matter of what vendors have chosen to interface with. If C++ has any advantage at all it’s in the generics, but that’s of course another discussion.

  3. DO CONCURRENT should, in my opinion, be a forced parallelization construct that displaces most of the functionality of OpenACC.


@edsterjo thank you for your comments. I think I agree with your points; they are generally what most people agree upon in the above discussion. Just to clarify your second point, I advocate that Fortran should be the best at high performance numerical computing, not for writing compilers.


Sorry, I should have linked to the comment that made the point about the conceivability of compiler bootstrap.

@edsterjo,

I’ve tried to make clear upfront, in that comment and the subsequent one (here), that the point about bootstrapping and/or trying to author much-needed tooling for the developer ecosystem in Fortran only is meant purely as a “theoretical” test, for any such attempt will also illustrate the monumental challenges of developing modern libraries in Fortran, in any aspect of high-performance computing.
