https://permalink.lanl.gov/object/tr?what=info:lanl-repo/lareport/LA-UR-23-23992
TLDR (from the executive summary)
- It is very likely that we will be unable to staff Fortran projects with top-rate computer scientists and computer engineers.
- There is an even chance that we will be unable to staff Fortran projects with top-rate computational scientists and physicists.
- There is an even chance continued maintenance of Fortran codes will lead to expensive human or financial maintenance costs.
- It is very unlikely that codes that rely on Fortran will have poor performance on future CPU technologies.
- It is likely that codes that rely on Fortran will have poor performance for GPU technologies.
- It is very likely that Fortran will preclude effective use of important advances in computing technology.
- There is an even chance that Fortran will inhibit introduction of new features or physics that can be introduced with other languages.
This paper seems to project a feeling that we have no control over the future of our computing environment. LLNL specs systems that are good for C++ and ignore Fortran, because they made that switch twenty years ago. The UK’s Archer2 system was spec’d to run Fortran codes, because over 75% of their cycles are Fortran. We could choose to invest in computers and software environments that are good for our codes and our mission. $10M put towards getting [company name] to improve their compiler is nothing compared to the costs of replacing Fortran in our code base.
This is a report written by some of my former colleagues at LANL. I asked them to correct a citation from our “State of Fortran” paper (update: they cited the introduction, not the conclusion, so it is actually an accurate citation). I spent hours discussing with LANL managers exactly the issues touched on in the paper. In 2019 I asked them to give us 3 years before making the decision on Fortran, and to their credit, they did. As you know, I gave it my absolute best, and I described some of our efforts in Resurrecting Fortran, in the “State of Fortran” paper, and in the LFortran effort (which is doing great, btw!). Changing the culture around Fortran is hard, and I think I made a non-trivial dent in the direction of the ship, but not enough. I am of course sad to see this, as LANL might be the last National Lab with mission codes in Fortran; Sandia and LLNL have already moved pretty much fully to C++.
One can argue with the details, but the arguments in the paper are to be taken seriously. In fact, if I were in their shoes, I might have to make the same decision to move from Fortran to C++; it is a complex decision that involves hiring, maintenance, mission success, taxpayers’ money, and technical considerations. If anyone here doubts what is written in the report, you can simply ask me, and I am happy to support their arguments. Yes, of course, if I were in their shoes I would investigate the option of funding Fortran tooling, but at some point it is just not possible to justify it to the sponsors and the people involved; that is simply the reality.
A simple technical reason is that it is always best to use just one language for a big project. This is based on my personal experience, but many programmers agree (although not all). The minute you start mixing C++ and Python, or Fortran and C++, you are suddenly requiring people to understand both languages, and unless the languages are clearly separated (which they typically are not in these computational codes), you need to debug and develop in both. That is a major problem, even for me: I know all three languages really well, and yet you will notice that I do not mix them in any of my projects.
To fix it, either move all into Fortran or all into C++ or all into Python. Don’t mix. If you have to mix, separate into independent modules or layers.
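One way to do that separation is to confine all cross-language calls to a single thin interface layer. A minimal sketch, with hypothetical module and routine names (not taken from any particular code base):

! Thin interoperability layer: the only place where Fortran is exposed to C/C++.
! Everything on either side of this boundary can be developed and debugged
! in a single language.
module solver_c_api
  use, intrinsic :: iso_c_binding, only: c_int, c_double
  implicit none
contains
  subroutine solve_step(n, x) bind(c, name="solve_step")
    integer(c_int), intent(in), value :: n
    real(c_double), intent(inout) :: x(n)
    x = 0.5_c_double * x   ! placeholder for the actual Fortran solver
  end subroutine solve_step
end module solver_c_api

On the C++ side this appears as a single extern "C" declaration (void solve_step(int n, double *x)), so neither side ever needs to read the other’s source.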
Anyway, I could write pages about this problem. I’ve worked at LANL for almost a decade; I know the people well, and they are all great. I’ve discussed this at length with anybody willing to listen, and people do listen. I have been trying hard to discuss this very problem with the J3 committee, starting with my very first meeting in 2019, but I am afraid I was not very successful there. I am not even talking about the solution (see the next paragraph), just about acknowledging that there is a problem.
Now the solution: not all is lost for Fortran. We have to keep doing exactly what we are doing: get new people to use Fortran, use it for projects, and work on improving the basic tooling (compilers, fpm, etc.). That’s it. We have to build the community from the ground up, and that will fix Fortran, because it changes the culture: people see active projects, successful compilers, and lots of energy in the community, and then it is much easier to justify using Fortran instead of other languages for big projects.
I found this report unsurprising, and it was nice to see the reviewers’ counterpoints included. There is a lot of confirmation bias here, of course. It would be more interesting to see reports like this written across DOE, Navy, NASA, and NOAA labs; I expect the risk estimates would vary wildly depending on whether or not a lab develops and runs large Fortran projects.
Your comments about performance of Fortran codes on CPUs or GPUs are overly broad. First, the performance on either processor type will likely depend on the investment discussed in my first point. If it is high enough, then neither processor type needs to have poor performance for Fortran applications. More importantly, the performance of Fortran codes will heavily depend on the language features that they use. We can likely support good performance on either processor type for codes that largely restrict themselves to Fortran 95, with use of only carefully considered features of later versions of the standard (e.g., the C interface stuff). Codes that insist on using the latest Fortran standard (aggressively? anything beyond the carefully selected features?) will require that investment to be substantial.
Can you give some examples of newer (beyond F95) Fortran standard features that are inefficient or that perform poorly?
DO CONCURRENT is the example called out by them, and it has been talked about on Discourse as well.
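For reference, a minimal sketch of the construct in question (a saxpy-style update; the variable names are illustrative only):

! DO CONCURRENT asserts that the iterations are independent, so the compiler
! may vectorize, parallelize, or offload the loop; how well compilers actually
! do that is exactly what is being debated here.
do concurrent (i = 1:n)
  y(i) = a * x(i) + y(i)
end do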
See this thread; though it mentions one specific processor, the ailment is broader:
Sadly, I’m aware of numerous issues with F90+ features delivering terrible results. It’s ridiculous that explicit DO loops are still the only reliable way to get performance.
Array notation and data parallel intrinsics are flaky in some compilers, both in terms of performance and the unnecessary creation of intermediates.
There is a well-known vendor compiler that will segfault on B = B + transpose(A) when A and B are bigger than 2000x2000, unless the user adds special flags.
Very well said. I hold exactly the same opinion, which has been pretty much criticized here; see the discussions under the following old threads:
I am repeatedly told that I am using Fortran and the compilers in the wrong way, that it is unreasonable to expect intrinsic procedures to be performant out of the box, and that it is fine to have SEGFAULT as the default behavior.
Nevertheless, I still believe that the built-in support for automatic arrays and array manipulation (addition, multiplication, slicing, broadcasting, …) is one of the core strengths of Fortran. Array manipulation carries a great deal of weight in scientific computing. If this strength is well developed and exploited, Fortran can become the lingua franca of scientific computing (again) and an ideal language for templating numerical algorithms.
Still, in my PRIMA project (a package for solving general nonlinear optimization problems without using derivatives), I am determined to use only automatic arrays and “matrix-vector procedures” instead of loops whenever possible, because, according to my very humble and limited understanding of pure and computational mathematics, this is the correct way of presenting / coding numerical algorithms, and this is the future. (Thinking about ChatGPT, you will realize that the boundary between presenting and coding is blurred.)
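To illustrate the style being advocated here (with illustrative names only, not code taken from PRIMA), compare a loop-based update with its matrix-vector counterpart:

! Loop-based version of a gradient-style update:
do i = 1, n
  g(i) = g(i) + dot_product(jac(:, i), residual)
end do

! Whole-array / matrix-vector version: closer to the mathematics, and the
! form the compiler should, in principle, be able to make fast by default.
g = g + matmul(transpose(jac), residual)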
No, you are correct here and the people who tell you not to use newer, more expressive language features that should be easier to optimize have Stockholm syndrome (where compiler developers are the kidnappers).
I’ll note that some compilers do a great job with array notation and intrinsics. The NVIDIA Fortran compiler maps the = operator to CUDA memcpy and offloads TRANSPOSE, MATMUL, and many others to GPUs by mapping to a CUTENSOR back-end.
Cray’s compiler also does a good job with such things, although they don’t have GPU support for them at the moment.
BabelStream Fortran has array-based implementations (with OpenACC kernels and OpenMP workshare, too) so that folks can measure the difference versus loop-based versions. It’s certainly not the most complicated benchmark out there, but it’s very easy to reason about; a sketch of the two styles follows the links below.
- Fortran ports by jeffhammond · Pull Request #135 · UoB-HPC/BabelStream · GitHub
- Benchmarking Fortran DO CONCURRENT on CPUs and GPUs Using BabelStream — University of Bristol (open access PDF)
- Benchmarking Fortran DO CONCURRENT on CPUs and GPUs Using BabelStream | IEEE Conference Publication | IEEE Xplore
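For concreteness, a sketch of the two styles in the spirit of BabelStream’s triad kernel (not its actual source; the array names and the scalar are placeholders):

! Array-based form: one line, and the compiler decides how to parallelize it.
a = b + scalar * c

! Loop-based form: the traditional way to guarantee good performance.
do i = 1, n
  a(i) = b(i) + scalar * c(i)
end do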
While the citation may be technically valid, I still find it intellectually dishonest. The thesis of the paper is that Fortran is seeing a resurgence, and yet they quoted it in a way to imply that we believe Fortran is dead. To read an article talking about our efforts in supporting Fortran, ignore those efforts, and then quote the same paper to encourage abandoning the language is an egregious violation of academic ethics (IMO).
I’ve never criticized you either; I agree with you. In my opinion, array operations in particular should be highly performant in Fortran, and if they are not, I consider that a compiler bug.
Completely agree.
Great to know! This is exactly what I meant. Programmers should code the algorithms in the most natural way and let compilers handle everything else.
Thank you @certik . I firmly believe that Fortran would have a much brighter future if all compiler developers / vendors saw things in the same way as you. I look forward to using LFortran in my project.
There is a well-known vendor compiler that will segfault on B = B + transpose(A) when A and B are bigger than 2000x2000, unless the user adds special flags.
With Fortran array operations, there are two opposing goals. One is to have the very best performance, and the other is to have robust code that always runs and gives the correct results, regardless of array size. In this particular example, I’m guessing that the problem is the allocation of an intermediate on the stack instead of the heap. So for small matrices the code will produce the best performance, but it fails for large matrices where the stack space is exceeded. If the compiler always allocated on the heap, then it would not perform as well for the small matrix cases (because heap allocation and deallocation carry more memory-management overhead than stack allocation).
Another reasonable alternative would be for the compiler to first attempt stack allocation, and then fall back to heap allocation when necessary. Then the programmer would see good performance and robust code, with the cost of some minimal overhead for the size test.
Actually, this particular example probably should not require any memory allocation at all; the compiler should just switch the array indices. But that reminds me of another thing that is missing in Fortran.
I think there should be a “shallow transpose” operator in fortran that does exactly this index switch thing. It would be like a pointer assignment.
AT => shallow_transpose(A)
No data should be moved; the operation should happen entirely within the array metadata. After this statement, AT(i,j) should reference the same memory as A(j,i). I think this operation should be allowed in expressions, and perhaps also on both the left- and right-hand sides of the equals sign. So the above operation could be written as
B = B + shallow_transpose(A)
and the programmer would have some high-level control over whether or not an intermediate work array is allocated at all.
I forget who did this, but someone once wrote the C code, operating on the gfortran compiler’s array structure, that performed this shallow_transpose() operation. So we know for certain that this can be done.
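In the meantime, a minimal sketch of how to get the same effect by hand today (assuming square n-by-n arrays) is simply to swap the indices in an explicit loop, which avoids any temporary:

! Equivalent to B = B + transpose(A), but with no intermediate array:
! the "transpose" is expressed purely through the index order.
do j = 1, n
  do i = 1, n
    B(i, j) = B(i, j) + A(j, i)
  end do
end do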
Nothing requires TRANSPOSE to materialize a temporary array. It can be implemented in a shallow manner today, deferring the materialization until assignment. None of the following require temporaries, if the compiler does enough semantic analysis to understand that TRANSPOSE can return a reference to a view.
A = TRANSPOSE(A)
A = A + TRANSPOSE(A)
B = TRANSPOSE(A)
B = B + TRANSPOSE(A)
In any case, Numpy offers both shallow and deep transposes, because it’s less connected to the compiler than Fortran intrinsics.
# this forms an explicit (deep) copy of the transpose of A
# B += numpy.transpose(A).copy()
# this only uses the transpose _view_ of A (no copy is made)
B += A.T
One of the coauthors, Galen M. Shipman, recently created a repo, Fortran Tests (“a set of fortran tests for modern fortran”).
I wonder if the performance of compilers on these tests influenced the conclusions of the report.
The tests don’t look particularly performance-heavy; they seem more focused on semantics.
Some elements of that report match with the AMS report on the impact of technology on the weather enterprise workforce - #3 by milancurcic, posted previously. In particular the difficulty of finding Fortran programmers, and the problems of running Fortran on GPUs.
I wish their speculation
It is also possible that as the pool of Fortran developers continues to decrease, the demand for this skill set on legacy code bases across the industry will remain flat for quite some time, meaning increased competition for the relatively few developers with deep Fortran expertise. [emphasis mine] This has the potential to further erode retention and our ability to compete on salary.
which echoes @milancurcic’s takeaway from the AMS report:
- Fortran programmers will likely be better paid as they become more scarce and as the industry takes up a larger fraction of the weather enterprise.
would become true. But in reality, I don’t see it happening.
Judging by other reports, such as this one from 2014 (ASCAC WORKFORCE SUBCOMMITTEE LETTER), the US government labs face bigger recruitment issues than just a lack of Fortran talent:
[…] In particular, the findings reveal that:
- All large DOE national laboratories face workforce recruitment and retention challenges in the fields within Computing Sciences that are relevant to their mission (termed ASCR-related Computing Sciences in the following findings and the recommendations), including Algorithms (both numerical and non-numerical); Applied Mathematics; Data Analysis, Management and Visualization; Cybersecurity; Software Engineering and High Performance Software Environments; and High Performance Computer Systems.
- Insufficient educational opportunities are available at academic institutions in the ASCR-related Computing Sciences that are most relevant to the DOE mission.
- There is a growing national demand for graduates in ASCR-related Computing Sciences that far exceeds the supply from academic institutions. Future projections indicate an increasing workforce gap and a continued underrepresentation of minorities and females in the workforce unless there is an intervention.
- The exemplary DOE Computational Science Graduate Fellowship (CSGF) program, deemed highly effective in every one of multiple reviews, is uniquely structured and positioned to help provide the future workforce with the interdisciplinary knowledge, motivation, and experiences necessary for contributing to the DOE mission.
- The DOE laboratories have individually developed measures to help recruitment and retention, yet more can be done at the national level to amplify and extend the effectiveness of their locally developed programs
Many sectors of the population are significantly underrepresented in the Computing Sciences. According to the Taulbee data, in 2014 women comprise a low and declining percentage of computing graduates, with 17.2% of Computer Science and 18% of all computing doctorates. Less than 2% of computational science doctorates are awarded to Hispanic or African-American students. The fraction of degrees awarded to non-US citizens continues to climb, reaching over 58% of all Computing Science doctoral degrees (Table 5 in the Appendix gives examples). Similar demographic data at the career level reveals a workforce that is mostly male and mostly white.
Concerning the GPU portability aspects, this Twitter thread sharing work presented at SYCLcon contains some food for thought: https://twitter.com/simonmcs/status/1648976667468001281/photo/1 The following two slides are the relevant ones (I hope the authors, Sergi Siso, Andrew Porter, and Rupert Ford don’t mind me posting this here for the sake of our discussion):