A few months ago I participated in a panel organized by the American Meteorological Society to determine and document the impact of technology on the workforce of the weather enterprise (the sum of academia, government, and industry). My input focused on the use of Fortran and the transition from in-house to cloud computing. The report is now out:
Discussion of the use of Fortran, relevant here, is in Section 3c, “Changing tools in the WWC workforce: Programming Languages” (highlights mine):
The programming language Fortran has historically been the building block of computing within the enterprise. For decades, WWC scientists have used Fortran to develop meteorological, ocean, and climate models, often in conjunction with HPC. As a result, a significant amount of legacy code, particularly within the public sector, exists in Fortran. However, while Fortran is primarily used for numerical and scientific computing, researchers are increasingly turning to the general-purpose programming language Python to handle data of all types and interface with a variety of software applications. Expanded GPU capabilities enable Python to be readily used for HPC; additionally, it is one of the most popular languages for work with AI and ML. As Python becomes increasingly popular within and beyond the enterprise, the enterprise will likely need to reevaluate its reliance on Fortran. Improved systemic support of Python might not only help the existing WWC workforce perform more efficiently but also increase the pool of available workers that the enterprise can draw on. Conversely, if the enterprise fails to keep up with shifts in preferred programming languages, employers may have an increasingly difficult time finding new employees that can adeptly work with existing Fortran-based programs. This balance between new and outdated languages will likely repeat in the future as programming languages further evolve, indicating a need for the workforce to be flexible to changes.
This paragraph very much reflects my experience as well. In 2010 I ported all of the Fortran code I used for analysis and plotting to Python, keeping Fortran for the number crunching. On a few occasions I implemented an analysis in Python first and then ported it to Fortran for speed and parallelism. Over the last 15 years I have seen a steady decrease in Fortran programmers and a steady increase in Python programmers. Another significant and steady trend has been the development of machine-learning surrogates to replace parts of deterministic weather, ocean, and climate models. Training such surrogates is done almost exclusively in Python, while the forward pass of the trained model usually runs inside the deterministic (Fortran) model. And then there’s the ever-increasing share of GPUs in HPC systems, on which existing Fortran code has not been easy to run.
Fortran seems to remain in the maintenance and development of the number-crunching core of the weather enterprise; the outer (higher-level) layers of the weather enterprise onion are being peeled off and ported to other languages, mainly Python. In summary:

- The new and young workforce is looking for more versatile, general-purpose tools like Python.
- Significant trends in HPC toward GPUs, and in weather research toward ML models, shift focus and workforce away from Fortran.
- It has become increasingly difficult to hire Fortran programmers; the opposite is true for Python.
- Fortran programmers will likely be better paid as they become scarcer and as industry takes up a larger fraction of the weather enterprise.
I think it’s an analogy and a metaphor. I don’t personally use it and I don’t strongly object to it.
Edit: I just read the “…in the weather model field” part; my eyes/brain skipped it on the first read. I would definitely disagree with that analogy. Perhaps the analogy works, and I wouldn’t object to it, for the Fortran ecosystem and its applications as a whole. But strictly for weather, ocean, and climate models (the number crunching), it’s a definite no, at least for the next 10 years or so. We’ll see how ML research and applications progress; it’s possible that they will replace the deterministic models entirely in operations, and that deterministic models will become limited mostly to research rather than prediction.
Second edit: I should also be clear that in the above paragraph I mean specifically the number-crunching parts of weather modeling. Post-processing, analysis, and delivery of data have mostly moved to other languages. And there are emerging applications (e.g., ML) that are not being written in Fortran. So the overall Fortran share is possibly not shrinking in the absolute sense, but it is rapidly shrinking in the relative sense.
Yes, I meant strictly for the weather model field.
It’s interesting to me that your first post and your third post contradict each other.
My understanding of your first post is that “Fortran remains in maintenance”, “the new workforce is looking for Python”, “GPUs shift the workforce away from Fortran”, etc. I agree with that. In my head I read this as “not doing well” (one can use any number of phrases, but they all mean roughly the same thing to me).
My understanding of your second post is that by “definitely disagreeing” (with the “life support = not doing well”) you mean “Fortran is doing well” (in weather), because the opposite of “not doing well” is “doing well”.
I can now see that your second edit sheds some light on this: Fortran is doing well for numerics, but not for post-processing. However, GPU work is numerics, and as you said (and I agree), the workforce is moving away from Fortran for GPUs, which is numerics. At least that was my understanding of your first post, and it does agree with my experience in another field.
Anyway, I just want to really understand your experience, that is all. I like to think in black and white a lot, it helps me with making decisions. In my mind Fortran (as a whole) is not doing well and that’s why I am trying to help, so that it is doing well. That is my black and white approach.
Right. I think the disconnect is in the words we use and how we perceive them. To me “life support” is considerably stronger than “not doing well”, and “doing well” itself could be defined to mean anything on the spectrum. To me “life support” means you need to put in consistent effort and energy to keep something alive. For NWP models (number crunching) it’s more of a status quo situation: you would need to put in constant effort and energy to make Fortran go away. Too much depends on it right now.
In other words, and perhaps a bit more poetically: Aging scientists and Fortran programmers are retiring; new and young scientists are coming. They inherit the Fortran machine; they’re not happy to use it, and haven’t figured out how to replace it.
I observe the same in academia: many students have experience with Python; almost no one knows Fortran. However, it took me approximately 10 years to understand how to write fast NumPy code, because it must avoid explicit loops. Now my code is fast but unreadable. In Fortran, even a beginner can write fast code. Many Python ‘programmers’ are only able to glue high-level libraries together.
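To illustrate the trade-off described above, here is a minimal, hypothetical sketch (not from any poster’s code) of the same centered finite difference written twice in NumPy: once as a readable explicit loop, which is slow in pure Python, and once vectorized with slicing, which is fast but hides the stencil.

```python
import numpy as np

def diff_loop(f, dx):
    # Readable loop version: mirrors the math, but slow in pure Python
    out = np.empty(f.size - 2)
    for i in range(1, f.size - 1):
        out[i - 1] = (f[i + 1] - f[i - 1]) / (2 * dx)
    return out

def diff_vec(f, dx):
    # Vectorized version: fast, but the slicing obscures the stencil
    return (f[2:] - f[:-2]) / (2 * dx)

x = np.linspace(0.0, 2.0 * np.pi, 1001)
dx = x[1] - x[0]
f = np.sin(x)

# Both versions compute the same derivative (approximately cos(x))
assert np.allclose(diff_loop(f, dx), diff_vec(f, dx))
```

In Fortran, by contrast, the straightforward `do`-loop form is also the fast form, which is the point made above.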
And they are used to the comfort that tools like Jupyter Lab, pytest, sphinx, etc. offer. One of Python’s biggest assets is its ecosystem.
The people who write these articles don’t seem to look further than surface metrics and technology press headlines.
No sensible person is going to implement a computationally complex model in pure Python. Python is just the glue that connects workhorses implemented in Fortran, C, or C++.
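A tiny sketch of this “glue” pattern, assuming a typical Linux or macOS system where the C math library is available: Python merely declares the signature and dispatches; the arithmetic runs in compiled code.

```python
import ctypes
import ctypes.util
import math

# Locate the system C math library; fall back to symbols already loaded
# into the current process (on Linux, Python itself links against libm)
name = ctypes.util.find_library("m")
libm = ctypes.CDLL(name) if name else ctypes.CDLL(None)

# Declare the C signature: double cbrt(double)
libm.cbrt.restype = ctypes.c_double
libm.cbrt.argtypes = [ctypes.c_double]

# The numeric work happens in compiled C; Python is just the glue
assert math.isclose(libm.cbrt(27.0), 3.0)
```

The same pattern scales up: NumPy delegates to BLAS/LAPACK, and tools like `f2py` wrap Fortran kernels the same way ctypes wraps C here.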
The real question: is there a benefit to transitioning toward C++? I say no. Fortran is easier to learn for those not involved in systems-level programming. It is better to learn Fortran first, then C++ (if necessary, which will probably be the case in “the real world”).
More needs to be done to introduce modern Fortran at the high-school level and in the first two years of college. A look at the AMS (American Mathematical Society) and MAA (Mathematical Association of America) curriculum guides and open-source texts gives Fortran proponents an opportunity to introduce numerical computational thinking to a population that might not otherwise be exposed to it.
The LFortran compiler, with its compilation to WebAssembly (WASM), could solve most of the pressing problems; it just needs further development.
In this community review report, we discuss applications and techniques for fast machine learning (ML) in science – the concept of integrating powerful ML methods into the real-time experimental data processing loop to accelerate scientific discovery. The material for the report builds on two workshops held by the Fast ML for Science community and covers three main areas: applications for fast ML across a number of scientific domains; techniques for training and implementing performant and resource-efficient ML algorithms; and computing architectures, platforms, and technologies for deploying these algorithms. We also present overlapping challenges across the multiple scientific domains where common solutions can be found. This community report is intended to give plenty of examples and inspiration for scientific discovery through integrated and accelerated ML solutions. This is followed by a high-level overview and organization of technical advances, including an abundance of pointers to source material, which can enable these breakthroughs.
I know little directly about the WWC enterprise, but elsewhere, going by that very definition of “consistent effort and energy to keep it alive”, Fortran indeed appears to be on “life support”!
First, consider all the effort and energy that has gone into keeping Fortran going through the standard revisions and the committee’s work on Fortran 90, 95, 2003, 2008, 2018, and now 202X. Then add the work by all the compiler developers to conform to these standard updates, particularly the FOSS volunteers behind gfortran. It’s an incalculable amount: countless hours and attention.
Then consider all the time and contributions by various Fortran enthusiasts over the years at different forums, particularly comp.lang.fortran, the Intel Fortran forum, StackOverflow, and now here. Their feedback, troubleshooting support, and debugging and optimization insights kept many a Fortran codebase competitive performance-wise. Again, this is monumental service.
Now take into consideration everything that has gone into LFortran by @certik and his vision, along with the J3-Fortran GitHub effort, Fortran-lang, and stdlib. It’s priceless.
Following the 1995 revision (published in ’97), the Fortran language standard could have gone the way of the C standard since C99 and entered “maintenance mode”. Thankfully it didn’t; what was going to work for C wasn’t going to do it for Fortran.
In spite of all of the above, there remains tremendous and persistent doubt among the powers-that-be (management, budget dollars, etc.) about continued and further investment in Fortran codebases.
But were it not for all of the energy and effort listed above, those making the decisions would easily have pulled the plug on the few remaining codebases, for the message would have been clear: the path of Fortran had reached a dead end. Those codebases too would have migrated away to other paradigms (C++, etc.).
I agree with @milancurcic that Fortran will be in widespread use in weather modeling for quite some time. Weather and climate models tend to be developed by government organizations, sometimes in partnership with academic or other non-governmental organizations (e.g., the US National Center for Atmospheric Research). Almost all models are written and actively developed in Fortran. Most of that Fortran is not particularly “modern.”
New architectures may be the change vector. I’m aware of one concrete effort to move to heavily templated C++; the motivation was performance portability across a wide range of architectures, especially GPUs. The ICON model (developed in Europe by a consortium led by the German Weather Service and the Max Planck Institute for Meteorology) has, as I understand it, less well-developed plans to use domain-specific languages, driven by performance portability and a desire for a shorter distance between prototyping and production. Some domain-specific libraries are also written in C++, although these may expose Fortran interfaces as well.
Weather forecasting centers in particular have to be extremely pragmatic. They have narrow time windows in which to make forecasts on which people’s lives not infrequently depend. The systems have been polished to a high shine by years of investment and they are quite conservative about making change. At the same time they’re planning for the future and know that power limitations will be the main constraint.
Fortran as a language seems really nice for performance portability, and it should be our goal as a community to ensure Fortran compilers can take advantage of that, so that people are not forced to move to C++ for portability.
@milancurcic wondered about earth sciences. Not claiming to represent all of them, but I can comment about one area.
As elsewhere, with some new developments as exceptions, the trends in development for electromagnetic geophysical prospecting point away from Fortran and toward Python, C(++), and Julia. Matlab is popular in some quarters as well. Where legacy Fortran remains critical, it is likely to be wrapped in Python.
Fortran’s many well-documented advantages for numerics pale in comparison with (e.g.) Python’s status as a widely known language with an active user community.
This is another crucial point. The same occurs in astronomy. There are quite a few standalone codes and libraries written in the 1970s and 1980s, all in pre-F90 Fortran. Some of them were updated or maintained later but never converted to what we call modern Fortran. The only exception that comes to mind is Numerical Recipes by Press et al., converted to F90 in 1996 and now discontinued (the 3rd edition, 2007, is C++ only). So the majority of those codes do not benefit from the new features at all. I am afraid that the effort to modernize their Fortran would be comparable to translating them to C(++), Julia, or whatever the young generation of programmers is using nowadays.
I should add that, in my personal opinion, it is extremely unhealthy for Fortran to be “doing well” only because newcomers inherit it and want to move away but don’t know how. I personally call this “lock-in”. They are locked into Fortran.
If that is indeed the situation in weather modeling (I don’t have personal experience there; I am simply going by what @milancurcic wrote), then it is indeed identical to other fields in which I do have personal experience. And in those, yes, Fortran is not doing well. That doesn’t mean it can disappear in the next 5 years, because it’s hard to migrate millions of lines of code (you are “locked in”). But there are efforts underway to do exactly that, by reusing old code where it makes sense but doing new development in C++.
I have not found Fortran at all nice for performance portability in my context. I am the lead developer of a code that computes the flow of radiation (light) through the atmosphere. All but the most idealized models of the atmosphere rely on such a code, and my hope was to make something accurate, fast, and flexible. Users interact with Fortran 2008 classes, but all but trivial computations are done in kernels (Fortran 90, with array sizes stated explicitly and C interfaces).
The scientific problem poses some interesting computational challenges: a small amount of data describing the system state is expanded into a spectral dimension with large extent (hundreds of points); some of the calculations are embarrassingly parallel (atomic), but others require loop carries in one spatial dimension; and the final output is a partial or complete reduction over the spectral space. Parts of the problem are computationally intensive, but memory access (e.g., to interpolate in tables of empirical data) seems to be the main limiter.
Supporting hardware flexibility, i.e. both CPU and GPU architectures, has been a … challenge. GPU support comes through two sets of compiler directives, one for OpenACC and another for OpenMP (exploiting GPU offload). Managing device/host memory allocation and transfer is one burden.
The larger burden is that the layout of the problem needs to differ between the two architectures. On memory-bound CPUs it is most efficient to do the complete problem (invoke a series of kernels) for one spectral point at a time and reduce at the end; on GPUs one wants to expose the most work possible, so the kernels operate on all three dimensions at once. In practice this means maintaining two sets of kernels that differ only in their looping structure.
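The two looping structures can be sketched with a toy NumPy analogue (made-up array names and sizes, not the actual radiation code): a per-spectral-point loop with a running reduction, versus one whole-array kernel reduced at the end. Both produce identical results; only the traversal order differs.

```python
import numpy as np

rng = np.random.default_rng(0)
ngpt, nlay = 16, 8                 # toy sizes: spectral points x vertical layers
tau = rng.random((ngpt, nlay))     # made-up optical depths

# "CPU-style": run the whole kernel for one spectral point at a time,
# accumulating the spectral reduction as you go (small, cache-friendly
# working set on memory-bound CPUs)
flux_cpu = np.zeros(nlay)
for g in range(ngpt):
    flux_cpu += np.exp(-tau[g, :])   # stand-in per-point kernel

# "GPU-style": one kernel over all dimensions at once, reduce at the end
# (maximum exposed parallelism for the device)
flux_gpu = np.exp(-tau).sum(axis=0)

assert np.allclose(flux_cpu, flux_gpu)
```

Keeping two such variants in sync for every kernel is exactly the maintenance burden described above.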
I haven’t looked in detail to see how the Julia or the templated C++ implementations address this.