Cost of modernising or rewriting Fortran codebase(s)

What I am asking: two things:

  1. Pointers to documents in the public domain (e.g., blog posts, videos, peer-reviewed articles, technical reports) that provide information on the estimated or actual costs - e.g., effort, time, $$, risk - of either a) modernising an existing Fortran codebase, or b) rewriting the codebase in another language. Preferably, I’m looking for specific cases or examples.
  2. Whether anyone is willing to say, off the record (so maybe contact me directly, if you’d prefer…), what their own experiences are with their projects.

Examples of what I have already found:
  • The LANL 2023 Report evaluates risks and costs, but does so very broadly, doesn’t provide specifics, and has a very short reference section. I’m also aware of the earlier discussion on this site about that report; I’d like to complement the thoughts shared in that discussion with specific cases, if possible.
  • Charles Ferenbaugh kindly pointed me to one of his talks on Best Practices with Long-Lived Code, which describes a particular case and gives an indication of duration (years) and effort.
  • Leman et al. describe their 25-year journey, including two major rewrites, to move from Fortran to C/C++. Tucked away in the supplement is this quote: “…Rosetta still contains machine-translated Fortran legacy code in low-level libraries from 15 years ago, which is difficult to read, maintain, or replace.”
  • The British Computer Society shared the results of a survey on the benefits of continuing Fortran standardisation, which refers to costs and benefits, though again in broad terms.

I’m hoping to find other cases and - ideally - more specific information on the costs.

Why I’m asking: I am trying to put together an evidence-based argument about the costs that the Fortran-based scientific community (in the UK and internationally), or particular projects within that community, confront when deciding how to proceed in the long term with their Fortran codebases. Whichever way one turns (commit to Fortran, migrate to something else, integrate/interface), there’s the prospect of a lot of “pain”, now and in the future, in effort, duration, $$, and uncertainty.

Thanks, Austen

EDIT:

I have found this article, Making the Fortran-To-C Transition: How painful is it really?, which reports a case study of a Fortran-to-C/C++ transition. I appreciate people may not have access to the IEEE database. It’s a relatively small-scale codebase; the authors estimate the conversion took about one year of effort.

1 Like

The cost of modernizing a Fortran code base will depend on how much you want to improve the code. If modernizing just means converting to free source form and compiling cleanly with, say, gfortran -std=f2018, that could be possible with a commercial tool that costs a few $K. If you want all procedures in modules, COMMON blocks replaced with module variables, assumed-shape array arguments, optional arguments to reduce the number of required arguments, derived types and classes, etc., that could require manual effort and cost much more.
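
For illustration, here is a minimal before/after sketch of one of the cheaper steps mentioned above, moving a COMMON block into a module (all names are hypothetical):

```fortran
! Before: legacy fixed-form style sharing state through a COMMON block
!       SUBROUTINE STEP(DT)
!       COMMON /STATE/ T, NSTEP
!       T = T + DT
!       NSTEP = NSTEP + 1
!       END

! After: the same state held as module variables, with an explicit interface
module state_mod
  implicit none
  real :: t = 0.0
  integer :: nstep = 0
contains
  subroutine step(dt)
    real, intent(in) :: dt
    t = t + dt
    nstep = nstep + 1
  end subroutine step
end module state_mod
```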

The Lawrence Livermore National Laboratory has funded research in automatic translation of Fortran to C++, with a preprint here associated with the GitHub project Fortran2Cpp. Presumably LLNL did a study finding that this was less expensive than manual translation.

Archaeologic is a Fortran consultancy that you could contact.

Thanks for your prompt reply, @Beliavsky. The “it depends” is something I want to take account of, hence the case studies, to get some sense of the variation. In terms of what I am looking for, the Fortran2Cpp example is itself interesting, because it doesn’t (seem to) consider the cost-benefit argument, e.g., what’s the cost of developing Fortran2Cpp versus the benefit it might bring in saved development time/effort, and what’s the cost argument that motivates the Fortran2Cpp project in the first place? As for Archaeologic, thanks for that suggestion. I’m already in touch with Brad - he came to our workshop in September last year.

Thanks again.

Hard to measure things when they are undergoing a phase change. I would be very surprised if these costs were not upended, or did not become irrelevant, at some point, depending on how you perceive what AI and quantum computing will do to the art of programming, and when.

You might be trying to measure an ice cube in a frying pan, as Grandma used to say. It will not work well and might end up with you getting burned :wink:

2 Likes

A nice thing about Fortran is the ability to modernize the code incrementally, since old and new Fortran can be mixed together. I did this with a very large Fortran program.

It was done in many passes while the program was still under active development. It is much easier to ensure correctness if one works in small steps.

The style was to remove one obsolescent or deleted feature at a time. I didn’t have any arithmetic IFs or Hollerith strings to deal with, but I did have ENTRY statements. Replacing COMMON blocks with modules was another pass, also done one at a time.

Converting from fixed to free format was done one file at a time using compiler options. I progressively switched to using : in array syntax and 1D array slices. Over time I added the INTENT attribute for call arguments - which I really like.
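
A hypothetical fragment in the spirit of these passes (names invented), showing assumed-shape arguments, the INTENT attribute, and 1-D array slices replacing an explicit loop:

```fortran
module smooth_mod
  implicit none
contains
  subroutine smooth(x, y)
    real, intent(in)  :: x(:)   ! assumed-shape; INTENT documents usage
    real, intent(out) :: y(:)
    integer :: n
    n = size(x)
    y(1) = x(1)
    y(n) = x(n)
    ! 1-D array slices replace an element-by-element DO loop
    y(2:n-1) = (x(1:n-2) + x(2:n-1) + x(3:n)) / 3.0
  end subroutine smooth
end module smooth_mod
```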

There are many more details I could go into, but suffice it to say, it was relatively painless and I never felt like I was losing control.

I chose to stay with Fortran to avoid the risks with moving to another language, for performance reasons, and for code longevity. Fortran has been around for almost 70 years and odds are it will be around for another 70 years.

5 Likes

I have more than a decade of experience in leading Fortran modernization projects. I generally find modernization a hard sell in its own right, but one can often do a tremendous amount of modernization in support of other specific objectives. Common related objectives include porting to a new platform, parallelization, and handing off to a new set of developers (often because a hero programmer is retiring). There can be interesting domino effects that can send one down a deep and expensive yet potentially worthwhile rabbit hole:

  1. Parallelization is much easier if procedures are pure. For example, compilers can automatically parallelize do concurrent, and any procedure called inside do concurrent must be pure.
  2. Pure functions are especially helpful for clarifying data dependencies.
  3. Because functions can produce only one result, the conversion to pure functions can lead to defining derived-type results to encapsulate more than one function result via object components (see the sketch after this list).
  4. Once one has derived types, object-oriented programming becomes attractive for various reasons (encapsulation, abstraction, etc.).
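
A minimal sketch of how items 1-3 fit together (all names here are invented for illustration):

```fortran
module stats_mod
  implicit none
  type :: stats_t            ! derived type bundling two function results
    real :: mean, variance
  end type stats_t
contains
  pure function column_stats(x) result(s)
    ! Pure: no side effects, so it is legal inside DO CONCURRENT
    real, intent(in) :: x(:)
    type(stats_t) :: s
    s%mean = sum(x) / size(x)
    s%variance = sum((x - s%mean)**2) / size(x)
  end function column_stats
end module stats_mod

program demo
  use stats_mod
  implicit none
  real :: a(100, 8)
  type(stats_t) :: s(8)
  integer :: j
  call random_number(a)
  do concurrent (j = 1:8)    ! a candidate for automatic parallelization
    s(j) = column_stats(a(:, j))
  end do
end program demo
```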

So the domino effect is that one starts out aiming to do parallel programming and along the way adopts functional programming patterns, which in turn lead to object-oriented programming. And this is all along just one particular modernization path that might have little to do with various other forms of modernization, such as improved C-interoperability.

Here are a few examples from my experience (none of which involved open-source code so there are no public references):

  1. A 5K-line code modernization effort (driven by a pending retirement) in which we started modernizing and ultimately decided it was better to write a new code from the ground up.
  2. A 52K-line code modernization (motivated both by a retirement and a desire to parallelize the code) in which there was a separate, ground-up rewrite happening simultaneously that was eventually abandoned in favor of incremental modernization.
  3. A 750K-line modernization effort that is still in the estimating phase and will be driven primarily by a desire to parallelize the code incrementally. In this case, it’s quite clear that starting from scratch is impractical given the decades of validation and insight about the application that are built into the code.

Feel free to contact me offline for more details and thoughts on how to develop the estimate.

4 Likes

This is how R programmers do it, with functions returning objects containing the estimated parameters of a statistical model.

1 Like

Our experience echoes that of @Rouson. We have, for example, modernised, refactored and bug-checked:

  • many aerospace codes, typically about 10K lines each;
  • a weather/climate code of 470K lines;
  • an agricultural code of 550K lines;
  • a point-of-sale stock-control code (in Fortran!) of 650K lines;
  • an electricity power distribution code of 1170K lines.

As I wrote in an earlier post, the first thing to do is to establish a regression test. If you are going to change half a million lines of code, you need to know that nothing gets broken.
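
As a minimal sketch of such a check (file names, format, and tolerance are all hypothetical, and it assumes the code writes plain numeric output), a first regression test can be very simple:

```fortran
program regress_check
  ! Compare the current run's numeric output against a frozen
  ! reference file, value by value, within a relative tolerance.
  implicit none
  real :: ref, cur
  integer :: ios_ref, ios_cur, n, nfail
  real, parameter :: rtol = 1.0e-6
  open (10, file='reference.txt', status='old', action='read')
  open (11, file='current.txt',   status='old', action='read')
  n = 0; nfail = 0
  do
    read (10, *, iostat=ios_ref) ref
    read (11, *, iostat=ios_cur) cur
    if (ios_ref /= 0 .or. ios_cur /= 0) exit
    n = n + 1
    if (abs(cur - ref) > rtol * max(abs(ref), 1.0)) nfail = nfail + 1
  end do
  if (ios_ref /= ios_cur) then
    print *, 'FAIL: outputs have different lengths'
  else if (nfail > 0) then
    print *, 'FAIL:', nfail, 'of', n, 'values differ'
  else
    print *, 'PASS:', n, 'values match'
  end if
end program regress_check
```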

I recommend that developers check the facilities provided by the available software tools. Our own tool, fpt, and Polyhedron’s plusFORT will convert to free form, reformat the code to a user-defined standard, and carry out systematic changes such as conversion of COMMON blocks to modules. There are other tools which perform at least some of the same functions: @Beliavsky published a list on this forum. fpt will also help in setting up regression tests - see, for example, fpt Reference: INSERT RUN-TIME TRACE. These tools are usually free for academic use (ours are) and cost the equivalent of only a few days of developers’ time for industry.

7 Likes

Maybe this article about defect patterns (or lack thereof) in the NAG library is of some use?

It is worth noting that some years ago NAG added a Python API (nAG Library for Python — NAG Library for Python 30.3.0.0 documentation). From what I’ve read, the software remains implemented in Fortran, and the C, C++, and Python interfaces are just bindings.

3 Likes

That’s great. Thanks!