Cost of modernising or rewriting Fortran codebase(s)

What I am asking: two things:

  1. Pointers to documents in the public domain (e.g., blog posts, videos, peer-reviewed articles, technical reports) that provide information on the estimated or actual costs - e.g., effort, time, $$, risk - of either a) modernising an existing Fortran codebase, or b) rewriting the codebase in another language. Preferably, I’m looking for specific cases or examples.
  2. Whether anyone is willing to say, off the record (so maybe contact me directly, if you’d prefer…), what their own experiences are with their projects.

Examples of what I have already found:
  • The LANL 2023 Report evaluates risks and costs, but does so very broadly, doesn’t provide specifics, and has a very short reference section. I’m also aware of the earlier discussion on this site about that report; I’d like to complement the thoughts shared in that discussion with specific cases, if possible.
  • Charles Ferenbaugh kindly pointed me to one of his talks on Best Practices with Long-Lived Code, which describes a particular case and gives an indication of duration (years) and effort.
  • Leman et al. describe their 25-year journey, including two major rewrites, to move from Fortran to C/C++. Tucked away in the supplement is this quote: “…Rosetta still contains machine-translated Fortran legacy code in low-level libraries from 15 years ago, which is difficult to read, maintain, or replace.”
  • The British Computer Society shared the results of a survey on the benefits of continuing Fortran standardisation, which refers to costs and benefits, though again in broad terms.

I’m hoping to find other cases and - ideally - more specific information on the costs.

Why I’m asking: I am trying to put together an evidence-based argument about the costs that the Fortran-based scientific community (in the UK and internationally), or particular projects within that community, confront when deciding how to proceed in the long term with their Fortran codebases. Whichever way one turns (commit to Fortran, migrate to something else, integrate/interface), there’s the prospect of a lot of “pain”, now and in the future, in effort, duration, $$, and uncertainty.

Thanks, Austen

EDIT:

I have found this article, Making the Fortran-To-C Transition: How painful is it really?, which reports a case study of a Fortran-to-C/C++ transition. I appreciate people may not have access to the IEEE database. It’s a relatively small-scale codebase; the authors estimate the conversion took about one year of effort.

1 Like

The cost of modernizing a Fortran code base will depend on how much you want to improve the code. If modernizing just means converting to free source form and compiling cleanly with, say, gfortran -std=f2018, that could be possible with a commercial tool that costs a few $K. If you want all procedures in modules, COMMON blocks replaced with module variables, assumed-shape array arguments, optional arguments to reduce the number of required arguments, derived types and classes, etc., that could require manual effort and cost much more.
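
For illustration, here is a minimal before/after sketch of one of the cheaper steps mentioned above, moving a COMMON block into a module (all names are hypothetical):

```fortran
! Before: legacy fixed-form style sharing state through a COMMON block
!       SUBROUTINE STEP(DT)
!       COMMON /STATE/ T, NSTEP
!       T = T + DT
!       NSTEP = NSTEP + 1
!       END

! After: the same state held as module variables, with an explicit interface
module state_mod
  implicit none
  real :: t = 0.0
  integer :: nstep = 0
contains
  subroutine step(dt)
    real, intent(in) :: dt
    t = t + dt
    nstep = nstep + 1
  end subroutine step
end module state_mod
```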

The Lawrence Livermore National Laboratory has funded research in automatic translation of Fortran to C++, with a preprint here associated with the GitHub project Fortran2Cpp. Presumably LLNL did a study finding that this was less expensive than manual translation.

Archaeologic is a Fortran consultancy that you could contact.

Thanks for your prompt reply, @Beliavsky. The “it depends” is something I want to take account of, hence the case studies, to get some sense of the variation. In terms of what I am looking for, the Fortran2Cpp example is itself interesting, because it doesn’t (seem to) consider the cost-benefit argument, e.g., what’s the cost of developing Fortran2Cpp versus the benefit it might bring in saved development time/effort, and what’s the cost argument that motivates the Fortran2Cpp project in the first place? As for Archaeologic, thanks for that suggestion. I’m already in touch with Brad - he came to our workshop in September last year.

Thanks again.

Hard to measure things when they are undergoing a phase change. I would be very surprised if these costs were not upended, or did not become irrelevant, at some point, depending on how you perceive what AI and quantum computing will do to the art of programming, and when.

You might be trying to measure an ice cube in a frying pan, as Grandma used to say. It will not work well and might end up with you getting burned :wink:

2 Likes

A nice thing about Fortran is the ability to modernize the code incrementally, since old and new Fortran can be mixed together. I did this with a very large Fortran program.

It was done in many passes while the program was still under active development. It is much easier to ensure correctness if one works in small steps.

The style was to remove one obsolescent or deleted feature at a time. I didn’t have any arithmetic IFs or Hollerith strings to deal with, but I did have ENTRY statements. Replacing COMMON blocks with modules was another pass, also done one at a time.

Converting from fixed to free format was done one file at a time using compiler options. I progressively switched to using : in array syntax and 1D array slices. Over time I added the INTENT attribute for call arguments - which I really like.
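
A hypothetical fragment in the spirit of these passes (names invented), showing assumed-shape arguments, the INTENT attribute, and 1-D array slices replacing an explicit loop:

```fortran
module smooth_mod
  implicit none
contains
  subroutine smooth(x, y)
    real, intent(in)  :: x(:)   ! assumed-shape; INTENT documents usage
    real, intent(out) :: y(:)
    integer :: n
    n = size(x)
    y(1) = x(1)
    y(n) = x(n)
    ! 1-D array slices replace an element-by-element DO loop
    y(2:n-1) = (x(1:n-2) + x(2:n-1) + x(3:n)) / 3.0
  end subroutine smooth
end module smooth_mod
```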

There are many more details I could go into, but suffice it to say, it was relatively painless and I never felt like I was losing control.

I chose to stay with Fortran to avoid the risks with moving to another language, for performance reasons, and for code longevity. Fortran has been around for almost 70 years and odds are it will be around for another 70 years.

5 Likes

I have more than a decade of experience in leading Fortran modernization projects. I generally find modernization a hard sell in its own right, but one can often do a tremendous amount of modernization in support of other specific objectives. Common related objectives include porting to a new platform, parallelization, and handing off to a new set of developers (often because a hero programmer is retiring). There can be interesting domino effects that can send one down a deep and expensive yet potentially worthwhile rabbit hole:

  1. Parallelization is much easier if procedures are pure. For example, compilers can automatically parallelize do concurrent, and any procedure called inside do concurrent must be pure.
  2. Pure functions are especially helpful for clarifying data dependencies.
  3. Because functions can produce only one result, the conversion to pure functions can lead to defining derived-type results to encapsulate more than one function result via object components (see the sketch after this list).
  4. Once one has derived types, object-oriented programming becomes attractive for various reasons (encapsulation, abstraction, etc.).
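
A minimal sketch of how items 1-3 fit together (all names here are invented for illustration):

```fortran
module stats_mod
  implicit none
  type :: stats_t            ! derived type bundling two function results
    real :: mean, variance
  end type stats_t
contains
  pure function column_stats(x) result(s)
    ! Pure: no side effects, so it is legal inside DO CONCURRENT
    real, intent(in) :: x(:)
    type(stats_t) :: s
    s%mean = sum(x) / size(x)
    s%variance = sum((x - s%mean)**2) / size(x)
  end function column_stats
end module stats_mod

program demo
  use stats_mod
  implicit none
  real :: a(100, 8)
  type(stats_t) :: s(8)
  integer :: j
  call random_number(a)
  do concurrent (j = 1:8)    ! a candidate for automatic parallelization
    s(j) = column_stats(a(:, j))
  end do
end program demo
```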

So the domino effect is that one starts out aiming to do parallel programming and along the way adopts functional programming patterns, which in turn lead to object-oriented programming. And this is all along just one particular modernization path that might have little to do with various other forms of modernization, such as improved C-interoperability.

Here are a few examples from my experience (none of which involved open-source code so there are no public references):

  1. A 5K-line code modernization effort (driven by a pending retirement) in which we started modernizing and ultimately decided it was better to write a new code from the ground up.
  2. A 52K-line code modernization (motivated both by a retirement and a desire to parallelize the code) in which there was a separate, ground-up rewrite happening simultaneously that was eventually abandoned in favor of incremental modernization.
  3. A 750K-line modernization effort that is still in the estimating phase and will be driven primarily by a desire to parallelize the code incrementally. In this case, it’s quite clear that starting from scratch is impractical given the decades of validation and insight about the application that are built into the code.

Feel free to contact me offline for more details and thoughts on how to develop the estimate.

4 Likes

This is how R programmers do it, with functions returning objects containing the estimated parameters of a statistical model.

1 Like

Our experience echoes that of @Rouson. We have, for example, modernised, refactored and bug-checked:

  • many aerospace codes, typically about 10K lines each;
  • a weather/climate code of 470K lines;
  • an agricultural code of 550K lines;
  • a point-of-sale stock-control code (in Fortran!) of 650K lines;
  • an electricity power distribution code of 1170K lines.

As I wrote in an earlier post, the first thing to do is to establish a regression test. If you are going to change half a million lines of code, you need to know that nothing gets broken.
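
As a minimal sketch of such a check (file names, format, and tolerance are all hypothetical, and it assumes the code writes plain numeric output), a first regression test can be very simple:

```fortran
program regress_check
  ! Compare the current run's numeric output against a frozen
  ! reference file, value by value, within a relative tolerance.
  implicit none
  real :: ref, cur
  integer :: ios_ref, ios_cur, n, nfail
  real, parameter :: rtol = 1.0e-6
  open (10, file='reference.txt', status='old', action='read')
  open (11, file='current.txt',   status='old', action='read')
  n = 0; nfail = 0
  do
    read (10, *, iostat=ios_ref) ref
    read (11, *, iostat=ios_cur) cur
    if (ios_ref /= 0 .or. ios_cur /= 0) exit
    n = n + 1
    if (abs(cur - ref) > rtol * max(abs(ref), 1.0)) nfail = nfail + 1
  end do
  if (ios_ref /= ios_cur) then
    print *, 'FAIL: outputs have different lengths'
  else if (nfail > 0) then
    print *, 'FAIL:', nfail, 'of', n, 'values differ'
  else
    print *, 'PASS:', n, 'values match'
  end if
end program regress_check
```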

I recommend that developers check the facilities provided by the available software tools. Our own tool, fpt, and Polyhedron’s plusFORT will convert to free form, reformat the code to a user-defined standard, and carry out systematic changes such as conversion of COMMON blocks to modules. There are other tools which perform at least some of the same functions: @Beliavsky published a list on this forum. fpt will also help in setting up regression tests - see, for example, fpt Reference: INSERT RUN-TIME TRACE. These tools are usually free for academic use (ours are) and cost the equivalent of only a few days of developers’ time for industry.

7 Likes

Maybe this article about defect patterns (or lack thereof) in the NAG library is of some use?

It is worth noting that some years ago NAG added a Python API (nAG Library for Python — NAG Library for Python 30.3.0.0 documentation). From what I’ve read, the software remains implemented in Fortran, and the C, C++, and Python interfaces are just bindings.

3 Likes

That’s great. Thanks!