Poll: refactoring a chunk of legacy code

In my view, it is totally irrelevant whether the old codes still work or can be compiled. Nobody wants to work with them or maintain them. Nobody wants to fix their bugs or add the latest algorithms (numerical methods did not stop progressing in the 1980s). The latest generations of domain experts do not want to write FORTRAN 77 code. Young people do not want to edit FORTRAN 77 code. So wrapping it in a modern interface doesn’t help. It has to be modernized so as to provide some kind of reasonably modern ecosystem that people want to use and add to. It was totally irrelevant that fftpack still worked: SciPy deprecated it and replaced it with C++. Decades of trust are irrelevant if nobody wants to use it. The new libraries written in languages like Julia don’t have decades of trust, but they have enthusiastic users and developers. That’s what we need :slight_smile:

4 Likes

FWIW, I am planning to work on the interpolation routines (2D, …) from Netlib. And I am also fascinated by the implementation of the special functions :slight_smile: Mostly hobbyism.

I once heard somebody say that “legacy code is what pays the bills”, and I find that to be very true. We can argue indefinitely about poor readability, maintainability, performance, and sometimes even the correctness of the code, but the fact that the code is still in production means there’s value there. Indeed, the existence of legacy code “paying the bills” is what lets us experiment and hopefully find better ways of building robust software.

Generally I tend to agree with @nncarlson that there should be very good reasons for initiating a code refactoring, because there can very easily be unintended consequences. The question I usually ask myself is: is there a need to refactor this code right now, or can it be postponed?

If I’m able to build a modern interface around the code, then I’ll happily leave computed GOTOs and other remnants of the past as they are, and rather focus on evolving the application or library while also building a robust test suite.

The moment a fundamental change in the legacy code is required to fix a bug or add a new feature, though, it’s time for modernization, because I just ain’t writing fixed-form arithmetic IF statements ever! :stuck_out_tongue_winking_eye:
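
For the uninitiated, here is the kind of construct I mean (a made-up fragment, not taken from any real library):

C     Computed GOTO: jumps to the K-th label in the list
C     (falls through if K is out of range).
      GO TO (100, 200, 300), K
C     Arithmetic IF: branches on the sign of DISC
C     (negative -> 10, zero -> 20, positive -> 30).
      IF (DISC) 10, 20, 30

In modern Fortran these become a select case block and a plain if/else if chain, respectively.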

I’ll also admit that sometimes a rewrite of some arcane piece of code that is indeed much older than myself could just become too tempting…

Yes, I agree. Now is not the right time to be refactoring these libraries. The right time was 25 years ago! :laughing:

3 Likes

The “modernization” of these libraries should be seen as a knowledge recovery process, not a tool refurbishing process. As others have said, the tool still functions, so no refurbishing is necessary. But the knowledge contained within is indecipherable, and that’s the really valuable bit. I’ll always applaud those willing to slog through these old codes and attempt to make their wisdom accessible to the rest of us. Especially since it’s also kind of my day job.

This is the point, I think. It’s not that people need to understand a piece of code in order to use it, but there is a tendency to avoid depending on opaque code where possible. If I know that I could understand a piece of code and fix it myself if needed, there’s much less risk in depending on it.

7 Likes

@jacobwilliams / @ivanpribec, just curious re: MINPACK. Would you know the current state of its test suite? If one exists, any idea how comprehensive it is?

There are a few collections of test problems we are looking to resurrect.

Rather than focus solely on refactoring the legacy code, I’d prefer to see a full relaunch of development. There have been 40 years of both algorithm and software development in the meantime. For example, the Ceres solver, created by Google for bundle adjustment and other tasks in computer vision, scores more than twice as many points on the NIST collection of nonlinear regression problems as MINPACK. It can also handle sparse problems and use automatic differentiation to calculate Jacobian matrices. There are also new solvers in Python, such as DFO-LS (developed in collaboration with NAG). These new efforts are gradually making MINPACK obsolete. It’s still in SciPy at the moment, but it’s only a matter of time before they kick it out. The SciPy paper (by now cited more than 8000 times) says:

For implementing new functionality, Python is still the language of choice. If Python performance is an issue, then we prefer the use of Cython followed by C, C++ or Fortran (in that order). The main motivation for this is maintainability: Cython has the highest abstraction level, and most Python developers will understand it. C is also widely known, and easier for the current core development team to manage than C++ and especially Fortran.

This reminds me of a poem from a Slovene impressionist:

Čas beži in ne odlaša, staro izgubi veljavo,
kar je novo to je pravo, taka je usoda naša.

(Dragotin Kette)

Here’s my humble translation:

Time flees without delay and the old loses validity,
only what is new has desirability, that is an inevitability.

1 Like

Often, the expression being fitted to data is linear in a subset of the fit coefficients. If so, the problem is a Separable Nonlinear Least Squares (SNLS) problem; see, for example, Golub and Pereyra’s paper. The JPL Math library, available from Netlib (MATH77/mathc90) or from @jacobwilliams’s GitHub with the same contents, is a comprehensive, high-quality Fortran 77 numerical methods library. Its routines DNLSFU/DNLSGU solve SNLS problems. The advantages of using an SNLS solver instead of an NLS solver such as Minpack (a sketch of the separable structure follows the list):

  • Fewer nonlinear parameters to guess initial values for, smaller dimensions of parameter space

  • Evaluations of residuals and Jacobian are cheaper than in NLS
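
To make the “separable” structure concrete (my notation, not taken from the Golub-Pereyra paper or the MATH77 documentation), the model has the form

$$ y(t_i) \approx \sum_{j=1}^{p} c_j \, \phi_j(t_i; \alpha), $$

linear in the coefficients $c_j$ but nonlinear in the parameters $\alpha$. For any fixed $\alpha$, the optimal $c$ comes from an ordinary linear least-squares solve, so the outer nonlinear iteration only has to search over the (smaller) $\alpha$ space. A classic example is a sum of exponentials, $c_1 e^{-\alpha_1 t} + c_2 e^{-\alpha_2 t}$, where only $\alpha_1$ and $\alpha_2$ need starting guesses.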

For the NIST NLS problem set, DNLSFU solves the 27 × 2 cases (27 problems, two starting points each) quite well, giving a mean LRE of 10.1, which is just as good as (and for some problems slightly better than) the results from the Ceres solver. If DNLSGU is used (which requires the Jacobians to be computed analytically), the LRE attains the “perfect” 11 for all 27 problems except Lanczos2, for which the two LREs are 10 and 10.
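
For readers unfamiliar with the metric: the LRE (“log relative error”) used with the NIST StRD problems is, as I understand it,

$$ \mathrm{LRE} = -\log_{10}\frac{|q - c|}{|c|}, $$

where $q$ is the computed parameter estimate and $c$ the NIST-certified value; it is roughly the number of correct significant digits, capped at about 11 by the precision of the certified values.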

The JPL Fortran 77 routines contain a few instances where the Fortran argument aliasing rules are violated. I have fixes for the violations in the DNLSFU/DNLSGU and ancillary routines.
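
To show what that means, here is a minimal made-up illustration (not from the JPL code) of the kind of aliasing the standard forbids:

! A dummy argument that is modified must not be associated with
! another dummy argument in the same call.
subroutine scale_add(a, b)
   real, intent(inout) :: a(:)
   real, intent(in)    :: b(:)
   a = 2.0*a + b
end subroutine scale_add

! ... elsewhere:
! call scale_add(x, x)   ! nonconforming: x is aliased and modified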

1 Like

That’s an admirable goal.

I agree; it will indeed be helpful for the Fortran community to both refactor and “relaunch” the venerable FORTRAN libraries into the modern world, first and foremost for its own benefit.

In this regard, it may prove a true “blessing in disguise” that other ecosystems, such as Python’s, are considering migrating away from Fortran. It will produce actual data with which to compare and contrast against Fortran how those efforts pan out in reality. Given some time, support from the Fortran standards committee in the form of new features that enhance both productivity and performance, and strong backing from compiler implementations for those features, chances are rather high that good modern Fortran libraries, including C interfaces for interoperability, will prove simpler, superior, and more enduring for the world of scientific and technical computing than any efforts elsewhere, especially those that don’t have standards and/or employ squiggly brackets for their block constructs!

I applaud the efforts and vision of @jacobwilliams here.

Looking at the calendar on the wall, with any reflection on the past, basic attention to the present, and any anticipation of the future, the simple ethos of “see FORTRAN legacy code, create a thorough and ever-growing test suite, and refactor the hell out of it for modern needs and consumption”, contrary to the what-I-think-is-entirely-questionable “if it ain’t broke, don’t fix it” saying, is what I reckon will serve both the domain of numerical computing and Fortran very well.

1 Like

Yes, those MATH77 routines are totally in-scope for the new Minpack! Let’s add them.

But…they need to be run through PlusFORT/SPAG 3 or 4 times. :slight_smile: That code is a prime example of amazing algorithms that are locked away and hidden by the terrible limitations of FORTRAN 77.

Yep, my intent is that we are restarting development where it left off 30 years ago. Refactoring the existing code is really just step one. Adding new methods is totally in-scope.

Not sure about including Fortran interfaces to external libraries, though. Maybe that makes more sense as a separate library? (E.g., a Fortran interface to Ceres as a separate library, though really that is something that could even be included with Ceres; other C libs ship with Fortran interfaces, and it makes sense to keep those together in the same project.) IDK. The nice thing about Minpack is that it’s pure Fortran and it’s dead-easy to compile and include in your Fortran projects. Once you start bringing in other libs, things can get very complicated quickly.

2 Likes

One thing that makes it hard to justify porting this code to modernized Fortran is that Fortran is no longer the easiest language in which to implement high-performance math. For a simple example, the typical way to evaluate a polynomial using Horner’s scheme written in Fortran is about 2x slower than optimal, since arrays in Fortran are mutable and the compiler therefore can’t unroll the whole computation. As such, libraries like Bessels.jl (GitHub - heltonmc/Bessels.jl: Bessel’s functions written in Julia) are 2x faster than the Fortran equivalents (in this case, AMOS), even when they use the same algorithm. The only solutions are to unroll all your code by hand, which is tedious and error-prone, or to use a preprocessor to generate the code for you (at which point you aren’t really writing Fortran).
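
To illustrate with a toy example of my own (not taken from AMOS or Bessels.jl):

! Loop-based Horner: the coefficients live in a runtime array,
! and the loop-carried dependency stays in the generated code.
pure function horner(c, x) result(y)
   real(8), intent(in) :: c(:)   ! highest-degree coefficient first
   real(8), intent(in) :: x
   real(8) :: y
   integer :: i
   y = c(1)
   do i = 2, size(c)
      y = y*x + c(i)
   end do
end function horner

! Hand-unrolled with literal constants: the optimizer sees the whole
! chain and can fuse it into FMA instructions.
pure function horner3(x) result(y)
   real(8), intent(in) :: x
   real(8) :: y
   y = ((2.0d0*x - 3.0d0)*x + 1.5d0)*x - 0.5d0
end function horner3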

Yes, it’s all a matter of opinion. If you ask C++ programmers, Fortran never was the easiest language. In fact, there is a new language almost every year claiming to be the easiest of all, because there is no solid ground, like the rules of nature, against which to compare programming languages. Bessel functions have been part of the Fortran standard since 2008. A link to the claimed benchmark results, with all the details, would have been good and fair.
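
For reference, the standard intrinsics need no external library at all; a trivial snippet (mine):

! Fortran 2008 intrinsic Bessel functions:
print *, bessel_j0(1.0d0)      ! J0(1)
print *, bessel_y1(2.0d0)      ! Y1(2)
print *, bessel_jn(3, 1.5d0)   ! J3(1.5)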

Julia is great. But it’s a fairly new language, without a standard and with only one implementation. Which is fine, nothing wrong with that, but does that mean it makes all other programming languages unnecessary? No. If you want a standardized, compiled language with multiple implementations, you have three choices: C, C++, and Fortran. I maintain that modern Fortran is superior for math to the other two. Every few years a new language comes along and everybody converts all the FORTRAN 77 to that language. Let’s try something else: let’s modernize the FORTRAN 77, develop it further, and see what we can do with it. Maybe that will save people some work in 5 years when yet another language comes along to replace everything. :slight_smile:

5 Likes

Interesting; this was an example that I got from Steven Johnson (the author of FFTW and a bunch of other high-performance software) in his JuliaCon 2019 keynote (JuliaCon 2019 | Keynote: Professor Steven G. Johnson - YouTube), and I’ve seen this performance difference in practice when re-implementing a variety of special functions. It’s possible that SciPy and SpecialFunctions.jl aren’t compiling it with the right flags (although that would be surprising), and I can’t think of any other possible cause for the difference.

The marketing of Julia actually works. Many young guys in my field believe that Julia is consistently faster than Fortran.

We need to get us some of that marketing. :thinking:

3 Likes

It wouldn’t be bad for Fortran if there were some simpler compilation flags for production. I actually learnt at the Julia forum (thanks, C. Elrod) how to compile some of my Fortran code to match the performance I was getting with Julia. If it wasn’t for that comparison, I could well have spent another decade running code at suboptimal performance without ever knowing it.
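
For anyone else in the same boat: I don’t recall the exact flags from that thread, but the usual starting point with gfortran is something along the lines of

gfortran -O3 -march=native -funroll-loops mycode.f90

rather than the plain defaults.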

4 Likes

Hey Jacob, I don’t think we should be calling Ceres from Minpack. We can, however, take a look at what improvements their Levenberg-Marquardt implementation has. We could also have a callback interface that offers automatic differentiation, using, for example, a library like Fazang.
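
Something along these lines, perhaps (a hypothetical sketch; the names and the optional-Jacobian convention are mine, not an existing Minpack or Fazang API):

abstract interface
   subroutine residual_fcn(x, fvec, fjac)
      !! Evaluate residuals and, when requested, the Jacobian.
      !! If the user never supplies fjac, the solver could fall
      !! back to finite differences or to an AD backend.
      import :: wp                 ! working-precision kind from the host module
      real(wp), intent(in)  :: x(:)
      real(wp), intent(out) :: fvec(:)
      real(wp), intent(out), optional :: fjac(:,:)
   end subroutine residual_fcn
end interface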

1 Like

The libraries Steven Johnson refers to were written in the ’70s. They use DATA statements to initialize the polynomial coefficients and have no intent attributes.
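
That is, the stylistic gap is roughly this (made-up fragments):

C     1970s style: a mutable array filled by a DATA statement
      DOUBLE PRECISION P(3)
      DATA P /2.0D0, -3.0D0, 1.5D0/

versus

! Modern style: a true compile-time constant the optimizer can see
! (dp being a kind parameter from the host module)
real(dp), parameter :: p(3) = [2.0_dp, -3.0_dp, 1.5_dp]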

You can write a pretty decent variadic macro function using fypp:

#! Wrap the Fortran 2018 ieee_fma intrinsic (from ieee_arithmetic).
#:def fma(a,b,c)
ieee_fma(${a}$,${b}$,${c}$)
#:enddef

#! Horner's rule as a variadic macro: coefficients are passed from the
#! highest degree down, and the recursion builds a nested fma chain.
#:def polyval(num,a,b,*pos)
  #:set res = fma(a,num,b)
  #:if len(pos) > 0
    #:set res = polyval(num,res,pos[0],*pos[1:])
  #:endif
  $:res
#:enddef
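
If I trace the recursion, a use like

y = ${polyval('x', 'c3', 'c2', 'c1', 'c0')}$

should expand (coefficients highest degree first) to

y = ieee_fma(ieee_fma(ieee_fma(c3,x,c2),x,c1),x,c0)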

It’s not nearly as complex as Steven Johnson makes it sound in his talk. Granted, it’s not part of the language standard, but that’s not a showstopper.

Here’s an erfinv implementation I wrote with the help of the macro: List of special mathematical functions to include · Issue #179 · fortran-lang/stdlib · GitHub
Here’s the Julia code for the same function that I based the Fortran implementation upon: SpecialFunctions.jl/erf.jl at 6b686c0d6544a1e0ca59b627f22ea5559ba03e40 · JuliaMath/SpecialFunctions.jl · GitHub

Given the sample @kargl just showed, I’m not even convinced there’s much need for my macro.

(Just as a side note, the macro was inspired by Johnson’s Julia talk. I wrote it as a side project during my PhD at the Chair of Brewing and Beverage Technology. I’ve been using it in a Fortran module that evaluates the thermophysical properties of moist air.)

3 Likes