Poll: refactoring a chunk of legacy code

Ah, that makes sense. Sorry for the out of date knowledge.

`-march=native` is the relevant flag.
Note that with this option, the resulting binary will only run on an architecture very similar to the one used for compilation.
It should therefore not be used for distributing binaries, but it is a good choice when compiling for an HPC system.
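For example, with gfortran the difference is just one flag at compile time (a minimal sketch; the file and program names are illustrative):

```shell
# Tuned for the build machine's CPU: typically fastest, but the
# binary may die with "illegal instruction" on a different CPU.
gfortran -O3 -march=native solver.f90 -o solver_native

# Portable baseline, safe for distributing binaries.
gfortran -O3 solver.f90 -o solver_generic
```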
This shows that even ‘runtime’ is not a single number to compare.

1 Like

Thank you all for participating in the poll! I’ve now closed it.

The two takeaways which I find important are:

  • Choice A, which is also the code produced by an automatic transformation tool, was not the human favorite in this specific case; choice B was the preferred approach.
  • Legacy code cannot be “translated” verbatim. Statements that might seem necessary may in fact be workarounds for constraints of the language or of specific compilers.
2 Likes

I think that’s overly optimistic. I’d say any language that has AD got it either by implementing it itself or by interfacing with C/C++.

For anyone interested in using the JPL math77 library for nonlinear least squares problems, the following notice may be of interest.

The source file compjplJ.f90 is a driver for solving the 27 nonlinear least squares problems presented at the NIST site, using the MATH77 routine DNLSGU. This file is to be compiled and linked with the subroutine files

dnlsgu.f, divset.f, drnsg.f, drn2g.f, dq7rfh.f, idam.f, amach.f

of the MATH77 library from JPL.
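A possible build line, assuming gfortran with all of the above files in the current directory (the flags are illustrative; other compilers will differ):

```shell
# -std=legacy relaxes strict checking for the old fixed-form sources.
gfortran -std=legacy compjplJ.f90 dnlsgu.f divset.f drnsg.f \
         drn2g.f dq7rfh.f idam.f amach.f -o compjplJ
```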

To run one of the NIST problems, for example, Nelson, enter the command

echo Nelson.dat | ./compjplJ

The 27 data files and the driver source file can be obtained in a Zip file from the cloud.

2 Likes

I agree with @nncarlson (and others sharing that perspective) 100%.
Old, “field” numerical analysts (not the ones who settled into theory only) had to deal with language and resource constraints unthinkable today. This made them better programmers, who invented creative methods to bypass the issues. Extensive reuse of the exact same portion of a matrix just to save a few bytes comes to mind - back when every bit was precious. But there are many other tricks they used, because they had to.

The result was a (usually huge) “FOOPACK” - Netlib is full of them. A spaghetti hell where even the slightest modification, easy on paper, may have dramatic consequences, and therefore needs extensive testing at every single step. But said FOOPACK has passed the test of time with countless users over decades, and has proved to work very well. I don’t see a reason to modify such a monstrous but extremely effective thing - other than the joy of the challenge. Not to mention that even if you manage to “modernize” it, 20-30 years from now it will probably be considered legacy code as well.

Fortran is notorious for backwards compatibility, so wrapping venerable code in a modern Fortran module to make it easy to use is the best solution in my opinion. It’s easier and safer.
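As a sketch of what such a wrapper can look like (the legacy routine DFOO and its argument list here are hypothetical, standing in for whatever the FOOPACK actually exports):

```fortran
module legacy_wrap
   implicit none
   private
   public :: solve

   ! Explicit interface to the hypothetical F77 routine DFOO,
   ! so the compiler can check the call instead of trusting us.
   interface
      subroutine dfoo(n, x, info)
         integer :: n
         double precision :: x(n)
         integer :: info
      end subroutine dfoo
   end interface

contains

   ! Modern front end: assumed-shape array and an optional status
   ! flag; the caller no longer passes the length explicitly.
   subroutine solve(x, stat)
      double precision, intent(inout) :: x(:)
      integer, intent(out), optional :: stat
      integer :: info
      call dfoo(size(x), x, info)
      if (present(stat)) stat = info
   end subroutine solve

end module legacy_wrap
```

The legacy source itself stays untouched; only the interface is modernized.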

Thanks @mecej4 for the pointer to the MATH77 routines and the NIST problems.

Concerning MINPACK passing the test of time: perhaps the algorithm has, but about the Fortran code I’m not so sure. There have been at least five extensive rewrites of the source code by C programmers who were not happy calling the Fortran code (levmar, GSL, C/C++ MINPACK, lmfit, C MPFIT), and further rewrites in Java and IDL.

It’s a shame so much effort goes into translation when we could be working on better algorithms or solving problems.

1 Like

Well, everybody does what they want with their time, but rewriting Fortran legacy code in C is pointless in my opinion. I’m not at all sure the rewritten C code would actually be more readable than the Fortran legacy code - and even if it were, the difference would be so small that it doesn’t justify the effort of rewriting. Not to mention you have far better options.
C is a language that hates anything related to Mathematics with a passion. Arrays don’t really exist; they are just pointers. If you need multidimensional arrays… my ancient Casio handheld computer with equally ancient Basic can do it better than C - the code, at least, is better. C functions don’t really exist either; they are - you guessed it - pointers. I could go on for ages, but let’s just say all you get is pointers, and you need them for even the simplest things, just because the language doesn’t have any better way. And because it doesn’t have another way, C lets you do crazy things with pointers, meaning memory leaks are the easiest thing in the world, just one small human mistake away.
Don’t get me wrong, pointers are not evil - but the way you use them in C is. Fortran has pointers, and they have their uses; in some cases they are an invaluable tool. But you don’t need them for everything, because Fortran gives you other, better options. For the same reason Fortran pointers are more restricted, so you can’t shoot yourself in the foot with them - or at least you will have to try hard.
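To illustrate the “other options” point: a Fortran routine can receive a multidimensional array with its shape intact, with no pointer arithmetic involved (a minimal sketch):

```fortran
module demo
   implicit none
contains
   ! Assumed-shape dummy arguments: shape and bounds travel with
   ! the data, so size() works and no lengths are passed by hand.
   subroutine scale_rows(a, s)
      real, intent(inout) :: a(:,:)
      real, intent(in)    :: s(:)
      integer :: i
      do i = 1, size(a, 1)
         a(i,:) = s(i) * a(i,:)   ! whole-row operation, no index math
      end do
   end subroutine scale_rows
end module demo
```

The C equivalent needs either a flat buffer with hand-written `a[i*ncols + j]` indexing or a pointer-to-pointer structure.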
“But compilers and operating systems are written in C”, someone would argue. Of course they are. The language is a high level assembly (not a high level programming language). It is therefore closer to the hardware, so it’s easier to implement “low level” libraries. But for Numerical Analysis code, using C is… a masochistic act. There is no better way to describe such a task. Rewriting Fortran legacy code in C is even worse. You can do it, but you don’t want to. Well, apparently some did want to do it, and I am guessing they were hardcore C fans.

Now, there is also C++. They tried to make a better C with it - and in my opinion, they failed miserably. The language is bloated, and yet you still need to reinvent the wheel by writing templates for basically everything, just because they introduced arrays and other features in the worst way possible. Rewriting Fortran legacy code in C++ would result in more readable code, sure, but why on Earth would someone do that in C++ when it can be done in modern Fortran? At least then the language itself won’t stand in their way; they will only need to deal with the legacy spaghetti - and the resulting code would be far more readable than the C++ version.
I do use C and C++ - when I have to. They both have their uses. But I would never use them for refactoring legacy code. I can’t say anything about IDL because I have never used it.

As for Java… let me say just this: every language where the standard way to create a variable is the new statement should be prohibited by law :smiley:. And that was just one quick example. The language is so bad that I see no reason to say anything else. Rewriting venerable code in… Java? Well, people never cease to amaze me with their choices…

2 Likes

If a programming language has been widely used for decades, it must have some good features, and I think we should avoid bashing other languages here. C and C++ programmers did write our operating systems and compilers.

We can add quite a few more to that list. A few:

  • (1/2)*2 = 0

  • (4.0/3.0-1.0)*3.0-1.0 = 1.19209290E-07

  • it takes work to make I*I equal to -1

  • “print *,(-8.0)**(1.0/3.0)” outputs “NaN”

  • in algebra, a/bc means a/(product of b and c), but in Fortran “a/b*c” means (a/b)*c

  • “PRINT *, TYPE(GOD)” does not output “REAL”
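Most of these can be checked with a few lines of Fortran (a quick sketch; the complex case is included to show that making I*I equal -1 does take a declaration first):

```fortran
program gotchas
   implicit none
   complex :: i = (0.0, 1.0)   ! 'I' must be declared complex first
   print *, (1/2)*2            ! integer division: prints 0, not 1
   print *, i*i                ! prints (-1.0, 0.0) once declared
   print *, 6.0/2.0*3.0        ! a/b*c is (a/b)*c: prints 9.0
end program gotchas
```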

Correct. I already stated that, and tried to explain why those languages were picked for that specific purpose. I never said C or C++ are useless. I said they are a bad choice if you intend to develop Numerical Analysis applications, and an even worse choice for refactoring legacy Fortran code. You can do it, but there are far better options, so you shouldn’t. And yes, I am aware people do it anyway; in fact, it’s very trendy nowadays. Nevertheless, it’s very easy to see that C/C++ were not designed with Mathematics/Numerical Analysis in mind, and therefore they are not the languages of choice for the task. I wouldn’t call that “bashing” but rather a fact.
At any rate, my intention was not to bash C/C++. I use them myself. But for what they are designed for - and that’s not Numerical Analysis.

As for the counterpoints (and I could even argue at least some of them aren’t counterpoints), I am sure you realize there is no perfect language. The point is, for each one of those counterpoints I could mention a long list of needed features that are either completely absent in C or poorly implemented in C++. The lack of arrays in C and the constant need for templates to teach C++ elementary numerical operations is just the tip of the iceberg. So you need to reinvent the wheel, and I fail to see how such languages would be a sane choice for refactoring legacy Fortran code.

I think no single item on your list (except, maybe, the last one :slight_smile: ) is Fortran-specific. Integer division - same in C and probably in many other languages. Numerical accuracy problems - same thing, cf.

Python 3.9.10 (main, Jan 15 2022, 11:48:00)
[Clang 13.0.0 (clang-1300.0.29.3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> print((4.0/3.0-1.0)*3.0-1.0)
-2.220446049250313e-16

Exponentiation - C does not even have an operator for that. a/b*c - in C the left-to-right associativity of */ operators is firmly set in the standard. And so on.

2 Likes

IMO, such debates are mainly fueled by language biases. Reading the introduction to the paper “Linguistic Relativity and Programming Languages” can be helpful to try and rid oneself of these.

Here’s a quote from the foreword of the book “Using OpenMP” which kind of explains the reason for what you call “bloat”,

The great paradox of programming languages is that as they become popular and evolve, they become more complex and easier to use, at the same time. Simplicity comes from specialization - features that make some things very natural and easy, e.g. parallelizing a recursive function in OpenMP. Complexity has two sources: the needs of expert users for more control to extract the best performance, and just the sheer size of the language as more interdependent features are added. With success comes complexity! Over time, fewer and fewer people understand the fullness and intricacies of each language, yet the language fills the needs of many.

1 Like

Yes, the so-called language “wars” are meaningless and best avoided, the last subthread here veers in that direction and it will be better to pull back.

+1.

Done well, refactoring will provide you with several benefits:

  1. you’ll be surprised and even shocked at the “premature” optimizations - remember Knuth and his warnings about the root of all evil! - that have remained hidden for so long and are suddenly uncovered by the refactoring initiative, not to mention bugs and other numerical inaccuracies. This is especially true for almost every “legacy” FORTRAN 77-based codebase,
  2. the opportunities to steer the codebase toward thread-safety and vectorization and enhanced extensibility toward consumption in parallel codes,
  3. easier support and maintenance with greatly increased collaboration on many fronts.

A hearty Godspeed in this effort,

3 Likes

I found that paper somewhat biased, to be honest - if not outright biased. But anyway, what I wanted to say is that refactoring legacy code is a task that (1) I don’t think is worth the effort, since a wrapping module with a “modern” interface is not that hard to write and avoids the many pitfalls of refactoring spaghetti code, and (2) even if someone goes for it, rewriting in C/C++ is definitely not the best choice.
I don’t expect everyone to agree, and that’s perfectly fine.

1 Like

Are these simply jokes?