Poll: refactoring a chunk of legacy code

ivanpribec · February 15, 2022, 5:24am

Consider the problem of refactoring the following chunk of legacy fixed-form code:

      nm1 = n - 1
      if (nm1 .lt. 1) go to 70
      do 60 j = 1, nm1
c       ...
   60 continue
   70 continue

The easiest approach is to invert the if condition:

     ! Approach A
      nm1 = n - 1
      if (nm1 >= 1) then
         do j = 1, nm1
            ! ...
         end do
      end if

The downside is the logic gets more convoluted. With a bit more thought, we may produce the following

      ! Approach B
      if (n > 1) then
         do j = 1, n - 1
            ! ...
         end do
      end if

Another option is to use a block:

      ! Approach C
      block
         if (n < 2) exit
         do j = 1, n - 1
            ! ...
         end do
      end block

Which approach would you pick?

A
B
C
Keep the original fixed-form code
None of the above (explain in the comments)

0 voters

nncarlson · February 15, 2022, 5:46am

Is the “packs” discussion the background for this poll? I know how incredibly tempting (and satisfying) it is to rip into some old legacy code and modernize it. But I know from painful experience that even seemingly simple changes can have unintended consequences. Unless one is intimately familiar with the code in question, I’m not smart enough, and neither are you (not you specifically, though I’d include you ) to make such changes without having a thorough test suite to confirm no ill-effect of the changes. This especially so for numerical things, with tests to cover the whole range of input space. I’d bet such test suites have been long lost for many of the legacy libraries. I think the best course of action – as unsatisfying as it is – is to leave the code as is and wrap it in some modern interface.

ivanpribec · February 15, 2022, 6:45am

The code in question does come from a classic “PACK”. In general I agree with everything you’ve written, but am more open towards modifying such codes in a minimally intrusive way.

The issue that bothers me is that “modernization” is subjective. Of course in the long term we’d like to move away from fixed-form and remove obsolescent language constructs, but using a naive source transformation tool may actually ruin the expressiveness of the code. While the snippets above are hopefully isomorphic (and hence would have the same behavior for any inputs) they “read” differently.

Ideally we’d have a tool to write such semantic patches (in this specific case you might even call it simply a stylistic patch). There is a tool for C called Coccinelle which provides pattern matching and transformation using a special semantic patch language. You can imagine this kind of as sed but aware of the language it’s modifying.

Arjen · February 15, 2022, 7:40am

My reply to this particular code snippet is that the condition is actually superfluous and, if I may guess, only introduced because of FORTRAN 66’s one-pass do-loops (the condition was checked at the end). So instead, you could do:

do j = 1, n-1
    ...
enddo

IF n is 1 or smaller, the number of iterations will be be zero. No check needed.

Arjen · February 15, 2022, 7:43am

One, hopefully non-controversial, step in the modernisation of such packages would be to set up a comprehensive test suite, especially if the original one has been lost in the mists of time. A modern interface is definitely called for and after that I guess the opinions will diverge :).

interkosmos · February 15, 2022, 7:47am

From Recommendations To Write (Slightly More) Readable And (Thus) Robust Code:

Exit early to avoid waterfall code. Especially for longer code blocks, this minimizes cognitive overhead.

The block statement is very handy to exit sections early, and allows to easily handle error codes. But your example might be to simple to illustrate the benefits.

ivanpribec · February 15, 2022, 8:06am

Excellent catch!

Do you know of any references which explain this (odd) behavior? The Modernizing Old Fortran on Fortran Wiki has no mention of this.

ivanpribec · February 15, 2022, 8:22am

Do static analysis tools like plusFORT or fpt have the option to identify this “ill” pattern?

cc @apple3feet, @Jcollins

Jcollins · February 15, 2022, 8:39am

[quote=“ivanpribec, post:1, topic:2796”]

      nm1 = n - 1
      if (nm1 .lt. 1) go to 70
      do 60 j = 1, nm1

May I suggest:

      nm1 = n - 1
      do  j = 1, nm1
c       ...
      enddo

If nm1 is less than 1 the loop will exit immediately. There is no need for the additional IF statement.
At present fpt will change the DO 60 – 60 CONTINUE to DO - ENDDO (if you ask it to) but it won’t eliminate the IF statement.

John

Arjen · February 15, 2022, 8:53am

I found a short description of FORTRAN 66 on the Fortran Wiki. It helpfully points to a more formal document: ANSI X 3.9 1966 Fortran 66 : American National Standards Institute : Free Download, Borrow, and Streaming : Internet Archive

I remember seeing compiler options regarding do-loops, that is the main reason I know of this feature

ivanpribec · February 15, 2022, 9:01am

Thanks. The line which would warrant the if block in the past, and has been relaxed in later standards is:

At time of execution of the DO statement, m1, m2, and m3 must be greater than zero.

mecej4 · February 15, 2022, 9:33am

Please see the description of how a DO loop operated in 1960s CDC Fortran (page I.5.9, PDF page number 70). Not only is the at-least-one-trip DO feature described, but also another curious feature: the DO index variable became undefined after normal completion of the DO loop! Furthermore, the variables/expressions m1, m2, m3 in DO I = m1,m2,m3 were all required to be positive. There was provision for the “extended range” of the DO loop – a jump out of the loop, followed by execution of some statements, and then a jump back into the loop.

Some of these seemingly odd features may be related to the architecture of the CDC: data registers were 60 bit (integers and reals), index registers were only 18 bits. There were no separate instructions for integer arithmetic.

JohnCampbell · February 15, 2022, 10:07am

I agree with other comments;
When “refactoring”, I judge success by the minimum number of character changes in the code. It can be very hard to test the new code, as the development test cases are often lost, while precision changes can mask the results. I remember problems with conversions to 8087 and then to SIMD/SSE/AVX.

Legacy DO loops have a lot of issues, as @mecej4 identifies. Because of this you may have to “refactor” to achieve the same result, typically using DO ; and coding the way m1,m2,m3 worked on the development compiler, although which compiler was rarely documented!
The if test appears unnecessary in this case, as “modern/F77?” DO has a zero-trip option. Perhaps this is a good omen, as the code appears to stop one-trip from occuring.
One-trip default is a problem with very old legacy code, as was the ability to change the values of m1 and m2 during the loop.
I don’t recall m3 must be +ve, although I didn’t use CDC Fortran till 1975, and rarely used m3, typically using “I” as a loop counter and so calculating an “m3 modified i” within the loop.
Loop unrolling can be a nightmare with very old code.

You need to research the original developer’s style before making any changes and always keep the original code to compare the changes.

Arjen · February 15, 2022, 10:28am

Fascinating, from a archeological point of view! But that makes the urgency to modernise the code even larger, if the code continues to be used. With all the caveats already mentioned in this thread of course.

jacobwilliams · February 15, 2022, 1:32pm

That’s just a recipe for these codes to continue to fade into obscurity and be replaced by modern versions written in other programming languages. We can modernize these codes. We are just as good as the dudes from the 80s. We need not be stuck with their GOTOs until the end of time. Absolutely, refactoring should be rigorously tested. We can do that too.

urbanjost · February 15, 2022, 2:07pm

And today’s code is tomorrow’s legacy code so the greatest take-away is that unit tests are critical. An automated capture and compare instrumentation tool might be the most plausible tool for creating a base set of tests. Do any vendors or projects supply one? I could not find one.
Basically, they capture all the arguments at entry and return. Exceptionally good ones track global variables (COMMON, etc…) as well. And they let you compare results between runs.

For reference, plusFORT defaults (it has lots of options, including many for case) produced

      IMPLICIT NONE
      INTEGER j , n , nm1
      nm1 = n - 1
      IF ( nm1>=1 ) THEN
         DO j = 1 , nm1
!       ...
         ENDDO
      ENDIF
      END PROGRAM AA0001

mecej4 · February 15, 2022, 4:03pm

It is noteworthy that the segment of code that @ivanpribec made the topic of this thread occurs in Minpack-1. In one refactored version of Minpack-1, lines 793-803, we still see vestiges of this one-trip-do legacy, as well as the non-standard and unnecessary intrinsic DFLOAT, and see the intrinsic SUM rendered unavailable:

            sum = zero
            nm1 = n - 1
            if (nm1 >= 2) then
                do j = 2, nm1
                    sum = sum + dfloat(j)*x(j)
                end do
            end if
            do i = 1, m
                Fvec(i) = dfloat(i - 1)*sum - one
            end do
            Fvec(m) = -one

A manual refactoring into modern Fortran could have produced the more readable version:

sumj = sum([(j*x(j),j=2, n-1)])
fvec = [-1.0, ((i-1)*sumj - 1.0, i = 2, m-1), -1.0]

Note that this version rhymes better with the mathematical description from a journal article :

nncarlson · February 15, 2022, 4:05pm

As long as there are compilers that deal robustly with f77 (or older?!) I’m not entirely convinced that these legacy codes will fade away if they are wrapped in a modern interface that is pleasing to interact with. To an end user that’s what matters. You and I may look at the internal implementation with disgust (and I do), but if it works that’s what matters.

Don’t get me wrong, I’m not at all opposed to modernizing legacy codes with certain caveats. I think we’re certainly better programmers now than the dudes in the 80s. But I’m not so sure we’re as good numerical analysts as them. Any effort to produce a modern version of one of those codes is a very major undertaking that should start with a completely comprehensive test suite, as @arjen commented, before any bit of code is modified. Well-used venerable legacy packages have established decades of trust in the user community. If someone comes along with a new modern version of “foopack” (pick your favorite), that trust doesn’t automatically transfer to the modern knock-off; it needs to be established anew starting on the basis of a convincing test suite.

Now if we do have issues with the continuing availability of robust f77 compilers (and that may be the case, I don’t know) then we’ve got a real problem, and we need to get busy producing modern Fortran implementations, starting with recovering old/creating new test suites.

themos · February 15, 2022, 6:42pm

Preach.

For every one person interested in a better answer, there are fifty interested in a faster, riskier answer.

Themos’s exaggeration: “Today, nobody computes a number they haven’t already guessed.”

jacobwilliams · February 16, 2022, 2:59am

Yes, the minpack refactoring is a work in progress. Join us!

Topic		Replies	Views
Why is C used in many optimization libraries? Help	17	914	January 30, 2022
Resistance to modernization	108	5447	September 7, 2022
Scientists are using artificial intelligence and large language models to rewrite old code in modern languages	82	2336	May 23, 2025
NEK for computational fluid dynamics moving to C++?	58	4442	February 8, 2022
Packs -- the classic Fortran libraries	32	3697	February 14, 2022

Poll: refactoring a chunk of legacy code

Related topics