Tips on modernizing old fortran code?

I’m working with two more contributors on modernizing a mixed F90 and F77 codebase. Specifically, we would like to bring the F77 code up to F90 at least. Anyone would like to share their experiences?

How did you deal with those many many GOTOs?

Removing GOTOs is quite intense, and I’m not aware of any automated strategies to do that. I’ve recently refactored fitpack and I’ve had to replace the thousands of GOTOs manually.

I think there are basically two reasons for gotos:

  • loop control statements, these are the trickiest ones because in old codes, loops are often nested so you need to identify whether some chunks of code are better suited for being in a separate function;
  • error control: just go to the end of the routine, where some finalization/error printing is done before returning from the routine.

So, not a free lunch as far as I’m aware, but maybe you could try ask ChatGPT see if it’s capable of doing this refactoring?

2 Likes

Most cases I encountered the using of DO ... GOTO in old code were related to absence the do while loop in FORTRAN 77. Sometimes GOTO is required to be replacement with cycle or exit statement within loop.

The IF ... GOTO and arithmetic IF could be replaced with if ... else statements.

In some exotic cases it could be related to hopping above from external block with skipping some code block within loop if state of system was changed in previous loop step. Then additional flag-variable was required to be introduced that save system state and conditional if statement to skip block if it true.

Anyway it’s better to have tests to ensure that after refactoring the same results are got. Don’t try replace all at once.

1 Like

fpt (http://simconglobal.com) will automate many of the tasks. It will:

There are many changes which may be required in code migration, some of which may be relevant to code modernisation. Please see fpt Reference: MIGRATION

I would recommend that you set up a regression test (if you do not already have one) before making large numbers of automated code changes.

One thing which fpt will not do is to replace GOTO statements with structured code. PlusFort / spag from Polyhedron Software will do that. Many years ago I agreed with John Appleyard at Polyhedron that we would not copy his idea.

5 Likes

This kind of task can also be done by ChatGPT4.

To demonstrate it, I gave this code as input:

         SUBROUTINE RCKELI(N,A,Y,X)
         PARAMETER (IA=50)
         REAL A(IA,IA), Y(IA), X(IA)
         INTEGER N

         X(N)=Y(N)/A(N,N)
         DO 10 K=N-1,1,-1
            S=0.
            DO 20 J=K+1,N
      20       S=S+A(K,J)*X(J)
      10    X(K)=(Y(K)-S)/A(K,K)
         RETURN
         END

After a few (not many) tries, ChatGPT made this:

module back_substitution_module
  implicit none
  integer, parameter :: dp = kind(1.d0)
contains
  subroutine back_substitution(a, y, x)
    integer :: n
    real(dp), intent(in) :: a(:,:), y(:)
    real(dp), intent(out) :: x(size(y))
    real(dp) :: s
    integer :: k

    n = size(y)
    x(n) = y(n) / a(n, n)
  
    do k = n-1, 1, -1
       s = dot_product(a(k, k+1:n), x(k+1:n))
       x(k) = (y(k) - s) / a(k, k)
    end do

  end subroutine back_substitution

end module back_substitution_module

Unfortunately, ChatGPT isn’t as good in Fortran programming compared to Python programming, but giving some hints it’s possible to get good results.

I used ChatGPT to modernize an old FORTRAN routine (horizontal diffusion) at work and AFAIK ChatGPT makes fewer mistakes than a human translating the code manually would make.

Whole Chat

2 Likes

While GOTO statements are a nuisance, especially if they are used in an undisciplined way, old-style code can contain more surprises. For instance the use of COMMON blocks and all manner of tricks to reduce the memory usage. Watch out for those tricks, they can bite you ;).

One other thing to watch out for: inconsistencies between actual calls and the declared argument list.

(I am currently working on a document to describe those old practices. It is all really in its infancy, but if you are interested, see GitHub - arjenmarkus/old-programming-idioms: Explain old-style programming idioms used in FORTRAN 77 programs)

2 Likes

My list of Fortran Tools has sections on Refactoring and Static Analysis. One commercial tool listed there, plusFORT, has an Evaluation Starter Pack. The company of @Jcollins has fpt - Tools for Fortran Engineering. CamFort is a free tool – I don’t know if it can remove gotos.

2 Likes

Here is my perspective (based on trying to refactor a lot of old code) on GOTO’s.

Use of GOTO’s in old code generally falls into three catagories.

  1. conditional branching (now done with if-then-else statements or select case)

Example:

if (a < b) GO TO 10

   some code
   GO TO 20
10 continue
   some more code
20 continue

which would be replaced today with

if (a>=b) then
  some code
else
  some other code
endif
  1. imbedded loops (now done with either DO WHILE or unindexed DO)

Example

10 continue
     if (a > amax) Go To 20
       some code
       a = a + da
       GO TO 10
 20 continue

which would be replaced by

 do while (a<=amax)
  
    some code
    a = a+da
 end do
      
  1. error control (as previously noted) - jumping to the end of a routine to do error handling/printing before exiting. I’ve found that in many cases this is the hardest to remove and is probably not worth the effort. In fact I consider this case the one remaining viable use for GO TOs in modern Fortran programming.

One caveat on trying to replace the GO TO cases that map to if-then-else blocks. If you don’t want to move large blocks of code around, you have to change the conditional clause (say from a.LT.b to a.GE.b) which can lead to some surprising results after the compiler does its optimization thing. You can change the order of execution which can also lead to changes in your results (usually small but still apparent). If your refactored code has to produce output that is bit-wise identical to the old code, you have to be very careful about how you move things around.

I’ve done a fair bit of this, and it takes some trial and error and some discipline. As others have pointed out, the 3 general “patterns” (which is why they were added to languages and gotos fell out of favor) are

  1. conditional branching
  2. looping
  3. error handling

Note that it is not uncommon for these 3 to become entangled in a single procedure. I.e. A loop with conditional cycle and/or exit. Disentangling those can be a chore. Make small moves and test often.

Note that there is a pattern to replace the “error handling” goto. I.e.

...
if (condition) goto 10
...
return
10 continue
! handle error
end

can become

all_good: block
...
if (condition) exit all_good
...
return
end block all_good
! handle error
end

This trick can sometimes be useful in untangling the more complicated cases as well. I.e. jumping out of nested loops like

outer: do ...
  middle: do ...
    inner: do ...
      ...
      if (condition) exit middle
      ...
   end do inner
  end do middle
end do outer

or even nested ifs like

outer: if (...) then
  middle: if (...) then
    ...
    inner: if (...) then
      ...
      if (condition) exit middle
      ...
    end if inner
  else
    ...
  end if middle
  ...
else
  ...
end if outer
1 Like

We have seen another use of GOTO - a computed GOTO to implement a switch. For example:

! ****************************************************************************************
!
!       Switch on the character C, first character of an
!       identifier or operator or white space
!
!       ASCII list
!       0        NUL,SOH,STX,ETX,EOT,ENQ,ACK,BEL,   0-7    0-7
!       1        BS ,TAB,LF ,VTB,FF ,CR ,SO ,SI ,   8-15   8-F
!       2        DLE,DC1,DC2,DC3,DC4,NAK,SYN,ETB,   16-23  10-17
!       3        CAN,EM ,SUB,ESC,FS ,GS ,RS ,US ,   24-31  18-1F
!       4        SPA,!   "   #   $   %   &   '      32-39  20-27
!       5        (   )   *   +   ,   -   .   /      40-47  28-2F
!       6        0   1   2   3   4   5   6   7      48-55  30-37
!       7        8   9   :   ;   <   =   >   ?      56-63  38-3F
!       8        @   A   B   C   D   E   F   G      64-71  40-47
!       9        H   I   J   K   L   M   N   O      72-79  48-4F
!       1        P   Q   R   S   T   U   V   W      80-87  50-57
!       2        X   Y   Z   [   \   ]   ^   _      88-95  58-5F
!       3        `   a   b   c   d   e   f   g      96-103 60-67
!       4        h   i   j   k   l   m   n   o     104-111 68-6F
!       5        p   q   r   s   t   u   v   w     112-119 70-77
!       6        x   y   z   {   |   }   ~   DEL   120-127 78-7F
!
        GOTO (2550,2550,2550,4800,4850,2550,2550,     & ! 0
         2550,2500,1500,1500,1500,1500,2550,2550,     & ! 1
         2550,2550,2550,2550,2550,2550,2550,2550,     & ! 2
         2550,2550,2550,2550,2550,2550,2550,2550,     & ! 3
         1500,2850,4200,4450,4500,4550,4650,3650,     & ! 4
         2900,2950,3000,3050,3100,3150,3200,3250,     & ! 5
         2800,2800,2800,2800,2800,2800,2800,2800,     & ! 6
         2800,2800,3300,4400,3350,3400,3450,4350,     & ! 7
         4700,2750,2750,2750,2750,2750,2750,2750,     & ! 8
         2750,2750,2750,2750,2750,2750,2750,2750,     & ! 9
         2750,2750,2750,2750,2750,2750,2750,2750,     & ! 1
         2750,2750,2750,3500,4600,3550,3650,4750,     & ! 2
         2600,2700,2700,2700,2700,2700,2700,2700,     & ! 3
         2700,2700,2700,2700,2700,2700,2700,2700,     & ! 4
         2700,2700,2700,2700,2700,2700,2700,2700,     & ! 5
         2700,2700,2700,2600,2600,2600,2600,2600)ICHAR(c)
!
!       Fall through switch. non-ASCII character
        CALL fpterr(0907,2,perpre,perrp1,permke,' ')
        GOTO 1500
!
!-----------------------------------------------------------------------------------------
!
!       TAB character
2500    CONTINUE
        i = 1
        DO WHILE ((srccol > tabstp(i)) .AND.                                           &
                  (i < pn_tab_stops))
           i = i+1
        ENDDO
        srccol = tabstp(i)
!       If the last character was also a tab, modify last entry
        IF (fprtyp(nprogr) == ptktab) THEN
           CALL wmprva(nprogr,srccol+1)
           GOTO 1500
        ELSE
           toktyp = ptktab
!          Value is the start column of the NEXT token
           tokval = srccol+1
           GOTO 1450
        ENDIF
!
!-----------------------------------------------------------------------------------------
!
!       Non-printing character
2550    CONTINUE
        GOTO 2650
!
!-----------------------------------------------------------------------------------------
!
!       Illegal character in this context - discarded
2600    CONTINUE
        IF (lexrq == plexi) THEN
!          File name begining with $, # etc.
           GOTO 3850
        ENDIF
2650    CONTINUE
        CALL fpterr(0909,2,perpre,perrpa,permke,' ')
        GOTO 1500
!
!-----------------------------------------------------------------------------------------
!
!       Handler for lower case alphabetic characters.
2700    CONTINUE
        c = CHAR(ICHAR(c)-pcase)
        searca(1) = c
        lcnf = .TRUE.
        stcol = column
        symlen = 1
        toktyp = ptknam
!       Continue with same character in C, raised to upper case.
        GOTO 4900
!       Symbol handler
!
!-----------------------------------------------------------------------------------------
!
!       Handler for upper case alphabetics
2750    CONTINUE
        searca(1) = c
        lcnf = .FALSE.
        stcol = column
        symlen = 1
        toktyp = ptknam
        GOTO 4900
!       Symbol handler
!
!-----------------------------------------------------------------------------------------
!
!       Handler for numeric characters
2800    CONTINUE
!       Don't decode the number if handling Pro*FORTRAN
        IF (lexrq == plex_sql) THEN
!          Just in case tokval contained a significant token
           tokval = 0
!          toktyp can be anything < ptkcnl
           toktyp = ptkint
           GOTO 1450
        ENDIF
        searca(1) = c
        symlen = 1
        toktyp = ptkint
        GOTO 5500
!       Number handler
!
!-----------------------------------------------------------------------------------------
!
!       Handler for '!' character or '/*' - comment delimiter
2850    CONTINUE
!       Attempt to tab to the current printer column
        IF (fprtyp(nprogr) == ptktab) THEN
           CALL wmprva(nprogr,INT(srccol,4))
        ELSE
           CALL weprgm_col(ptktab,INT(srccol,4))
        ENDIF
        toktyp = ptktcm
        GOTO 2150
!
!-----------------------------------------------------------------------------------------
!
!       Handler for '(' character - open parenthesis
2900    CONTINUE
!
!       Parenthesis, not comment
        IF ((recsta(column+1) == '/') .AND.                                            &
            (recsta(column+2) /= '=')) THEN
           tokval = ptoosl
           column = column+1
           srccol = srccol+1
        ELSE
           tokval = ptoopn
        ENDIF
!
        GOTO 3600
!
!-----------------------------------------------------------------------------------------
!
!       Handler for closing parenthesis ')'
2950    CONTINUE
        IF (toklst == ptkopr) THEN
           prgctr = nprogr
           DO WHILE (fprtyp(prgctr) > ptksta)
              IF (fprtyp(prgctr) == ptkspe) THEN
                 CALL bckspl(prgctr)
              ELSE
                 prgctr = prgctr-1
              ENDIF
           ENDDO
           IF (fprval(prgctr) == ptodiv) THEN
              CALL wmprgm(prgctr,ptknul,0)
              tokval = ptoslc
              CALL fpterr(2447,psc_format,perpre,perrpa,permke,' ')
              GOTO 3600
           ENDIF
        ENDIF
!
        tokval = ptoclo
        GOTO 3600
!
!-----------------------------------------------------------------------------------------
!
!       Handler for '*' Multiply or Exponentiate '**'
3000    CONTINUE
        IF (recsta(column+1) == '*') THEN
           tokval = ptoexp
           column = column+1
           srccol = srccol+1
        ELSE
           tokval = ptomul
        ENDIF
!
        GOTO 3600
!
!-----------------------------------------------------------------------------------------
!
!       Handler for '+' - Plus
3050    CONTINUE
        tokval = ptoplu
!
        GOTO 3600
!
!-----------------------------------------------------------------------------------------
!
!       PTOCMA character ','
3100    CONTINUE
        tokval = ptocma
        GOTO 3600
!
!-----------------------------------------------------------------------------------------
!
!       Handler for Minus character '-'
3150    CONTINUE
        tokval = ptomin
!
        GOTO 3600
!
!-----------------------------------------------------------------------------------------
!
!       Handler for point - '.', either an operator or decimal
3200    CONTINUE
        IF ((toklst == ptkopr) .AND.                                                   &
            (recsta(column+1) >= '0') .AND.                                            &
            (recsta(column+1) <= '9')) THEN
!          Leading '.' in a decimal number.
           searca(1) = c
           symlen = 1
           toktyp = ptkrel
           GOTO 5500
!       Number handler
        ELSE
!          Ambiguities must be resolved by SSA
           toktyp = ptkopr
           toklst = ptkopr
           tokval = ptodot
           GOTO 1450
        ENDIF
!
!-----------------------------------------------------------------------------------------

and so it goes on.  Each section ends with a GOTO to the switch collector, or to a piece of shared code executed on the way to it.  Now this could all be done with a CASE statement, and the authors tried that.  There was a very significant performance penalty.  Investigation showed that the compilers then in use implemented the CASE statement as an IF-THEN-ELSE chain, so to get to the higher-valued characters there were a lot of comparisons.  That investigation was a long time ago and I would like to know whether a modern compiler would find a better solution.  I would like to think so.
1 Like

Forgive me - the forum text handler cut off most of the example. The computed GOTO switches on the input character and goes to about 40 destinations to decide what to do with it. It then uses another GOTO to get to the switch collector and go round again. This could, of course, be done with a CASE statement, but the authors tried that and experienced a significant performance penalty. At the time when this was investigated the CASE was internally implemented as an IF-THEN-ELSE chain, so high valued characters were reached after a lot of comparisons. I suspect that modern compilers would do this differently. I would like to hope so.

I don’t have much advice for GOTO statements other than that all GOTO cases I have encountered have been fixable, though the fix comes with much agony and mental torture. You can use named block and other constructs to reduce the suffering in disentangling complex spaghetti mixes of GOTO statements. More importantly, my advice for code modernization is to make all entities in the algorithms generically defined, particularly constants. No explicit dependence on a particular type or kind should exist in the algorithm unless enforced by the nature of the problem. This will tremendously simplify migration to templates once they appear in Fortran, hopefully very soon.

2 Likes

The very first thing I’d suggest is to figure out your test coverage status (tcov, or a preprocessor, whatever you can get solid data from). If you don’t have high coverage, you’ve got pretty high risk going forward. Getting the coverage up should be priority one.

Pin down the original platform details. Often the most spagettified code came from preprocessors (ratfor, efl … many customized versions, or completely alien). Working from the original more structured code is a much easier lift. That can be true even if the maintainers started working directly on the generated code ;< A lot of the later hackery isn’t germane to the core logic…

There are a wide variety of toolsets that have been developed over the years; some pretty solid … but no matter how solid,.large scale changes can perturb behavior. If nothing else, think about What Every Computer Scientist Should Know About Floating-Point Arithmetic … I’ve seen pathological cases where the automated converted code had infinite loops (due to floating point underflow in transformed control loops, and hardware which punted on the underflow to software …).

Virtually all interesting '77 code bases use extensions to the standard, most compilers eventually added a variety of the VAX extensions … but some implementation details may vary from the same vendors '77 to '9x compiler … another subtle family of hazards.

Beliavsky’s list of Fortran tools is a great resource. If there is a lot of “nonstandard extensions” you may need something like Heffner’s xtran but that can be a very big effort to get off the ground.

A pathological example:

Program Main
… stuff…
90000 continue ! top level menu root

call sub1

call subN
end program

subroutine sub$I

if (condition) goto 90000

end subroutine

There was only one 90000 in the entire program, some Prime Fortran environments accepted this, using PL/1 non-local goto rules (Sun f77 got modified to provide setjmp/longjmp to accommodate this code). AFAIK, it wasn’t a documented/intentional feature of Prime Fortran (but back in the day, a former Prime employee confirmed that it would have worked as a side effect of their implementation strategy) … and at least one major customer had a huge code base that depended on it.

2 Likes

Very interesting, thanks! Also interesting to know you develop that software.

@everythingfunctional @rwmsu @Carltoffel ChatGPT is actually a great idea too, I will definitely try it.
Also knowing the vast majority of gotos falls withing three cases is very good.

@Beliavsky That’s a very interesting list, a very useful resource. Reminds of awesome-language.

As for everyone who mentioned unit tests, yes, we will definitely work on that before refactoring. Fortunately this code base does not use any extensions (that I’ve seen) and no pre processor. It’s also very well documented for the time with papers in pdf and call stacks in function comments.

Sorry for not mentioning this above: ChatGPT 3.5 didn’t get the if right (didn’t test the others) because it confused which branch was for true/false. With ChatGPT 4 it worked fine.

Hi
SPAG, which is part of our plusFORT toolkit does this. In fact, I’ve noticed that it’s over 35 years since the first release of SPAG (the spaghetti unscrambler!), and it’s evolved a lot since then. For example, it also rewrites declarations in modern Fortran, and allows switching from implicit to explicit typing. plusFORT also includes global static and dynamic and coverage analysis, and uses cluster analysis to help define a module structure for refactored code. SPAG accepts all of Fortran 66/77/90 and 95 and, with a few documented exceptions, Fortran 2003.
In addition to the “starter pack” mentioned by Beliavsky, we can offer a full licence FOC on request for projects which involve only open-source code. You can contact us via https://fortran.uk.

John Appleyard

4 Likes

Refactoring goto’s is a challenge when modernizing fortran. I am a big fan of plusFORT and SPAG. And I do this type of fortran modernization quite often in my consulting business. As a further step, I also usually end up converting old codes to other languages, usually matlab or python or R.

But because my goal is to eventually convert the Fortran code into another language, most of which do not have a goto statement, I have to get rid of every single goto, including arithmetic if’s and assigned/computed goto’s and goto’s embedded in ‘open’ (err=label) or ‘read’ (end=label) statements, etc.

In my experience, if I have say 10,000 goto’s in my code base, Spag will refactor about 30 to 50% of those goto’s. And that is a huge benefit. When spag refactors these goto’s, it does it cleanly and often reduces clutter at the same time. But what am I supposed to do with the remaining 6000 goto’s? I had to develop another tool (based on my experience refactoring slatec manually), which I call ‘remgoto’, which gets rid of all of the remaining goto’s. It works well for every goto (remaining after SPAG) I’ve ever encountered thus far going through millions of lines of code. Now I should say that even though it is 100% successful, is not 100% beautiful. The goto’s that spag misses usually can only be re-factored using ‘while’, ‘exit’, and ‘cycle’. Remgoto has had in the past produced 50+level nested while/if/exit/cycle constructions in order to refactor some spaghetti code.

I haven’t releases remgoto publicly, but I could take a look at your code and potentially help your project. Look up ‘Barrowes Consulting’.

Ben Barrowes

1 Like

on top of everything else called out above, I would do something for interface checking. As a first step to pulling procedures into modules. I think most compilers have an option to do interface checking for old code EXTERNAL procedures. Like the ifort/ifx
-warn interfaces
option.
You could consider generating interfaces with the -gen-interfaces option, which creates procedure-name__GENmod.f90 and .mod files for your procedures. These could be included in the code to help with interface checking. But honestly, you really want to start pulling procedures and data into modules. That is the end-goal.

Probably more pressing is to get rid of COMMON, and, god forbid, EQUIVALENCE statements (if any).

So many things needed to bring F77 up to modern standards.

2 Likes