I’m working with two more contributors on modernizing a mixed F90 and F77 codebase. Specifically, we would like to bring the F77 code up to F90 at least. Anyone would like to share their experiences?
How did you deal with those many many GOTOs?
I’m working with two more contributors on modernizing a mixed F90 and F77 codebase. Specifically, we would like to bring the F77 code up to F90 at least. Anyone would like to share their experiences?
How did you deal with those many many GOTOs?
Removing GOTOs is quite intense, and I’m not aware of any automated strategies to do that. I’ve recently refactored fitpack and I’ve had to replace the thousands of GOTOs
manually.
I think there are basically two reasons for gotos:
So, not a free lunch as far as I’m aware, but maybe you could try ask ChatGPT see if it’s capable of doing this refactoring?
Most cases I encountered the using of DO ... GOTO
in old code were related to absence the do while
loop in FORTRAN 77. Sometimes GOTO
is required to be replacement with cycle
or exit
statement within loop.
The IF ... GOTO
and arithmetic IF could be replaced with if ... else
statements.
In some exotic cases it could be related to hopping above from external block with skipping some code block within loop if state of system was changed in previous loop step. Then additional flag-variable was required to be introduced that save system state and conditional if
statement to skip block if it true.
Anyway it’s better to have tests to ensure that after refactoring the same results are got. Don’t try replace all at once.
fpt (http://simconglobal.com) will automate many of the tasks. It will:
Reformat the code to free format. You can specify the indentation style, the upper and lower case of keywords, symbols, intrinsics etc. and many other things. See Formatting Fortran Code
Move labels from executable statements to CONTINUE statements (see http://www.simconglobal.com/fpt_ref_remove_labels_from_executable_statements.html)
Change DO - CONTINUE to DO - ENDDO (see fpt Reference: CHANGE DO - CONTINUE TO DO - ENDDO)
Specify IMPLICIT NONE and insert declarations for all undeclared symbols (see http://www.simconglobal.com/fpt_ref_specify_implicit_none.html)
There are many changes which may be required in code migration, some of which may be relevant to code modernisation. Please see fpt Reference: MIGRATION
I would recommend that you set up a regression test (if you do not already have one) before making large numbers of automated code changes.
One thing which fpt will not do is to replace GOTO statements with structured code. PlusFort / spag from Polyhedron Software will do that. Many years ago I agreed with John Appleyard at Polyhedron that we would not copy his idea.
This kind of task can also be done by ChatGPT4.
To demonstrate it, I gave this code as input:
SUBROUTINE RCKELI(N,A,Y,X)
PARAMETER (IA=50)
REAL A(IA,IA), Y(IA), X(IA)
INTEGER N
X(N)=Y(N)/A(N,N)
DO 10 K=N-1,1,-1
S=0.
DO 20 J=K+1,N
20 S=S+A(K,J)*X(J)
10 X(K)=(Y(K)-S)/A(K,K)
RETURN
END
After a few (not many) tries, ChatGPT made this:
module back_substitution_module
implicit none
integer, parameter :: dp = kind(1.d0)
contains
subroutine back_substitution(a, y, x)
integer :: n
real(dp), intent(in) :: a(:,:), y(:)
real(dp), intent(out) :: x(size(y))
real(dp) :: s
integer :: k
n = size(y)
x(n) = y(n) / a(n, n)
do k = n-1, 1, -1
s = dot_product(a(k, k+1:n), x(k+1:n))
x(k) = (y(k) - s) / a(k, k)
end do
end subroutine back_substitution
end module back_substitution_module
Unfortunately, ChatGPT isn’t as good in Fortran programming compared to Python programming, but giving some hints it’s possible to get good results.
I used ChatGPT to modernize an old FORTRAN routine (horizontal diffusion) at work and AFAIK ChatGPT makes fewer mistakes than a human translating the code manually would make.
While GOTO statements are a nuisance, especially if they are used in an undisciplined way, old-style code can contain more surprises. For instance the use of COMMON blocks and all manner of tricks to reduce the memory usage. Watch out for those tricks, they can bite you ;).
One other thing to watch out for: inconsistencies between actual calls and the declared argument list.
(I am currently working on a document to describe those old practices. It is all really in its infancy, but if you are interested, see GitHub - arjenmarkus/old-programming-idioms: Explain old-style programming idioms used in FORTRAN 77 programs)
My list of Fortran Tools has sections on Refactoring and Static Analysis. One commercial tool listed there, plusFORT, has an Evaluation Starter Pack. The company of @Jcollins has fpt - Tools for Fortran Engineering. CamFort is a free tool – I don’t know if it can remove gotos.
Here is my perspective (based on trying to refactor a lot of old code) on GOTO’s.
Use of GOTO’s in old code generally falls into three catagories.
Example:
if (a < b) GO TO 10
some code
GO TO 20
10 continue
some more code
20 continue
which would be replaced today with
if (a>=b) then
some code
else
some other code
endif
Example
10 continue
if (a > amax) Go To 20
some code
a = a + da
GO TO 10
20 continue
which would be replaced by
do while (a<=amax)
some code
a = a+da
end do
One caveat on trying to replace the GO TO cases that map to if-then-else blocks. If you don’t want to move large blocks of code around, you have to change the conditional clause (say from a.LT.b to a.GE.b) which can lead to some surprising results after the compiler does its optimization thing. You can change the order of execution which can also lead to changes in your results (usually small but still apparent). If your refactored code has to produce output that is bit-wise identical to the old code, you have to be very careful about how you move things around.
I’ve done a fair bit of this, and it takes some trial and error and some discipline. As others have pointed out, the 3 general “patterns” (which is why they were added to languages and goto
s fell out of favor) are
Note that it is not uncommon for these 3 to become entangled in a single procedure. I.e. A loop with conditional cycle and/or exit. Disentangling those can be a chore. Make small moves and test often.
Note that there is a pattern to replace the “error handling” goto
. I.e.
...
if (condition) goto 10
...
return
10 continue
! handle error
end
can become
all_good: block
...
if (condition) exit all_good
...
return
end block all_good
! handle error
end
This trick can sometimes be useful in untangling the more complicated cases as well. I.e. jumping out of nested loops like
outer: do ...
middle: do ...
inner: do ...
...
if (condition) exit middle
...
end do inner
end do middle
end do outer
or even nested if
s like
outer: if (...) then
middle: if (...) then
...
inner: if (...) then
...
if (condition) exit middle
...
end if inner
else
...
end if middle
...
else
...
end if outer
We have seen another use of GOTO - a computed GOTO to implement a switch. For example:
! ****************************************************************************************
!
! Switch on the character C, first character of an
! identifier or operator or white space
!
! ASCII list
! 0 NUL,SOH,STX,ETX,EOT,ENQ,ACK,BEL, 0-7 0-7
! 1 BS ,TAB,LF ,VTB,FF ,CR ,SO ,SI , 8-15 8-F
! 2 DLE,DC1,DC2,DC3,DC4,NAK,SYN,ETB, 16-23 10-17
! 3 CAN,EM ,SUB,ESC,FS ,GS ,RS ,US , 24-31 18-1F
! 4 SPA,! " # $ % & ' 32-39 20-27
! 5 ( ) * + , - . / 40-47 28-2F
! 6 0 1 2 3 4 5 6 7 48-55 30-37
! 7 8 9 : ; < = > ? 56-63 38-3F
! 8 @ A B C D E F G 64-71 40-47
! 9 H I J K L M N O 72-79 48-4F
! 1 P Q R S T U V W 80-87 50-57
! 2 X Y Z [ \ ] ^ _ 88-95 58-5F
! 3 ` a b c d e f g 96-103 60-67
! 4 h i j k l m n o 104-111 68-6F
! 5 p q r s t u v w 112-119 70-77
! 6 x y z { | } ~ DEL 120-127 78-7F
!
GOTO (2550,2550,2550,4800,4850,2550,2550, & ! 0
2550,2500,1500,1500,1500,1500,2550,2550, & ! 1
2550,2550,2550,2550,2550,2550,2550,2550, & ! 2
2550,2550,2550,2550,2550,2550,2550,2550, & ! 3
1500,2850,4200,4450,4500,4550,4650,3650, & ! 4
2900,2950,3000,3050,3100,3150,3200,3250, & ! 5
2800,2800,2800,2800,2800,2800,2800,2800, & ! 6
2800,2800,3300,4400,3350,3400,3450,4350, & ! 7
4700,2750,2750,2750,2750,2750,2750,2750, & ! 8
2750,2750,2750,2750,2750,2750,2750,2750, & ! 9
2750,2750,2750,2750,2750,2750,2750,2750, & ! 1
2750,2750,2750,3500,4600,3550,3650,4750, & ! 2
2600,2700,2700,2700,2700,2700,2700,2700, & ! 3
2700,2700,2700,2700,2700,2700,2700,2700, & ! 4
2700,2700,2700,2700,2700,2700,2700,2700, & ! 5
2700,2700,2700,2600,2600,2600,2600,2600)ICHAR(c)
!
! Fall through switch. non-ASCII character
CALL fpterr(0907,2,perpre,perrp1,permke,' ')
GOTO 1500
!
!-----------------------------------------------------------------------------------------
!
! TAB character
2500 CONTINUE
i = 1
DO WHILE ((srccol > tabstp(i)) .AND. &
(i < pn_tab_stops))
i = i+1
ENDDO
srccol = tabstp(i)
! If the last character was also a tab, modify last entry
IF (fprtyp(nprogr) == ptktab) THEN
CALL wmprva(nprogr,srccol+1)
GOTO 1500
ELSE
toktyp = ptktab
! Value is the start column of the NEXT token
tokval = srccol+1
GOTO 1450
ENDIF
!
!-----------------------------------------------------------------------------------------
!
! Non-printing character
2550 CONTINUE
GOTO 2650
!
!-----------------------------------------------------------------------------------------
!
! Illegal character in this context - discarded
2600 CONTINUE
IF (lexrq == plexi) THEN
! File name begining with $, # etc.
GOTO 3850
ENDIF
2650 CONTINUE
CALL fpterr(0909,2,perpre,perrpa,permke,' ')
GOTO 1500
!
!-----------------------------------------------------------------------------------------
!
! Handler for lower case alphabetic characters.
2700 CONTINUE
c = CHAR(ICHAR(c)-pcase)
searca(1) = c
lcnf = .TRUE.
stcol = column
symlen = 1
toktyp = ptknam
! Continue with same character in C, raised to upper case.
GOTO 4900
! Symbol handler
!
!-----------------------------------------------------------------------------------------
!
! Handler for upper case alphabetics
2750 CONTINUE
searca(1) = c
lcnf = .FALSE.
stcol = column
symlen = 1
toktyp = ptknam
GOTO 4900
! Symbol handler
!
!-----------------------------------------------------------------------------------------
!
! Handler for numeric characters
2800 CONTINUE
! Don't decode the number if handling Pro*FORTRAN
IF (lexrq == plex_sql) THEN
! Just in case tokval contained a significant token
tokval = 0
! toktyp can be anything < ptkcnl
toktyp = ptkint
GOTO 1450
ENDIF
searca(1) = c
symlen = 1
toktyp = ptkint
GOTO 5500
! Number handler
!
!-----------------------------------------------------------------------------------------
!
! Handler for '!' character or '/*' - comment delimiter
2850 CONTINUE
! Attempt to tab to the current printer column
IF (fprtyp(nprogr) == ptktab) THEN
CALL wmprva(nprogr,INT(srccol,4))
ELSE
CALL weprgm_col(ptktab,INT(srccol,4))
ENDIF
toktyp = ptktcm
GOTO 2150
!
!-----------------------------------------------------------------------------------------
!
! Handler for '(' character - open parenthesis
2900 CONTINUE
!
! Parenthesis, not comment
IF ((recsta(column+1) == '/') .AND. &
(recsta(column+2) /= '=')) THEN
tokval = ptoosl
column = column+1
srccol = srccol+1
ELSE
tokval = ptoopn
ENDIF
!
GOTO 3600
!
!-----------------------------------------------------------------------------------------
!
! Handler for closing parenthesis ')'
2950 CONTINUE
IF (toklst == ptkopr) THEN
prgctr = nprogr
DO WHILE (fprtyp(prgctr) > ptksta)
IF (fprtyp(prgctr) == ptkspe) THEN
CALL bckspl(prgctr)
ELSE
prgctr = prgctr-1
ENDIF
ENDDO
IF (fprval(prgctr) == ptodiv) THEN
CALL wmprgm(prgctr,ptknul,0)
tokval = ptoslc
CALL fpterr(2447,psc_format,perpre,perrpa,permke,' ')
GOTO 3600
ENDIF
ENDIF
!
tokval = ptoclo
GOTO 3600
!
!-----------------------------------------------------------------------------------------
!
! Handler for '*' Multiply or Exponentiate '**'
3000 CONTINUE
IF (recsta(column+1) == '*') THEN
tokval = ptoexp
column = column+1
srccol = srccol+1
ELSE
tokval = ptomul
ENDIF
!
GOTO 3600
!
!-----------------------------------------------------------------------------------------
!
! Handler for '+' - Plus
3050 CONTINUE
tokval = ptoplu
!
GOTO 3600
!
!-----------------------------------------------------------------------------------------
!
! PTOCMA character ','
3100 CONTINUE
tokval = ptocma
GOTO 3600
!
!-----------------------------------------------------------------------------------------
!
! Handler for Minus character '-'
3150 CONTINUE
tokval = ptomin
!
GOTO 3600
!
!-----------------------------------------------------------------------------------------
!
! Handler for point - '.', either an operator or decimal
3200 CONTINUE
IF ((toklst == ptkopr) .AND. &
(recsta(column+1) >= '0') .AND. &
(recsta(column+1) <= '9')) THEN
! Leading '.' in a decimal number.
searca(1) = c
symlen = 1
toktyp = ptkrel
GOTO 5500
! Number handler
ELSE
! Ambiguities must be resolved by SSA
toktyp = ptkopr
toklst = ptkopr
tokval = ptodot
GOTO 1450
ENDIF
!
!-----------------------------------------------------------------------------------------
and so it goes on. Each section ends with a GOTO to the switch collector, or to a piece of shared code executed on the way to it. Now this could all be done with a CASE statement, and the authors tried that. There was a very significant performance penalty. Investigation showed that the compilers then in use implemented the CASE statement as an IF-THEN-ELSE chain, so to get to the higher-valued characters there were a lot of comparisons. That investigation was a long time ago and I would like to know whether a modern compiler would find a better solution. I would like to think so.
Forgive me - the forum text handler cut off most of the example. The computed GOTO switches on the input character and goes to about 40 destinations to decide what to do with it. It then uses another GOTO to get to the switch collector and go round again. This could, of course, be done with a CASE statement, but the authors tried that and experienced a significant performance penalty. At the time when this was investigated the CASE was internally implemented as an IF-THEN-ELSE chain, so high valued characters were reached after a lot of comparisons. I suspect that modern compilers would do this differently. I would like to hope so.
I don’t have much advice for GOTO statements other than that all GOTO cases I have encountered have been fixable, though the fix comes with much agony and mental torture. You can use named block
and other constructs to reduce the suffering in disentangling complex spaghetti mixes of GOTO statements. More importantly, my advice for code modernization is to make all entities in the algorithms generically defined, particularly constants. No explicit dependence on a particular type or kind should exist in the algorithm unless enforced by the nature of the problem. This will tremendously simplify migration to templates once they appear in Fortran, hopefully very soon.
The very first thing I’d suggest is to figure out your test coverage status (tcov, or a preprocessor, whatever you can get solid data from). If you don’t have high coverage, you’ve got pretty high risk going forward. Getting the coverage up should be priority one.
Pin down the original platform details. Often the most spagettified code came from preprocessors (ratfor, efl … many customized versions, or completely alien). Working from the original more structured code is a much easier lift. That can be true even if the maintainers started working directly on the generated code ;< A lot of the later hackery isn’t germane to the core logic…
There are a wide variety of toolsets that have been developed over the years; some pretty solid … but no matter how solid,.large scale changes can perturb behavior. If nothing else, think about What Every Computer Scientist Should Know About Floating-Point Arithmetic … I’ve seen pathological cases where the automated converted code had infinite loops (due to floating point underflow in transformed control loops, and hardware which punted on the underflow to software …).
Virtually all interesting '77 code bases use extensions to the standard, most compilers eventually added a variety of the VAX extensions … but some implementation details may vary from the same vendors '77 to '9x compiler … another subtle family of hazards.
Beliavsky’s list of Fortran tools is a great resource. If there is a lot of “nonstandard extensions” you may need something like Heffner’s xtran but that can be a very big effort to get off the ground.
A pathological example:
Program Main
… stuff…
90000 continue ! top level menu root
…
call sub1
…
call subN
end program
subroutine sub$I
…
if (condition) goto 90000
…
end subroutine
There was only one 90000 in the entire program, some Prime Fortran environments accepted this, using PL/1 non-local goto rules (Sun f77 got modified to provide setjmp/longjmp to accommodate this code). AFAIK, it wasn’t a documented/intentional feature of Prime Fortran (but back in the day, a former Prime employee confirmed that it would have worked as a side effect of their implementation strategy) … and at least one major customer had a huge code base that depended on it.
Very interesting, thanks! Also interesting to know you develop that software.
@everythingfunctional @rwmsu @Carltoffel ChatGPT is actually a great idea too, I will definitely try it.
Also knowing the vast majority of gotos falls withing three cases is very good.
@Beliavsky That’s a very interesting list, a very useful resource. Reminds of awesome-language.
As for everyone who mentioned unit tests, yes, we will definitely work on that before refactoring. Fortunately this code base does not use any extensions (that I’ve seen) and no pre processor. It’s also very well documented for the time with papers in pdf and call stacks in function comments.
Sorry for not mentioning this above: ChatGPT 3.5 didn’t get the if
right (didn’t test the others) because it confused which branch was for true/false. With ChatGPT 4 it worked fine.
Hi
SPAG, which is part of our plusFORT toolkit does this. In fact, I’ve noticed that it’s over 35 years since the first release of SPAG (the spaghetti unscrambler!), and it’s evolved a lot since then. For example, it also rewrites declarations in modern Fortran, and allows switching from implicit to explicit typing. plusFORT also includes global static and dynamic and coverage analysis, and uses cluster analysis to help define a module structure for refactored code. SPAG accepts all of Fortran 66/77/90 and 95 and, with a few documented exceptions, Fortran 2003.
In addition to the “starter pack” mentioned by Beliavsky, we can offer a full licence FOC on request for projects which involve only open-source code. You can contact us via https://fortran.uk.
John Appleyard
Refactoring goto’s is a challenge when modernizing fortran. I am a big fan of plusFORT and SPAG. And I do this type of fortran modernization quite often in my consulting business. As a further step, I also usually end up converting old codes to other languages, usually matlab or python or R.
But because my goal is to eventually convert the Fortran code into another language, most of which do not have a goto statement, I have to get rid of every single goto, including arithmetic if’s and assigned/computed goto’s and goto’s embedded in ‘open’ (err=label) or ‘read’ (end=label) statements, etc.
In my experience, if I have say 10,000 goto’s in my code base, Spag will refactor about 30 to 50% of those goto’s. And that is a huge benefit. When spag refactors these goto’s, it does it cleanly and often reduces clutter at the same time. But what am I supposed to do with the remaining 6000 goto’s? I had to develop another tool (based on my experience refactoring slatec manually), which I call ‘remgoto’, which gets rid of all of the remaining goto’s. It works well for every goto (remaining after SPAG) I’ve ever encountered thus far going through millions of lines of code. Now I should say that even though it is 100% successful, is not 100% beautiful. The goto’s that spag misses usually can only be re-factored using ‘while’, ‘exit’, and ‘cycle’. Remgoto has had in the past produced 50+level nested while/if/exit/cycle constructions in order to refactor some spaghetti code.
I haven’t releases remgoto publicly, but I could take a look at your code and potentially help your project. Look up ‘Barrowes Consulting’.
Ben Barrowes
on top of everything else called out above, I would do something for interface checking. As a first step to pulling procedures into modules. I think most compilers have an option to do interface checking for old code EXTERNAL procedures. Like the ifort/ifx
-warn interfaces
option.
You could consider generating interfaces with the -gen-interfaces option, which creates procedure-name__GENmod.f90 and .mod files for your procedures. These could be included in the code to help with interface checking. But honestly, you really want to start pulling procedures and data into modules. That is the end-goal.
Probably more pressing is to get rid of COMMON, and, god forbid, EQUIVALENCE statements (if any).
So many things needed to bring F77 up to modern standards.