I have been working with the SLATEC library. Following the closure of Absoft and Lahey during the last few months, I felt that I needed to preserve their versions of the library, which they included as free add-ons that their customers could choose to use. Lahey provided nice HTML formatted documentation of the SLATEC routines; these are still up on their web site, but for how long? Some users of Silverfrost FTN95 expressed interest in SLATEC, and I put some effort into getting the library to work with FTN95.
SLATEC is a large library, old but high quality, and huge (about 1900 subroutines), Any calculation, such as evaluating an integral or solving a least squares problem requires as many as 90 of those subroutines. The number of bugs divided by the lines of code is very low, but the number of bugs in the whole library is not negligible. Given these features, it is to be noted that finding and fixing those bugs is no easy task. Doing the same with a modern version such as Mehdi Chinoun’s ( GitHub - MehdiChinoune/SLATEC ) fine contribution is nearly impossible, as I found after trying. I had to accept that I should work with the Netlib Fortran 77 version, at least for the bug hunting work.
Some of the subroutines still contain ASSIGNed GOTOs, and there are a number of EQUIVALENCE and COMMON declarations, which modern compilers will complain about. Polishing the code, and doing so without introducing new errors, is difficult with such a large codebase.
Here is a sample of the kind of problems that I ran into. One of the subroutines is the Linpack routine CGEDI. A portion of the code in the CGEDI included in SLATEC:
C***FIRST EXECUTABLE STATEMENT CGEDI
C
C COMPUTE DETERMINANT
C
IF (JOB/10 .EQ. 0) GO TO 70
DET(1) = (1.0E0,0.0E0)
DET(2) = (0.0E0,0.0E0)
TEN = 10.0E0
DO 50 I = 1, N
IF (IPVT(I) .NE. I) DET(1) = -DET(1)
DET(1) = A(I,I)*DET(1)
IF (CABS1(DET(1)) .EQ. 0.0E0) GO TO 60
10 IF (CABS1(DET(1)) .GE. 1.0E0) GO TO 20
DET(1) = CMPLX(TEN,0.0E0)*DET(1)
DET(2) = DET(2) - (1.0E0,0.0E0)
GO TO 10
20 CONTINUE
30 IF (CABS1(DET(1)) .LT. TEN) GO TO 40
DET(1) = DET(1)/CMPLX(TEN,0.0E0)
DET(2) = DET(2) + (1.0E0,0.0E0)
GO TO 30
40 CONTINUE
50 CONTINUE
60 CONTINUE
70 CONTINUE
I found that there appears to be a newer (appears; the dates indicate otherwise!) version at Netlib, www.netlib.org/linpack/cgedi.f , which is a bit easier on the eye:
c
c compute determinant
c
if (job/10 .eq. 0) go to 70
det(1) = (1.0e0,0.0e0)
det(2) = (0.0e0,0.0e0)
ten = 10.0e0
do 50 i = 1, n
if (ipvt(i) .ne. i) det(1) = -det(1)
det(1) = a(i,i)*det(1)
c ...exit
if (cabs1(det(1)) .eq. 0.0e0) go to 60
10 if (cabs1(det(1)) .ge. 1.0e0) go to 20
det(1) = cmplx(ten,0.0e0)*det(1)
det(2) = det(2) - (1.0e0,0.0e0)
go to 10
20 continue
30 if (cabs1(det(1)) .lt. ten) go to 40
det(1) = det(1)/cmplx(ten,0.0e0)
det(2) = det(2) + (1.0e0,0.0e0)
go to 30
40 continue
50 continue
60 continue
70 continue
Still too many statement labels and GO TOs. I detected a DO WHILE loop between statements 30 and 40, and substituted the following, after checking to make sure that the statement labels in these lines were not referred to elsewhere:
do while (cabs1(det(1)) .ge. ten)
det(1) = det(1) * 0.1e0
det(2) = det(2) + 1.0e0
end do
This change (and similar changes to other portions of SLATEC) gave me code that was easier to work with. I found a number of bugs and fixed most of them. I felt pleased with the result, and started testing the corrected code with a number of compilers. All the 54 tests of SLATEC passed with a number of compilers (Intel, Lahey, Gfortran, Absoft, Silverfrost, CVF, all on Windows).
Today, I chanced to run tests using the old Microsoft FPS 4.0 compiler. Test 23 failed, and in a bad way – infinite loop in the block of code that I showed above, and it took a good bit of effort to locate the bug.
What the FPS 4.0 compiler did (with or without optimization requested) is to move the evaluation of the test clause to just before entering the loop.
FPS 4.0 is obsolete and has a reputation for being buggy, but my experience is that it is not as bad as it is made out to be. It has often generated correct code where newer compilers had failed. It provides one more hoop to make the test code jump through.
Some lessons that I learned from the exercise:
-
Polishing code can make compiler bugs surface, and introduce new bugs into the polished code regardless of which compiler is used
-
Putting common blocks, subroutines and function bodies into Fortran modules is definitely worthwhile, but not without risks. In a large body of code, doing so can make debugging impossible because of the complexity of module dependencies. (Watch out for mis-converted common blocks with different member names in different places, and agglomerations of place-holder variables! Code polishing tools fail spectacularly with these.)
-
There are many instances of scalars passed to array dummy arguments, work array sections of mismatched types used as actual arguments, etc. At the time that the original SLATEC codes were written, there probably was no way to avoid such usage, and it is not clear how to repair such code. If we ask the compiler to overlook such errors in some places, we have to live with not being able to check related errors in other places.
-
Old code has bugs, but so does new code. Compilers have bugs. Library code may have bugs. The programmer’s mind may have bugs (speaking about yours truly here).
-
When a program fails, is it because of bugs in the code, in the compiler, or in a library? We may fix code or resort to work-arounds if we have the source code and the necessary knowledge and ability. Compiler bugs will have to be reported, but may take months or years to fix through a support service.
-
If it ain’t broke, try harder!
-
Familiarity breeds respect.
In summary, I suggest that some modernization tasks should be performed in two stages. The first stage would focus on making the code easier to work with, replace spaghetti code with structured code, declare all variables and get rid of troublesome features such as common blocks. After this cleaned/converted code passes testing by a number of users and is more or less bug free, we may move on to the next stage of modernization, using module procedures, derived types, etc. None of my comments apply to code that is being written afresh now.
I look forward to your reactions. Thanks for reading this long post.