Old Fortran codes did not always conform to the Fortran standard in force at their creation, and such deviations did not bother the users of the codes as long as the codes ran correctly on the machines in use at the time. During the last few days, I tangled with one such bug, and I felt that it would be useful to record some highlights of the experience.
The code in question is Hairer’s stiff ODE solver RADAU5. Ernst Hairer is a distinguished mathematician, and is the author of a superb 2-volume treatise on differential equations, the second volume of which is devoted to stiff ODEs. Ernst Hairer’s son, Martin Hairer, won the 2014 Fields Medal.
The three source files needed are dr1_radau5.f, radau5.f and dc_lapack.f, which you can download from Hairer’s site.
Compiling the three files with Lapack and BLAS libraries and running the resulting program produces the following output (a few digits may differ, depending on which compiler you use, etc.)
X = 0.00 Y = 0.2000000000E+01 -0.6600000000E+00 NSTEP = 0
X = 0.20 Y = 0.1858198964E+01 -0.7574791034E+00 NSTEP = 10
X = 0.40 Y = 0.1693205231E+01 -0.9069021617E+00 NSTEP = 11
X = 0.60 Y = 0.1484565763E+01 -0.1233096965E+01 NSTEP = 13
X = 0.80 Y = 0.1083912895E+01 -0.6196077501E+01 NSTEP = 22
X = 1.00 Y = -0.1863645642E+01 0.7535323767E+00 NSTEP = 123
X = 1.20 Y = -0.1699724325E+01 0.8997692747E+00 NSTEP = 124
X = 1.40 Y = -0.1493375073E+01 0.1213880891E+01 NSTEP = 126
X = 1.60 Y = -0.1120780441E+01 0.4374794949E+01 NSTEP = 133
X = 1.80 Y = 0.1869050586E+01 -0.7495753072E+00 NSTEP = 237
X = 2.00 Y = 0.1706161101E+01 -0.8928074777E+00 NSTEP = 238
X = 2.00 Y = 0.1706161101E+01 -0.8928074777E+00
rtol=0.10D-03
fcn= 2218 jac= 161 step= 275 accpt= 238 rejct= 8 dec= 248 sol= 660
Everything seems in order. The test problem (van der Pol equation) involves only two variables, and the program runs in less than a second.
However, I wish to use this ODE solver on a stiff ODE with 400 variables (Medakzo). Therefore, I wanted to identify the part of the code where most of the run time is spent, and improve the code by using modern Fortran or by calling fast MKL routines as replacements for expensive DO loops.
Profiling the code (still with the van der Pol problem) showed that the first and second most used routines were FVPOL, which implements the specific ODE right hand side functions, and SLVRAD, which forms and solves a dense set of linear equations using Lapack. There is not much to do to speed up FVPOL, and the number of times it gets called is controlled by the ODE solver algorithm. That took me into the source for SLVRAD, which is where my troubles began.
The section of code that forms the arguments to pass to the Lapack linear solver pair DGETRF/DGETRS is the following:
DO I=1,N
S2=-F2(I)
S3=-F3(I)
Z1(I)=Z1(I)-F1(I)*FAC1
Z2(I)=Z2(I)+S2*ALPHN-S3*BETAN
CONT(I)=Z3(I)+S3*ALPHN+S2*BETAN
END DO
DO I=N,1,-1
Z2(2*I-1)=Z2(I)
Z2(2*I)=CONT(I)
END DO
All the arrays that you see here are dummy arguments. I felt that I could combine the two loops and eliminate the temporary variables, using this equivalent code:
do i = n, 1, -1
z1(i) = z1(i) - f1(i) * fac1
z2(2*i) = z3(i) - f3(i) * alphn - f2(i) * betan
z2(2*i-1) = z2(i) - f2(i) * alphn + f3(i) * betan
end do
The program output changed to
X = 0.00 Y = 0.2000000000E+01 -0.6600000000E+00 NSTEP = 0
X = 0.20 Y = 0.1899165797E+01 -0.7285341796E+00 NSTEP =1009
X = 0.40 Y = 0.1787641272E+01 -0.8141696660E+00 NSTEP =2635
X = 0.60 Y = 0.1660762839E+01 -0.9446167743E+00 NSTEP =5042
X = 0.80 Y = 0.1556660519E+01 -0.1093780715E+01 NSTEP =****
EXIT OF RADAU5 AT X= 0.9004E+00
MORE THAN NMAX = 100000 STEPS ARE NEEDED
X = 0.90 Y = 0.1556659825E+01 -0.1093786486E+01
rtol=0.10D-03
fcn=***** jac=**** step=**** accpt=**** rejct=*** dec=**** sol=*****
What happened? Oh, the arrays Z2 and Z3 are of formal size N, but the code treats Z2 as if it were of size 2*N. Yet, the original code ran fine on a number of test problems, whereas my “improved” version fails!
Unable to see why the seemingly correct replacement code caused the program to fail, I resorted to my favorite debugging tool: the PRINT statement. I added this before my replacement fused loop:
write(*,'(A,2ES12.4)')'Before loop, Z3 = ',z3(1:n)
and this after the same loop:
write(*,'(A,2ES12.4)')'After loop, Z3 = ',z3(1:n)
stop
The output after making these changes became:
X = 0.00 Y = 0.2000000000E+01 -0.6600000000E+00 NSTEP = 0
Before loop, Z3 = -9.7219E-01 -2.9460E+04
After loop, Z3 = 8.0596E+04 -2.9460E+04
How can this be? Z3 appears only to the right of the = sign in the replacement loop. How did it get changed? Was it because of array bounds being crossed, or aliasing, or both?
[To be Continued; please post your comments and suggestion if the topic interests you. Thanks in advance!]