Hi,
I stumbled on the fortran-lang/benchmarks repo earlier this week, which seems to be gathering dust. There is a long and fascinating discussion here about whether we should actually benchmark Fortran against other languages, what benchmarking even means in this context, etc. While I don't want to re-open that heated discussion, I still believe this could be an awesome pedagogical resource. To some extent, it could serve the same purpose as the Scientific Python Lectures: carefully crafted tutorials for standard scientific computing in Fortran (with and without stdlib), where the influence of some key constructs (e.g. nested do loops, do concurrent, array syntax, etc.) on the actual implementation of a given algorithm, as well as on its computational performance, could be discussed along with some guidelines about options for standard compilers (think gfortran and ifort). From this point of view, instead of renaming the repo to idioms as suggested here, we could rename it Scientific Fortran Lectures or something along these lines. This wouldn't prevent us from including implementations in other languages (e.g. C, Julia, Python, etc.) for the sake of comparison, without putting too much emphasis on the performance of each language.
I'm currently teaching a numerical analysis class for undergrads. Like many of my colleagues, at the moment we're using numpy because Python. A Scientific Fortran Lectures repo could however become the go-to online resource for educators and serve as a really good advocate for putting Fortran back in the limelight and in the classroom. This is even more true now that a lot of us use conda as a package manager, which makes installing compilers really easy on pretty much any system. Likewise, it has never been easier to create Fortran projects now that fpm has reached a certain level of maturity, and similarly with the ongoing initiative by @FedericoPerini and others regarding the linear algebra module in stdlib. Finally, the work by @certik and colleagues on LFortran and the associated Jupyter kernel wouldn't lead to a dramatic change in the workflow of students, most of whom are already familiar with Jupyter notebooks.
With this in mind, I've started to port some of my lecture material to Fortran out of curiosity, beginning with the Hello World! of partial differential equations: solving the 2D Poisson equation on a unit square discretized with 2nd-order accurate central finite differences. I've started with the simplest iterative solvers, i.e. the Jacobi and Gauss-Seidel iterations, and I have a few technical questions about compiler stuff. The examples below are adapted from what is currently available in the fortran-lang/benchmarks repo.
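For reference, and so that the snippets below are unambiguous about what rhs contains: with the convention ∇²φ = -f on the unit square, a uniform m×m grid and spacing h = 1/(m-1), the 5-point stencil gives the fixed-point update

phi(i,j) = ( phi(i+1,j) + phi(i-1,j) + phi(i,j+1) + phi(i,j-1) + h**2 * f(i,j) ) / 4

so rhs(i,j) in the code below is assumed to already hold the h**2 * f(i,j) / 4 contribution (this is my reading of the benchmarks repo code, so take it with a grain of salt).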
Jacobi solver
Here is the simplest (yet reasonably well-performing) Fortran implementation of the Jacobi method to solve the 2D Poisson equation.
! Iterative solver.
do while (residual > tolerance)
   ! Update iteration counter.
   iteration = iteration + 1
   ! Reset residual.
   residual = 0.0_dp
   ! Jacobi iteration.
   do j = 2, m-1
      do i = 2, m-1
         ! Jacobi update.
         phi_prime(i,j) = (phi(i+1,j) + phi(i-1,j) + phi(i,j+1) + phi(i,j-1))/4 + rhs(i,j)
         ! On-the-fly residual computation.
         residual = max(residual, abs(phi_prime(i,j) - phi(i,j)))
      end do
   end do
   ! Update solution.
   phi = phi_prime
end do
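For completeness, the loop above relies on declarations and an initialisation roughly along these lines (the grid size, tolerance and right-hand side here are placeholder values of mine, not necessarily those used for the timings below):

! Sketch of the declarations/initialisation assumed by the loop above
! (placeholder values, not the ones used for the timings below).
integer, parameter  :: dp = selected_real_kind(15)
integer, parameter  :: m = 1000                    ! grid points per direction
real(dp), parameter :: tolerance = 1.0e-8_dp
real(dp), allocatable :: phi(:,:), phi_prime(:,:), rhs(:,:)
real(dp) :: residual, h
integer  :: i, j, iteration

h = 1.0_dp / (m-1)
allocate(phi(m,m), phi_prime(m,m), rhs(m,m))
phi = 0.0_dp ; phi_prime = 0.0_dp                  ! homogeneous Dirichlet boundary conditions
rhs = h**2 / 4.0_dp                                ! e.g. a constant source term f = 1
iteration = 0
residual = 1.0_dp                                  ! anything larger than tolerance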
Below are some timings obtained by compiling the code with gfortran 13.2.0 at various optimization levels.

gfortran jacobi.f90 -O0 -o program && ./program – 197 seconds
gfortran jacobi.f90 -O1 -o program && ./program – 73 seconds
gfortran jacobi.f90 -O2 -o program && ./program – 24 seconds
gfortran jacobi.f90 -O3 -o program && ./program – 31 seconds
gfortran jacobi.f90 -Ofast -o program && ./program – 31 seconds
In all cases, the Jacobi solver runs for 376 000 iterations (give or take a few dozen depending on the optimization level). Interestingly, it seems like -O2 leads to significantly better performance than the higher optimization levels.
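I haven't dug into why -O2 wins here. If anyone wants to investigate, I guess GCC's optimization-report flags (e.g. -fopt-info-vec and -fopt-info-vec-missed, which report which loops were or were not vectorized) would be the place to start, something like:

gfortran jacobi.f90 -O2 -fopt-info-vec -o program
gfortran jacobi.f90 -O3 -fopt-info-vec-missed -o program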
Gauss-Seidel solver
Given the Jacobi solver, only a very limited number of modifications is required to turn it into a Gauss-Seidel one: phi_prime is no longer an array but a simple real(dp) scalar used to store the current value of phi(i, j) before we update it in place, while still being able to compute the residual on the fly.
! Iterative solver.
do while (residual > tolerance)
   ! Update iteration counter.
   iteration = iteration + 1
   ! Reset residual.
   residual = 0.0_dp
   ! Gauss-Seidel iteration.
   do j = 2, m-1
      do i = 2, m-1
         ! Store old value for residual computation.
         phi_prime = phi(i,j)
         ! Gauss-Seidel update.
         phi(i,j) = (phi(i+1,j) + phi(i-1,j) + phi(i,j+1) + phi(i,j-1))/4 + rhs(i,j)
         ! On-the-fly residual computation.
         residual = max(residual, abs(phi_prime - phi(i,j)))
      end do
   end do
end do
Below are some timings obtained by compiling the code with gfortran 13.2.0 at various optimization levels.

gfortran gauss_seidel.f90 -O0 -o program && ./program – 134 seconds
gfortran gauss_seidel.f90 -O1 -o program && ./program – 91 seconds
gfortran gauss_seidel.f90 -O2 -o program && ./program – 74 seconds
gfortran gauss_seidel.f90 -O3 -o program && ./program – 75 seconds
gfortran gauss_seidel.f90 -Ofast -o program && ./program – 62 seconds
In all cases, the Gauss-Seidel solver runs for 197 000 iterations (give or take a few dozen depending on the optimization level). As expected from the math, the Gauss-Seidel solver needs roughly half as many iterations as Jacobi to converge.
The plot twist…
It is fairly easy to show mathematically that the Gauss-Seidel solver should take roughly half the number of steps needed by Jacobi for convergence. This is indeed the case here (iteration = 376000 for Jacobi and iteration = 197000 for Gauss-Seidel). For this particular problem, proving it is actually a classical exercise in an undergrad numerical analysis class.
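For the record, the key ingredient is that, for a consistently ordered matrix such as this one, the spectral radius of the Gauss-Seidel iteration matrix is the square of the Jacobi one, i.e. rho_GS = rho_J**2. Since the asymptotic number of iterations scales like 1/log(1/rho), squaring the spectral radius halves the iteration count.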
When it comes to the numerical implementation, however, math is not all there is. Given the Jacobi update rule

phi_prime(i,j) = (phi(i+1,j) + phi(i-1,j) + phi(i,j+1) + phi(i,j-1))/4 + rhs(i,j)

and the Gauss-Seidel one

phi(i,j) = (phi(i+1,j) + phi(i-1,j) + phi(i,j+1) + phi(i,j-1))/4 + rhs(i,j)

the number of operations per update is basically the same. Naively, I would have expected that if Gauss-Seidel needs half as many iterations to converge as Jacobi, and each iteration has roughly the same computational cost, then Gauss-Seidel would also take about half the computational time of Jacobi. While this is somewhat the case for -O0 (albeit the time ratio is not quite 0.5), it certainly is not once a higher optimization level is considered. In practice, while Jacobi performs twice as many iterations as Gauss-Seidel to converge, it can actually be more than twice as fast!
gfortran jacobi.f90 -O2 -o program && ./program – 24 seconds
gfortran gauss_seidel.f90 -Ofast -o program && ./program – 62 seconds
I'm pretty sure this has been observed many times over the years, but I can hardly find it mentioned in the literature. I suspect the reason is some compiler witchcraft related to the independence of the updates in the Jacobi sweep (there is no loop-carried dependence, since it reads from phi and writes to phi_prime, whereas the Gauss-Seidel update of phi(i,j) depends on the freshly updated phi(i-1,j)), combined with automatic vectorization, loop unrolling and/or cache-blocking.
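One way I plan to check the "roughly the same cost per iteration" assumption is to time a fixed number of sweeps of each kernel in isolation, decoupled from convergence. A minimal, untested sketch (the grid size, sweep count and right-hand side are arbitrary placeholder values):

program sweep_timing
   implicit none
   integer, parameter :: dp = kind(1.0d0)
   integer, parameter :: m = 1024, nsweeps = 1000   ! placeholder values
   real(dp), allocatable :: phi(:,:), phi_prime(:,:), rhs(:,:)
   integer :: i, j, sweep
   integer(8) :: t0, t1, rate

   allocate(phi(m,m), phi_prime(m,m), rhs(m,m))
   phi = 0.0_dp ; phi_prime = 0.0_dp
   call random_number(rhs) ; rhs = rhs / m**2       ! arbitrary right-hand side

   ! Jacobi-style sweeps: read from phi, write to phi_prime (independent updates).
   call system_clock(t0, rate)
   do sweep = 1, nsweeps
      do j = 2, m-1
         do i = 2, m-1
            phi_prime(i,j) = (phi(i+1,j) + phi(i-1,j) + phi(i,j+1) + phi(i,j-1))/4 + rhs(i,j)
         end do
      end do
      phi = phi_prime
   end do
   call system_clock(t1)
   print *, "Jacobi sweeps      :", real(t1-t0, dp)/rate, "seconds"

   ! Gauss-Seidel-style sweeps: in-place updates (loop-carried dependence on phi(i-1,j)).
   call system_clock(t0)
   do sweep = 1, nsweeps
      do j = 2, m-1
         do i = 2, m-1
            phi(i,j) = (phi(i+1,j) + phi(i-1,j) + phi(i,j+1) + phi(i,j-1))/4 + rhs(i,j)
         end do
      end do
   end do
   call system_clock(t1)
   print *, "Gauss-Seidel sweeps:", real(t1-t0, dp)/rate, "seconds"

   ! Keep the results "live" so the compiler cannot optimize the sweeps away.
   print *, phi(m/2, m/2)
end program sweep_timing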
Note that the Gauss-Seidel implementation here is the simplest one there is. The 2nd-order central finite-difference approximation of the Laplacian leads to a consistently ordered matrix, for which a red/black ordering of the nodes splits each sweep into two sub-problems with independent updates (i.e. Jacobi-like). I haven't had time to code it up properly yet, but schematically the two half-sweeps would look something like the untested sketch below, and I suspect it'll be significantly faster than this vanilla Gauss-Seidel implementation for the exact same reasons that make the Jacobi implementation faster than the vanilla Gauss-Seidel one.
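Something along these lines (untested; the residual computation is omitted for brevity and the colouring convention is my own):

! Red/black Gauss-Seidel sweep (untested sketch). Nodes with i+j even ("red")
! are updated first, then nodes with i+j odd ("black"). Within each colour the
! updates only read values of the other colour, so they are independent.
do j = 2, m-1
   do i = 2 + mod(j, 2), m-1, 2       ! red nodes: i+j even
      phi(i,j) = (phi(i+1,j) + phi(i-1,j) + phi(i,j+1) + phi(i,j-1))/4 + rhs(i,j)
   end do
end do
do j = 2, m-1
   do i = 2 + mod(j+1, 2), m-1, 2     ! black nodes: i+j odd
      phi(i,j) = (phi(i+1,j) + phi(i-1,j) + phi(i,j+1) + phi(i,j-1))/4 + rhs(i,j)
   end do
end do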
While I'm more than familiar with the mathematical side of things (I actually teach this), I'm not really familiar with these compiler technicalities. Any insight is more than appreciated!