How to begin reading the code of a large program?

I am trying to read the code of siesta, which includes maybe thousands of lines of code. So the first thing is to read and understand the architecture of the code, I try to find some tools to help me read the code, but I am not sure what is the mainstream method for this kind of job.
My question is:
What is the best option or toolchain to read the code of a large program?

I think @Dauptain could suggest a few tools. Here is one of his blog posts:


Hello guys.

finter can help you a bit by showing the nesting of the code so at least you will grasp the “volumes” of your repository, but I think it will miss the point here.

I do not know many free tools yet to do for fortran what “findimports” or “pycallgraph” do for python, but I will not leave a comrade in the weeds:

  • The parser of lizard almost build the callgraph, but I must admit we stopped halfway when we worked on flinter. Still lizard will tell you the longest and the most complex routines. Which can be useful info.

  • I know doxygen can build an html callgraph , but this works only if the code community did add some doxygen doc.

  • There is an angle with codemetrics. By scanning the versionning system (usually the git base) you can get a lot of info from the code community. What are the most edited routine, the most complex routine, the recent work .If the code is on or github (siesta is on, use codescene (free and very sexy) The older version, more manual is code -maat. We are working on something similar (anubis) but its not ready for production yet.

  • I think fpt could be worth a try (I admit I failed to install it)

  • there are tons of plugins on Visual Studio Code , and some are for fortran. Not sure you can really get a callgraph , but you can move very fast with this IDE.

  • if you are bored in a meeting , try to do a word cloud on the sources. Fun, and a informative to a certain extent

1 Like

My two cents is that it may look like a daunting task at first, and definitely won’t get easier by using external tools. They can help you along the way, but you’ll have to do the hard work first.

I suggest you start from the main program(s): in other words search for files keyword program and end program, open them, start reading what they do.

It’s like starting from the root of the tree. You can traverse the branches as deep as you want later.

At Lockheed, there was a ESP2 program that came from some college (OHIO ?) that the listing was some 6" thick of source code. We needed to speed it up. So, I started by following calls to all the subroutines. Each subroutine was shown/listed with all its calls to subroutines with the arguments being passed on. This gave us a flowchart of how things were connected. Then all these parameters were put into a block common that was in an include file. This resulted in speeding up the execution time plus reducing the source code listing size.

Just an idea … help it helps :slight_smile:

1 Like

Maybe FORD is the most commonly used tool – it is used by stdlib. It requires the code to be commented in a certain way. My list of Fortran tools has the categories Automatic Documentation and Static Analysis.

If I were trying to understand a particular subroutine, for example the one below from arw.f, one thing I would do is document the subroutine calls to list the intent(in out) and intent(out) variables

     &                 NCOR, NNODE, Z, A, B )
! Integrates the radial Scroedinger equation for a given energy
! Input:
!   real*8  E     : Energy of the required wavefunction
!   real*8  DGDR  : Required logarithmic derivative dlogG/dr at RMAX
!   real*8  RMAX  : Maximum radius. Must be equal to r(NR)
!   real*8  H(NR) : Radial hamiltonian
!   real*8  S(NR) : Metric function. H and S are defined for a radial
!                   variable mesh so that d2G/dj2=(H(j)-E*S(j))*G(j)
!                   where G(j)=(dr/dj)^(3/2)*r(j)*psi(r(j))
!                   For a logarithmic mesh (see below), they are
!                     S(j) = (A*r(j))^2
!                     H(j) = S(j)*V(r(j)) + A^2/4
!                   where V(r) is the total effective radial potential
!                   in Rydbergs, including the kinetic term l*(l+1)/r^2
!   integer NR    : Number of radial points (including r(1)=0)
!   integer L     : Angular momentum
!   integer NCOR  : Number of core states of same L below the given one
!   integer NNODE : Required number of nodes for wavefunction
!   real*8  Z     : Atomic number
!   real*8  A,B   : Log-mesh parameters:
!                    r(j) = B*(exp(A*(j-1)) - 1),   j=1,2,...,NR
! Output:
!   real*8  DE    : Estimate of energy change required to eliminate kink
!   real*8  Y(NR) : Numerov function, related to G above by
!                     Y(j)=G(j)*(12-H(j)+E*S(j))/12
!   integer NNODE : Actual number of nodes of wavefunction
      use precision, only: dp
      INTEGER,           intent(in) :: NR, L, NCOR
      INTEGER,          intent(out) :: NNODE
      real(dp)        ,  intent(in) :: E, DGDR, RMAX, H(NR), S(NR),
     &                                 Z, A, B
      real(dp)        , intent(out) :: DE, Y(NR)

      real(dp)        ,PARAMETER:: TMAX  =1.D0 ! Max negative kin engy
      real(dp)        ,PARAMETER:: DLGMAX=1.D3 ! Max log deriv at Rmax
      INTEGER          :: KNK, NR0, NNDIN
      real(dp)         :: GIN, GOUT, GSG, GSGIN, GSGOUT, RATIO, 
     &                    T, XIN, XOUT, Y2, YN, ZDR

      ! Find where the negative kinetic energy H-ES becomes too large
      ! and make Y=0 beyond that point
      DO NR0 = NR,1,-1
        IF ( H(NR0)-E*S(NR0) .LT. TMAX ) EXIT
      END DO ! NR0

      ! Find boundary condition at origin
      ZDR = Z*A*B

      ! Integrate Schroedinger equation outwards
      KNK=NR0  ! A new value for the kink position KNK will be returned

      ! Find if kinetic energy is sufficiently non negative to use
      ! Numerov at Rmax. Otherwise use zero-value boundary condition
      END IF

      ! Integrate Schroedinger equation inwards

      ! Make wavefunction continuous at R(KNK)
      Y(KNK:NR0) = Y(KNK:NR0)*RATIO

      ! Estimate the energy change required to eliminate kink

      ! Find the number of nodes
      IF (DE.LT.0.D0) NNODE=NNODE+1

How did you trace all calls to each subroutine? Any recommendations for the tool?

No don’t know of any. It just amounts to deleting all lines without a subroutine or call statement on them. But, it sure did clarify things for us.

(If I remember correctly) doxygen can generate a callgraph for code with no (or minimal) markup. I am looking one such callgraph now, but the build scripts are elsewhere. I can dig them up if there is interest, but I am sure you can find better by searching.

We find the doxygen callgraph useful. We played with adding doxygen markup but lost interest as we didn’t read it.

Yep. Set EXTRACT_ALL=YES in the Doxyfile, and it should be able to make the call graph.

One option is to run the code with callgrind and then view the profiler results using kcachegrind.

A useful guide to Doxygen and Fortran - with a sample project - is at It comes with the output prebuilt, so you can examine the finished product before messing about with the tools. I used an earlier version to get going a few years ago.

I had a look at the siesta repository. The Fortran code uses a preprocessor. That may present a few “learning opportunities”.