How to begin reading the code of a large program?

Alma · June 16, 2022, 10:51am

I am trying to read the code of siesta, which includes maybe thousands of lines of code. So the first thing is to read and understand the architecture of the code, I try to find some tools to help me read the code, but I am not sure what is the mainstream method for this kind of job.
My question is:
What is the best option or toolchain to read the code of a large program?

ivanpribec · June 16, 2022, 11:10am

I think @Dauptain could suggest a few tools. Here is one of his blog posts:

Dauptain · June 16, 2022, 1:07pm

Hello guys.

finter can help you a bit by showing the nesting of the code so at least you will grasp the “volumes” of your repository, but I think it will miss the point here.

I do not know many free tools yet to do for fortran what “findimports” or “pycallgraph” do for python, but I will not leave a comrade in the weeds:

The parser of lizard almost build the callgraph, but I must admit we stopped halfway when we worked on flinter. Still lizard will tell you the longest and the most complex routines. Which can be useful info.
I know doxygen can build an html callgraph , but this works only if the code community did add some doxygen doc.
There is an angle with codemetrics. By scanning the versionning system (usually the git base) you can get a lot of info from the code community. What are the most edited routine, the most complex routine, the recent work .If the code is on gitlab.com or github (siesta is on gitlab.com), use codescene (free and very sexy) The older version, more manual is code -maat. We are working on something similar (anubis) but its not ready for production yet.
I think fpt could be worth a try (I admit I failed to install it)
there are tons of plugins on Visual Studio Code , and some are for fortran. Not sure you can really get a callgraph , but you can move very fast with this IDE.
if you are bored in a meeting , try to do a word cloud on the sources. Fun, and a informative to a certain extent

FedericoPerini · June 16, 2022, 1:27pm

My two cents is that it may look like a daunting task at first, and definitely won’t get easier by using external tools. They can help you along the way, but you’ll have to do the hard work first.

I suggest you start from the main program(s): in other words search for files keyword program and end program, open them, start reading what they do.

It’s like starting from the root of the tree. You can traverse the branches as deep as you want later.

OptimalDesigns · June 16, 2022, 1:36pm

At Lockheed, there was a ESP2 program that came from some college (OHIO ?) that the listing was some 6" thick of source code. We needed to speed it up. So, I started by following calls to all the subroutines. Each subroutine was shown/listed with all its calls to subroutines with the arguments being passed on. This gave us a flowchart of how things were connected. Then all these parameters were put into a block common that was in an include file. This resulted in speeding up the execution time plus reducing the source code listing size.

Just an idea … help it helps

Beliavsky · June 16, 2022, 4:05pm

Maybe FORD is the most commonly used tool – it is used by stdlib. It requires the code to be commented in a certain way. My list of Fortran tools has the categories Automatic Documentation and Static Analysis.

If I were trying to understand a particular subroutine, for example the one below from arw.f, one thing I would do is document the subroutine calls to list the intent(in out) and intent(out) variables

      SUBROUTINE YOFE( E, DE, DGDR, RMAX, H, S, Y, NR, L,
     &                 NCOR, NNODE, Z, A, B )
!***********************************************************************
! Integrates the radial Scroedinger equation for a given energy
! Input:
!   real*8  E     : Energy of the required wavefunction
!   real*8  DGDR  : Required logarithmic derivative dlogG/dr at RMAX
!   real*8  RMAX  : Maximum radius. Must be equal to r(NR)
!   real*8  H(NR) : Radial hamiltonian
!   real*8  S(NR) : Metric function. H and S are defined for a radial
!                   variable mesh so that d2G/dj2=(H(j)-E*S(j))*G(j)
!                   where G(j)=(dr/dj)^(3/2)*r(j)*psi(r(j))
!                   For a logarithmic mesh (see below), they are
!                     S(j) = (A*r(j))^2
!                     H(j) = S(j)*V(r(j)) + A^2/4
!                   where V(r) is the total effective radial potential
!                   in Rydbergs, including the kinetic term l*(l+1)/r^2
!   integer NR    : Number of radial points (including r(1)=0)
!   integer L     : Angular momentum
!   integer NCOR  : Number of core states of same L below the given one
!   integer NNODE : Required number of nodes for wavefunction
!   real*8  Z     : Atomic number
!   real*8  A,B   : Log-mesh parameters:
!                    r(j) = B*(exp(A*(j-1)) - 1),   j=1,2,...,NR
! Output:
!   real*8  DE    : Estimate of energy change required to eliminate kink
!   real*8  Y(NR) : Numerov function, related to G above by
!                     Y(j)=G(j)*(12-H(j)+E*S(j))/12
!   integer NNODE : Actual number of nodes of wavefunction
!***********************************************************************
      use precision, only: dp
      IMPLICIT NONE
      INTEGER,           intent(in) :: NR, L, NCOR
      INTEGER,          intent(out) :: NNODE
      real(dp)        ,  intent(in) :: E, DGDR, RMAX, H(NR), S(NR),
     &                                 Z, A, B
      real(dp)        , intent(out) :: DE, Y(NR)

      real(dp)        ,PARAMETER:: TMAX  =1.D0 ! Max negative kin engy
      real(dp)        ,PARAMETER:: DLGMAX=1.D3 ! Max log deriv at Rmax
      INTEGER          :: KNK, NR0, NNDIN
      real(dp)         :: GIN, GOUT, GSG, GSGIN, GSGOUT, RATIO, 
     &                    T, XIN, XOUT, Y2, YN, ZDR

      ! Find where the negative kinetic energy H-ES becomes too large
      ! and make Y=0 beyond that point
      DO NR0 = NR,1,-1
        IF ( H(NR0)-E*S(NR0) .LT. TMAX ) EXIT
        Y(NR0)=0.D0
      END DO ! NR0

      ! Find boundary condition at origin
      ZDR = Z*A*B
      CALL BCORGN(E,H,S,L,ZDR,Y2)

      ! Integrate Schroedinger equation outwards
      KNK=NR0  ! A new value for the kink position KNK will be returned
      CALL NUMOUT(E, H, S, Y, NCOR, KNK, NNODE, Y2, GOUT, GSGOUT, XOUT)

      ! Find if kinetic energy is sufficiently non negative to use
      ! Numerov at Rmax. Otherwise use zero-value boundary condition
      IF (NR0.EQ.NR .AND. ABS(DGDR).LE.DLGMAX) THEN
        CALL BCRMAX(E,DGDR,RMAX,H,S,NR0,YN,A,B)
      ELSE
        YN=0.D0
      END IF

      ! Integrate Schroedinger equation inwards
      CALL NUMIN(E,H,S,Y,NR0,NNDIN,YN,GIN,GSGIN,XIN,KNK)

      ! Make wavefunction continuous at R(KNK)
      RATIO = GOUT/GIN
      Y(KNK:NR0) = Y(KNK:NR0)*RATIO
      XIN = XIN*RATIO
      GSGIN = GSGIN*RATIO**2

      ! Estimate the energy change required to eliminate kink
      GSG=GSGOUT+GSGIN
      T=H(KNK)-E*S(KNK)
      DE=GOUT*(XOUT+XIN+T*GOUT)/GSG

      ! Find the number of nodes
      NNODE=NNODE+NNDIN
      IF (DE.LT.0.D0) NNODE=NNODE+1
      END SUBROUTINE YOFE

Alma · June 16, 2022, 4:23pm

How did you trace all calls to each subroutine? Any recommendations for the tool?

OptimalDesigns · June 16, 2022, 7:38pm

No don’t know of any. It just amounts to deleting all lines without a subroutine or call statement on them. But, it sure did clarify things for us.

DavidB · June 17, 2022, 12:17am

(If I remember correctly) doxygen can generate a callgraph for code with no (or minimal) markup. I am looking one such callgraph now, but the build scripts are elsewhere. I can dig them up if there is interest, but I am sure you can find better by searching.

We find the doxygen callgraph useful. We played with adding doxygen markup but lost interest as we didn’t read it.

dwwork · June 17, 2022, 1:18am

Yep. Set EXTRACT_ALL=YES in the Doxyfile, and it should be able to make the call graph.

dwwork · June 17, 2022, 1:23am

One option is to run the code with callgrind and then view the profiler results using kcachegrind.

DavidB · June 17, 2022, 2:52am

A useful guide to Doxygen and Fortran - with a sample project - is at https://git.mysmce.com/astg/docs/-/wikis/Using-Doxygen-with-Fortran-source-code. It comes with the output prebuilt, so you can examine the finished product before messing about with the tools. I used an earlier version to get going a few years ago.

I had a look at the siesta repository. The Fortran code uses a preprocessor. That may present a few “learning opportunities”.

Topic		Replies	Views
Google Summer of Code 2025 GSoC-2025	23	1291	April 22, 2025
Code poetry (are you experienced with Fortran words?)	3	514	September 4, 2023
Documentation tools for Fortran codes Help	10	3362	July 19, 2021
A Modern Fortran Scientific Programming Ecosystem Announcements	8	1874	October 17, 2022
Hi! looking for new Fortran resources	15	1234	October 7, 2022

How to begin reading the code of a large program?

Related topics