Identifying unused code in large Fortran projects

Has anyone attempted to automatically identify unused code in large Fortran projects?

Take the following as an example:

! unused_no_interface.f90
integer function unused_no_interface(i) result(j)
    implicit none
    integer, intent(in) :: i
    j = i + 1
end function


! some_module.f90
module some_module
    implicit none

    private
    public used_sub
    public unused_sub
    public unused_val

    integer :: unused_val = 123
    integer :: unused_val2 = 123

contains

    subroutine used_sub()
        write(*,*) 'Hello world'
    end subroutine

    subroutine unused_sub()
        write(*,*) 'The answer is 42'
    end subroutine

    subroutine unused_sub2()
        write(*,*) 'I am not used'
    end subroutine
end module


! main.f90
program main
    ! Imports all public names (yuck!), but only one is used
    use some_module
    implicit none

    call used_sub()
end program

gfortran with -Wall will correctly identify unused_sub2 and unused_val2 as unused. Because the compiler operates on only one file at a time, however, it has no chance of detecting that unused_no_interface, unused_sub and unused_val are also unused.

Such a tool would be very handy for large Fortran projects, particularly those with a lot of old (legacy) code, but I haven't been able to find any. It could certainly be done by parsing the source files, but that would require a more or less complete Fortran grammar, close to what the compilers already have.

4 Likes

Gcov could be the tool (I have no experience):
https://gcc.gnu.org/onlinedocs/gcc/Gcov.html

1 Like

A tool we have developed internally that is not yet ready for primetime:

The index of names:

nag_xref: Information - Indexing ...
Index of /tmp/main.f90
   MAIN (PROGRAM at line 2)
          4 SOME_MODULE ........................ line 2, /tmp/some_module.f90
          7 USED_SUB ........................... line 15, /tmp/some_module.f90


Index of /tmp/some_module.f90
   SOME_MODULE (MODULE at line 2)
          6 USED_SUB ........................... line 15
          7 UNUSED_SUB ......................... line 19
          8 UNUSED_VAL ......................... (definition/first-use)
         10 UNUSED_VAL ......................... line 8
         11 UNUSED_VAL2 ........................ (definition/first-use)


Index of /tmp/unused_no_interface.f90
   UNUSED_NO_INTERFACE (FUNCTION at line 2)
          2 I .................................. (dummy argument)
            J .................................. (definition/first-use)
          4 I .................................. line 2
          5 J .................................. line 2
            I .................................. line 2


nagxref: Normal termination

The cross-referencing information:

nag_xref: Information - Cross-referencing ...
+-----------------------------------------------------------------------------
| /tmp/main.f90 (FILE)
+-----------------------------------------------------------------------------
| MAIN                               2#
+-----------------------------------------------------------------------------


+-----------------------------------------------------------------------------
| /tmp/some_module.f90 (FILE)
+-----------------------------------------------------------------------------
| SOME_MODULE                        2#
|                                    4u                       /tmp/main.f90
+-----------------------------------------------------------------------------

+-----------------------------------------------------------------------------
| SOME_MODULE (MODULE in /tmp/some_module.f90)
+-----------------------------------------------------------------------------
| UNUSED_SUB                         7d    19#
| UNUSED_SUB2                       23#
| UNUSED_VAL                         8d    10*
| UNUSED_VAL2                       11*
| USED_SUB                           6d    15#
|                                    7                        /tmp/main.f90
+-----------------------------------------------------------------------------


+-----------------------------------------------------------------------------
| /tmp/unused_no_interface.f90 (FILE)
+-----------------------------------------------------------------------------
| UNUSED_NO_INTERFACE                2#
+-----------------------------------------------------------------------------

+-----------------------------------------------------------------------------
| UNUSED_NO_INTERFACE (FUNCTION in /tmp/unused_no_interface.f90)
+-----------------------------------------------------------------------------
| I                                  2A     4+     5
| J                                  2<     5*
+-----------------------------------------------------------------------------


Explanation of cross-reference symbols:
   #: definition/first-use
   %: component name
   *: assigned a value
   +: declaration
   <: RESULT name
   a: argument to procedure call
   A: dummy argument
   c: appears in named COMMON
   C: appears in blank COMMON
   d: PUBLIC/PRIVATE declaration
   D: appears in DATA statement
   e: in EQUIVALENCE
   g: use via generic name
   i: used in an INTERFACE block
   k: kind specification
   R: renamed
   s: SAVEd
   t: derived-type
   u: USEd module

nagxref: Normal termination

You can see from the cross-reference that the unused names never appear outside their definitions.

2 Likes

I would try fpt for this - though from a glance at the reference manual, it will help only partially.

gcov is not quite the tool you need, as it reports the code coverage obtained from running the program, whereas you are looking for static analysis.

2 Likes

Good suggestion, I was going to post the same. @Jcollins can provide further guidance here.

2 Likes

@plevold, welcome to the forum! Great question.

We have implemented a pass in LFortran for that (it works globally): src/lfortran/pass/unused_functions.cpp · 42c55b6001fd88470a5552b3b4a53aa3a09e12f9 · lfortran / lfortran · GitLab, but it is currently not exposed to the user directly (it is used internally by the LLVM backend to only generate LLVM IR for code that is actually used). I think it might be a great idea to highlight parts of the code that are not used (globally). I created an issue for that: Identifying unused code in large Fortran projects (#626) · Issues · lfortran / lfortran · GitLab.

1 Like

If you do not mind a bit of extra setup work and getting detailed reports in return beyond simple dead-code identification, then the combination of GFortran + gcov + lcov can do magic like this.

3 Likes

A crude approach that perhaps could be automated is

(1) Declare all entities private in all modules comprising the complete program and compile.

(2) Gfortran will complain about referenced procedures that no longer exist, i.e. the ones actually used from outside their modules. Make them public.

(3) Compile with gfortran -Werror=unused-function and delete the procedures it flags.

The idea is that every procedure should either be public and called from outside its module or private and called within its module.

1 Like

You did not clearly state what "unused" means to you. If a subroutine unused_sub appears in one of your sources and there is no call to it in any source belonging to the project, including prebuilt libraries, if any, the standard Unix utilities sed and grep can do most of the grunt work of identifying the unused subprograms.

I put your source code into two files, modu.f90 and unused.f90.

T:\lang\NoInt>grep -in "^ *subroutine" *.f90 | sed -e "s/.*subroutine *//" -e "s/(.*$//"
used_sub
unused_sub
unused_sub2

T:\lang\NoInt>grep -in "^[ 0-9]*call " *.f90 | sed -e "s/^.*call //" -e "s/(.*)//"
used_sub

Comparing the two output lists shows that only used_sub is called.

You can attempt similar scans for functions, but this may not work so well, because function references can occur inside expressions, etc., and a local array may have the same name as an external function.
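The same scan can be scripted instead of piped. Below is a rough Python equivalent of the grep/sed pipeline above (the helper name is mine, not from the thread); it inherits the same blind spots, as continuation lines, comments, strings, interface bodies, and name shadowing are all ignored:

```python
import re

# Patterns mirroring the grep commands above: a subroutine definition has
# nothing but blanks before the keyword, and a CALL may be preceded by
# blanks or (fixed-form) statement labels.
DEF_RE = re.compile(r'^\s*subroutine\s+(\w+)', re.IGNORECASE | re.MULTILINE)
CALL_RE = re.compile(r'^[\s\d]*call\s+(\w+)', re.IGNORECASE | re.MULTILINE)

def unused_subroutines(sources):
    """Return subroutine names that are defined but never called
    anywhere in the given source texts (a heuristic, not a parser)."""
    defined, called = set(), set()
    for text in sources:
        defined.update(n.lower() for n in DEF_RE.findall(text))
        called.update(n.lower() for n in CALL_RE.findall(text))
    return defined - called
```

Applied to the two files above, it would report unused_sub and unused_sub2 as defined but never called.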

There are tools such as Polyhedron's HyperKWIC that can generate HTML reports from all Fortran source files in a project, and you can scan the reports for unused functions and subroutines.

So far, I have mentioned tools that perform a static analysis, using text patterns that I expect to be present in the source files (e.g., nothing other than blanks appears to the left of subroutine). It may also happen that there are subroutines and corresponding calls in the sources, but the call may never get executed or may only get executed for some program input data. These are the ones that code coverage and profiling tools can detect, as others have already described. Be careful about such subroutines, however, because a future user of the program may provide data for which the call is needed. When working with these cases, I often replace the seemingly unneeded subroutine body with a stop 'Subroutine unused() was called' statement.

2 Likes

Thanks for the suggestions everybody!

Identifying unused procedures probably should be static analysis and not determined by the test suite, so that rules out gcov for me. Gcov is still a great tool and I hope I'll be able to set it up for our projects in the future though!

@mecej4 I've been thinking about implementing similar approaches for automating this, either with tools like sed and grep or with some Python scripting. I've come to the conclusion that in order to do this reliably one would have to write a parser that understands a large amount of the Fortran language. Here are some examples that a simple sed/grep approach will struggle with:

  • Multiple modules can have procedures with the same name. Which one is used depends on the use statement and not only the call statement in that program unit, so more context is needed than just the line with the call.
  • To identify function use (and not only subroutines) one must be able to separate arrays from function calls. Again, more context than the single line where the function name occurs is needed.

@certik great to see that LFortran actually does this. Exposing this information to the user would be a nice feature indeed! As mentioned above I think a tool like this needs a very good (if not complete) understanding of the Fortran grammar so a compiler is very well suited for this task. In general I think Fortran would benefit greatly from a static analysis tool similar to rust-analyzer, pylint, etc. Perhaps parts of LFortran could be reused for this purpose?

I found a suggestion on StackOverflow that one could use nm on the compiled object files to identify all declared symbols and then run it again on the executables produced by the project to see which symbols were dropped at link time. I was able to identify and delete quite a few procedures with this approach! It seemed to struggle a bit with the name mangling of Fortran modules though, so I think it could be difficult to automate reliably as part of a CI pipeline.
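The symbol-diff step could be sketched like this in Python (my own sketch, not from the StackOverflow answer). It assumes gfortran's __modulename_MOD_procname mangling for module procedures and the trailing-underscore convention for plain externals; other compilers mangle differently, and for the linker to actually drop unused code the build would typically need -ffunction-sections together with -Wl,--gc-sections:

```python
import re

# gfortran mangles a module procedure `proc` in module `mod` as __mod_MOD_proc;
# plain external procedures typically just get a trailing underscore.
MANGLED = re.compile(r'^__(\w+)_MOD_(\w+)$')

def demangle(symbol):
    """Map a gfortran symbol name back to 'module::procedure' where possible."""
    m = MANGLED.match(symbol)
    if m:
        return f'{m.group(1)}::{m.group(2)}'
    return symbol.rstrip('_')

def dropped_symbols(defined_in_objects, kept_in_executable):
    """Names defined in the object files but absent from the executable,
    i.e. candidates for unused code (given symbol sets from `nm` output)."""
    return ({demangle(s) for s in defined_in_objects}
            - {demangle(s) for s in kept_in_executable})
```

The two input sets would be collected by running nm over the object files and over the final executable, respectively.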

1 Like

A Q&A on StackOverflow about identifying the intents of procedure arguments may be of interest for your use case too.

One could e.g. use gfortran's -fdump-fortran-original to get the internal representation of your source code (similar to the functionality provided by LFortran, if I understand correctly).

Then base your own Python/sed/grep parsing on gfortran's output, knowing that you are now working on things the way the compiler interpreted them.

1 Like

Yes, we tried to design LFortran so that writing such static analysis tools would be possible.

The unused code seems like one such example.

Indeed, to make any of these tools 100% robust, in my opinion you really have to use a compiler that can actually compile your code, so that you can test that it runs correctly. So not just the parser, but also all the semantic analysis must be correct.

I tried LFortran on your above test example and it can compile it. The unused pass currently only does this for functions, but not subroutines yet. Weā€™ll fix that soon.

I will mention one complication that our pass handles but that would be tough without a compiler: generic functions. They are called like sin(x), but they dispatch to sin_f32, sin_f64, etc. based on the argument types. If you call it with x single precision, then the unused pass will remove sin_f64 but keep the generic function (now with just one implementation to dispatch to). Here x might not just be a variable but the result of another function (or an expression), so you really need to know all the types and kinds (and correctly infer the final kind of an expression); at that point you are redoing all the work that a compiler has to do anyway.

I assume that LFortran cannot yet compile your actual code, so that is our highest priority now. Once we can compile your code, I think you might be very interested in collaborating with us on such a tool and an interface to it.

1 Like

Unused code detection might be something the Camfort project could take on?

fpt will tell you:
i. Which primary files (i.e. not INCLUDE files) are unused.
ii. Which sub-programs are unused
iii. Which pieces of executable code are unreachable
iv. Which variables are declared and not used (Latest version - contact me if you need this)
fpt also knows which values are computed but never read, but I need to check to what extent and how this is reported.
You can download it at http://simconglobal.com

Best wishes,

John

1 Like

I encountered the software Codee, which aims to achieve your stated goal of static analysis, as well as many other tasks, such as automating modernization, optimization, and parallelization annotations. The CEO gave a presentation on the software at Argonne Lab today. It seems to be integrated with Cray software development tools but is a generic command-line application.

1 Like

I think you also need to account for continuation lines, which can be partially achieved by including & in the regex character class. To be really robust you'd have to form a regex that allows unlimited & and whitespace within the word call, and that doesn't require a trailing space, just a non-word character.
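One way to sidestep such a monster regex is to join continuation lines first and then scan with an ordinary pattern. A heuristic Python sketch (function names are mine; comments, strings, and fixed-form continuations are still ignored):

```python
import re

def join_continuations(text):
    """Join free-form continuation lines: a trailing '&' plus an optional
    leading '&' on the next line. This even repairs a 'call' keyword that
    was split across lines with '&'/'&'."""
    return re.sub(r'&\s*\n\s*&?', '', text)

# After joining, an ordinary line-anchored pattern suffices.
CALL_RE = re.compile(r'^\s*call\s+(\w+)\b', re.IGNORECASE | re.MULTILINE)

def called_names(text):
    """Return the lowercase names of subroutines called in the source text."""
    return {name.lower() for name in CALL_RE.findall(join_continuations(text))}
```

For example, a statement split as "ca&" / "&ll foo()" is first rejoined to "call foo()" and then matched normally.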

I have used nm(1) and other related commands for determining dependencies, but there used to be a number of compiler listing options that showed dependency trees; doxygen generates call trees, and fpm will shake out unused module files. I have not tried anything beyond a cursory search, but a call tree or dependency tree can be used, and those usually include information at the procedure level, where other utilities typically only produce file dependencies (for auto-generating makefiles of one flavor or another).

But do not overlook brute-force use of the compiler and loader. Start with compiling your main program and see what modules it needs to compile. build those and when you get to the point of loading you will get a list of unsatisfied externals. It probably works better with pre-module code but you can semi-automate the steps with a few ad-hoc scripts. Once the compile loads any files not used yet are not needed (unless doing something fancy with dynamic loading).