Has anyone attempted to automatically identify unused code in large Fortran projects?
Take the following as an example:
! unused_no_interface.f90
integer function unused_no_interface(i) result(j)
implicit none
integer, intent(in) :: i
j = i + 1
end function
! some_module.f90
module some_module
implicit none
private
public used_sub
public unused_sub
public unused_val
integer :: unused_val = 123
integer :: unused_val2 = 123
contains
subroutine used_sub()
write(*,*) 'Hello world'
end subroutine
subroutine unused_sub()
write(*,*) 'The answer is 42'
end subroutine
subroutine unused_sub2()
write(*,*) 'I am not used'
end subroutine
end module
! main.f90
program main
! Imports all procedures (yuck!), but only uses one
use some_module
implicit none
call used_sub()
end program
gfortran with -Wall will correctly identify unused_sub2 and unused_val2 as unused. Since the compiler only operates on one file at a time, however, it has no chance of detecting that unused_no_interface, unused_sub, and unused_val are also unused.
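For reference, each file is compiled in isolation, which is exactly why the cross-file cases are invisible:

gfortran -Wall -c unused_no_interface.f90
gfortran -Wall -c some_module.f90   # warns about unused_sub2 and unused_val2
gfortran -Wall -c main.f90          # sees only some_module.mod, not its source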
Such a tool would be very handy for large Fortran projects, particularly those with a lot of old (legacy) code, but I haven't been able to find any. It could certainly be done by parsing the source files, but that would require a more or less complete Fortran grammar, close to what the compilers already have.
Here is output from a tool we have developed internally that is not yet ready for prime time.
The index of names:
nag_xref: Information - Indexing ...
Index of /tmp/main.f90
MAIN (PROGRAM at line 2)
4 SOME_MODULE ........................ line 2, /tmp/some_module.f90
7 USED_SUB ........................... line 15, /tmp/some_module.f90
Index of /tmp/some_module.f90
SOME_MODULE (MODULE at line 2)
6 USED_SUB ........................... line 15
7 UNUSED_SUB ......................... line 19
8 UNUSED_VAL ......................... (definition/first-use)
10 UNUSED_VAL ......................... line 8
11 UNUSED_VAL2 ........................ (definition/first-use)
Index of /tmp/unused_no_interface.f90
UNUSED_NO_INTERFACE (FUNCTION at line 2)
2 I .................................. (dummy argument)
J .................................. (definition/first-use)
4 I .................................. line 2
5 J .................................. line 2
I .................................. line 2
nagxref: Normal termination
The cross-referencing information:
nag_xref: Information - Cross-referencing ...
+-----------------------------------------------------------------------------
| /tmp/main.f90 (FILE)
+-----------------------------------------------------------------------------
| MAIN 2#
+-----------------------------------------------------------------------------
+-----------------------------------------------------------------------------
| /tmp/some_module.f90 (FILE)
+-----------------------------------------------------------------------------
| SOME_MODULE 2#
| 4u /tmp/main.f90
+-----------------------------------------------------------------------------
+-----------------------------------------------------------------------------
| SOME_MODULE (MODULE in /tmp/some_module.f90)
+-----------------------------------------------------------------------------
| UNUSED_SUB 7d 19#
| UNUSED_SUB2 23#
| UNUSED_VAL 8d 10*
| UNUSED_VAL2 11*
| USED_SUB 6d 15#
| 7 /tmp/main.f90
+-----------------------------------------------------------------------------
+-----------------------------------------------------------------------------
| /tmp/unused_no_interface.f90 (FILE)
+-----------------------------------------------------------------------------
| UNUSED_NO_INTERFACE 2#
+-----------------------------------------------------------------------------
+-----------------------------------------------------------------------------
| UNUSED_NO_INTERFACE (FUNCTION in /tmp/unused_no_interface.f90)
+-----------------------------------------------------------------------------
| I 2A 4+ 5
| J 2< 5*
+-----------------------------------------------------------------------------
Explanation of cross-reference symbols:
#: definition/first-use
%: component name
*: assigned a value
+: declaration
<: RESULT name
a: argument to procedure call
A: dummy argument
c: appears in named COMMON
C: appears in blank COMMON
d: PUBLIC/PRIVATE declaration
D: appears in DATA statement
e: in EQUIVALENCE
g: use via generic name
i: used in an INTERFACE block
k: kind specification
R: renamed
s: SAVEd
t: derived-type
u: USEd module
nagxref: Normal termination
If you do not mind a bit of extra setup work, and you would like detailed reports beyond simple dead-code identification in return, the combination of GFortran + gcov + lcov can work this kind of magic.
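A minimal sketch of that workflow on the example files above (procedures that never execute show up with zero counts in the report):

# compile with coverage instrumentation; keep optimization off so lines map cleanly
gfortran --coverage -O0 -c unused_no_interface.f90 some_module.f90 main.f90
gfortran --coverage -o main unused_no_interface.o some_module.o main.o
./main                  # run the program (or, better, your whole test suite)
gcov *.f90              # annotated per-file .gcov listings
lcov --capture --directory . --output-file coverage.info
genhtml coverage.info --output-directory coverage-html   # browsable HTML report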
You did not clearly state what 'unused' means to you. If you have a subroutine unused_sub in one of your sources, and there is no call to it in any of the sources belonging to the project (including prebuilt libraries, if any), the standard Unix utilities sed and grep can do most of the grunt work of identifying the unused subprograms.
I put your source code into two files, modu.f90 and unused.f90.
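Something along these lines produces the two lists (a minimal sketch, assuming free-form source; continuation lines and the like need more care, as discussed further down):

# list 1: subroutine definitions
grep -inE '^[[:blank:]]*subroutine[[:blank:]]+[a-z][a-z0-9_]*' *.f90
# list 2: subroutine calls
grep -inE '(^|;)[[:blank:]]*call[[:blank:]]+[a-z][a-z0-9_]*' *.f90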
Comparing the two output lists shows that only used_sub is called.
You can attempt similar scans for functions, but this may not work so well, because function references can occur inside expressions, etc., and a local array may have the same name as an external function.
There are tools such as Polyhedron's HyperKWIC that can generate HTML reports from all Fortran source files in a project, and you can scan the reports for unused functions and subroutines.
So far, I have mentioned tools that perform a static analysis, using text patterns that I expect to be present in the source files (e.g., nothing other than blanks appears to the left of subroutine). It may also happen that there are subroutines and corresponding calls in the sources, but the call never gets executed, or only gets executed for some program input data. These are the ones that code coverage and profiling tools can detect, as others have already described. Be careful about such subroutines, however, because a future user of the program may provide data for which the call is needed. When working with these cases, I often replace the seemingly unneeded subroutine body with a stop 'Subroutine unused() was called' statement.
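For the unused_sub from the example above, that guard would look like:

subroutine unused_sub()
! original body removed; fail loudly if the call turns out to be needed
stop 'Subroutine unused_sub() was called'
end subroutine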
Identifying unused procedures should probably be done by static analysis rather than determined by the test suite, so that rules out gcov for me. Gcov is still a great tool and I hope I'll be able to set it up for our projects in the future though!
@mecej4 I've been thinking about implementing similar approaches for automating this, either with tools like sed and grep or with some Python scripting. I've come to the conclusion that to do this reliably one would have to write a parser that understands a large part of the Fortran language. Here are some examples that a simple sed/grep approach will struggle with:
Multiple modules can have procedures with the same name. Which one is used depends on the use statements, not only on the call statement in that program unit, so more context is needed than just the line with the call.
To identify function use (and not only subroutine use), one must be able to distinguish array references from function calls. Again, more context is needed than the single line where the function name occurs, as the snippet below illustrates.
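A minimal illustration of the ambiguity (hypothetical names f and n):

program ambiguity
implicit none
real :: f(10)   ! swap this declaration for "real, external :: f" ...
real :: y       ! ... and the marked line becomes a function call instead
integer :: n
n = 3
f = 0.0
y = f(n)        ! array element here; a line-based scan cannot tell
print *, y
end program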
@certik great to see that LFortran actually does this. Exposing this information to the user would be a nice feature indeed! As mentioned above, I think a tool like this needs a very good (if not complete) understanding of the Fortran grammar, so a compiler is very well suited for this task. In general I think Fortran would benefit greatly from a static analysis tool similar to rust-analyzer, pylint, etc. Perhaps parts of LFortran could be reused for this purpose?
I found a suggestion on StackOverflow that one could use nm on the compiled object files to identify all defined symbols, and then run it again on the executables produced by the project to see which symbols were dropped at link time. I was able to identify and delete quite a few procedures with this approach! It seemed to struggle a bit with the name mangling of Fortran modules though, so I think it could be difficult to automate this reliably as part of a CI pipeline.
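A rough sketch of that comparison (it assumes the build actually lets the linker drop unused code, e.g. linking from a static archive or building with -ffunction-sections and -Wl,--gc-sections; note that gfortran mangles module procedures as __modulename_MOD_procname):

# global code symbols defined across the object files
nm --defined-only *.o | awk '$2 == "T" {print $3}' | sort -u > defined.txt
# symbols that survived into the linked executable
nm --defined-only ./main | awk '$2 == "T" {print $3}' | sort -u > kept.txt
# defined but dropped at link time: dead-code candidates
comm -23 defined.txt kept.txt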
A Q&A on StackOverflow about identifying the intents of procedure arguments may be of interest for your use case too.
One could e.g. use gfortran's -fdump-fortran-original option to get the internal representation of your source code (similar to the functionality provided by LFortran, if I understand correctly).
Then base your own Python/sed/grep parsing on gfortran's output, knowing that you are now working on things the way the compiler interpreted them.
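For example, the dump is written to standard output, so it can be captured per file:

gfortran -c -fdump-fortran-original some_module.f90 > some_module.tree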
Yes, we tried to design LFortran so that writing such static analysis tools would be possible.
The unused code seems like one such example.
Indeed, to make any of these tools 100% robust, in my opinion you really have to use a compiler that can actually compile your code, so that you can test that it runs correctly. So not just the parser, but all of the semantic analysis has to be correct as well.
I tried LFortran on your test example above and it can compile it. The unused pass currently only removes unused functions, not subroutines yet. We'll fix that soon.
I will mention one complication that our pass handles but that would be tough without a compiler: generic functions. They are called like sin(x), but they dispatch to sin_f32, sin_f64, etc. based on the argument types. If you only ever call it with a single-precision x, then the unused pass will remove sin_f64 but keep the generic function (now with just one implementation to dispatch to). Here x might not just be a variable but the result of another function (or an expression), so you really need to know all the types and kinds (and correctly infer the final kind of an expression), and you are then redoing all the work that a compiler has to do anyway.
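A minimal sketch of the situation (mysin and its two specifics are made-up names mirroring the sin example):

module mysin_mod
implicit none
private
public mysin
interface mysin              ! the generic name that call sites use
module procedure mysin_f32   ! selected for real(4) arguments
module procedure mysin_f64   ! selected for real(8) arguments
end interface
contains
real(4) function mysin_f32(x) result(y)
real(4), intent(in) :: x
y = sin(x)
end function
real(8) function mysin_f64(x) result(y)
real(8), intent(in) :: x
y = sin(x)
end function
end module
! If every call site passes a single-precision argument, mysin_f64 is dead,
! but proving that means resolving the kind of every actual argument:
!   real(4) :: x
!   print *, mysin(2.0 * x)   ! the kind of the expression must be inferred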
I assume that LFortran cannot yet compile your actual code, so that is our highest priority now. Once we can compile your code, I think you might be very interested in collaborating with us on such a tool and an interface to it.
fpt will tell you:
i. Which primary files (i.e. not INCLUDE files) are unused.
ii. Which sub-programs are unused.
iii. Which pieces of executable code are unreachable.
iv. Which variables are declared and not used (latest version - contact me if you need this).
fpt also knows which values are computed but never read, but I need to check to what extent and how this is reported.
You can download it at http://simconglobal.com
I encountered this software, codee, which aims to achieve your stated goal of static analysis, as well as many other tasks, such as automating modernization, optimization, and parallelization annotations. The CEO gave a presentation on the software at Argonne Lab today. It seems to be integrated with the Cray software development tools, but it is a generic command-line application.
I think you also need to account for continuation lines, which can be partially achieved by including & in the regex character class. To be really robust you'd have to form a regex that allows unlimited & and whitespace within the word call, and that doesn't require a trailing space, just a non-word character.
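A sketch of such a pattern in Python (a hypothetical script; it still ignores comments and strings, so it remains an approximation):

import re
import sys
from pathlib import Path

# one "gap" is either plain blanks or a free-form continuation:
# trailing "&", newline, optional leading "&" on the next line
SEP = r'(?:[ \t]*&[ \t]*\n[ \t]*&?[ \t]*|[ \t]+)'

# "call" with optional gaps between its letters (e.g. "ca&\n&ll"),
# then at least one gap, then the procedure name
CALL = re.compile('(?im)^[ \t]*c' + SEP + '?a' + SEP + '?l' + SEP + '?l' + SEP + r'(\w+)')

for fname in sys.argv[1:]:
    for m in CALL.finditer(Path(fname).read_text()):
        print(f'{fname}: call {m.group(1)}')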
I have used nm(1) and other related commands for determining dependencies, but there used to be a number of compiler listing options that showed dependency trees; doxygen generates call trees, and fpm will shake out unused module files. I have not tried anything other than a cursory search, but a call tree or dependency tree can be used, and those usually include information at the procedure level, where other utilities typically only produce file dependencies (for auto-generating makefiles of one flavor or another).
But do not overlook brute-force use of the compiler and loader. Start by compiling your main program and see which modules it needs. Build those, and when you get to the point of loading you will get a list of unsatisfied externals. It probably works better with pre-module code, but you can semi-automate the steps with a few ad-hoc scripts. Once the program loads, any files not yet used are not needed (unless you are doing something fancy with dynamic loading).
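A rough sketch of that loop on the example files from this thread:

gfortran -c main.f90          # fails; the error names the missing module file
gfortran -c some_module.f90   # build the modules the compiler asked for, repeat
gfortran -o main main.o some_module.o   # the loader lists unsatisfied externals
# once this links, any source file never pulled in is a dead-code candidate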