Semantics based diff of Fortran source files

In a 2014 comp.lang.fortran thread, "Comparing versions of a Fortran source code", I wrote:

I can compare two versions of a Fortran source file, containing a single module with multiple procedures, using diff on Unix or fc on Windows. I’d like something that is more Fortran-aware and which can say for example that code foo_2.f90 has procedures x, y, and z not present in foo_1.f90 and that procedures a and b have been modified but that procedures c and d are the same in the two files.

I am still interested in such a tool. Ideally it would also be able to detect if versions of a procedure in different source files differ in a meaningful way, where non-meaningful differences include differences in
(1) spacing
(2) capitalization (except in strings)
(3) comments
(4) orders of entities declared
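As a rough illustration of items (1) to (3), here is a minimal Python sketch that normalizes free-form source before comparison. The names normalize_line and normalize_source are made up for this example. It assumes free-form code, does not join continuation lines, only collapses runs of blanks (so "x=1" and "x = 1" still differ), and does nothing about item (4).

```python
def normalize_line(line: str) -> str:
    """Drop a trailing comment, collapse blanks, lowercase outside strings."""
    out = []
    quote = None  # quote character of the string literal we are inside, if any
    for ch in line:
        if quote:
            out.append(ch)          # keep string contents untouched
            if ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
            out.append(ch)
        elif ch == "!":
            break                   # rest of the line is a comment
        elif ch.isspace():
            if out and out[-1] != " ":
                out.append(" ")     # collapse a run of blanks to one
        else:
            out.append(ch.lower())
    return "".join(out).strip()


def normalize_source(text: str) -> list[str]:
    """Normalize every line and drop lines that end up empty."""
    lines = (normalize_line(line) for line in text.splitlines())
    return [line for line in lines if line]
```

Two versions could then be compared with normalize_source(a) == normalize_source(b), or the two lists could be passed to an ordinary line diff.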

FortranFan wrote that this is called “semantics based diff’ing”. Ian Harvey said he had created such a tool, but the links to it no longer work. John C. wrote:

We did this in fpt -
fpt Reference: Compare Sub-programs

In addition to fpt, what is currently available?

I have some ideas on how to make this. Since the intermediate representation is not that important, you could use any of:

  • fparser and diff the trees in Python
  • lfortran and regex the outputs of the AST/ASR
  • gfortran (same as lfortran)
  • flang (the modern one with only the front and middle ends) and do the same as lfortran & gfortran

I think the most complete of these parsers should be lfortran’s, with gfortran’s probably being the most robust but lacking a couple of F2018 features (full import host association, off the top of my head).

Yes, comparing the AST/ASR is great, particularly when comparing an F77 version of a procedure to an F90+ version, for example after auto-converting with fpt or spag/plusFORT. That is by far the more powerful method. Note, though, that diff has options to ignore differences in white space and case, which covers some of the simpler cases mentioned, and there are tools that let you ignore comment differences, such as flower (GitHub - urbanjost/flower: fpm package to build flower(1), a utility for changing case of Fortran free-format files), which among other things can strip comments from free-format code. Some utilities, like fprettify, can output files that are more similar than the inputs, because they apply formatting rules for things like indentation that align the white-space usage of the two versions. So for some simple cases generic tools are enough. If case is the major reason diff(1) output is cluttered, for example, just use the --ignore-case switch.

PS:
The cases where multiple procedures are in a file and have been reordered, or where some routines are missing, are a problem that I encounter frequently too. I made little scripts that run those files through fsplit and f90split and compare the resulting (scratch) directories. That “divide and conquer” approach often simplifies the comparisons. Since those tools do not split contained procedures, the approach is not as good as it used to be when the procedures are all in modules (which f90split leaves in one file). I have been considering adding an option to f90split to convert all the contained procedures into separate INCLUDE files, partly for that reason and partly because I have other reasons to want to split a large module that way for easier maintenance, but I haven’t done it. (I just cheat: I remove the module header, run the file through the existing split commands, and then stick the INCLUDE directives in manually.)
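The split-and-compare idea can be sketched in a few lines of Python that work on contained procedures too. This is not how fsplit or f90split work; split_procedures, compare_procedures and the two regular expressions are invented for this illustration. It assumes free-form code in which every procedure ends with an explicit "end subroutine" or "end function", treats internal procedures as part of their host, and would also pick up interface bodies at module level.

```python
import re

# Optional prefixes such as "pure", "recursive" or a type spec may precede
# the keyword. END must be tested first, because "end subroutine foo"
# would otherwise also match START.
START = re.compile(r"^\s*(?:[\w()=*,: ]+\s+)?(subroutine|function)\s+(\w+)", re.I)
END = re.compile(r"^\s*end\s*(subroutine|function)\b", re.I)


def split_procedures(source: str) -> dict[str, list[str]]:
    """Map each top-level procedure name (lowercased) to its lines."""
    procs: dict[str, list[str]] = {}
    name, depth = None, 0
    for line in source.splitlines():
        if END.match(line):
            if name is not None:
                procs[name].append(line)
                depth -= 1
                if depth == 0:
                    name = None
            continue
        match = START.match(line)
        if match:
            if name is None:
                name = match.group(2).lower()
                procs[name] = []
            depth += 1          # nested procedures stay with their host
        if name is not None:
            procs[name].append(line)
    return procs


def compare_procedures(old: str, new: str) -> dict[str, list[str]]:
    """Report added, removed, modified and unchanged procedures."""
    a, b = split_procedures(old), split_procedures(new)

    def clean(lines):
        # Ignore blank lines and differences in spacing only.
        return [" ".join(line.split()) for line in lines if line.strip()]

    common = a.keys() & b.keys()
    return {
        "added": sorted(b.keys() - a.keys()),
        "removed": sorted(a.keys() - b.keys()),
        "modified": sorted(n for n in common if clean(a[n]) != clean(b[n])),
        "same": sorted(n for n in common if clean(a[n]) == clean(b[n])),
    }
```

Because procedures are matched by name, reordering them within the file does not show up as a difference.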

flang has the ability to “unparse” its parse tree back to Fortran. This gets rid of comments and normalizes indentation and capitalization (and converts fixed form to free form). So you could diff the normalized source.

You invoke it as:
flang-new -fc1 -fdebug-unparse-no-sema source.f90
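To diff two files after that normalization, something like the following Python sketch could work. It assumes flang-new is on the PATH and that the unparsed source is written to standard output; flang_unparse and normalized_diff are names made up for this example. The normalizer is passed in as a parameter so that another tool could be swapped in.

```python
import difflib
import subprocess


def flang_unparse(path: str) -> list[str]:
    """Return the unparsed source for `path` (assumes flang-new is installed)."""
    result = subprocess.run(
        ["flang-new", "-fc1", "-fdebug-unparse-no-sema", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


def normalized_diff(old_path: str, new_path: str, normalize=flang_unparse) -> list[str]:
    """Unified diff of the two files after passing each through `normalize`."""
    old, new = normalize(old_path), normalize(new_path)
    return list(difflib.unified_diff(old, new, old_path, new_path, lineterm=""))
```

An empty list from normalized_diff("foo_1.f90", "foo_2.f90") would mean the two files unparse to the same text.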

flang’s parser is supposed to be complete F2018 too.

AFAIK there is one fundamental problem with flang-new: how do you install it?

fparser is available through PyPI and conda-forge, and so is lfortran, which also provides tarballs. gfortran is similar: you can install it in a plethora of ways. All three are cross-platform.

flang-new, on the other hand, is only supported on Windows and Linux. It is available via conda-forge, but the package has not been updated in 1.5 years and for some reason the Linux version is flang 5.0, so it is not flang-new (Flang :: Anaconda.org). I also couldn’t find a PyPI package, and the flang-new GitHub repo only provides build-from-source instructions (llvm-project/flang at main · llvm/llvm-project · GitHub). That, to me, disqualifies flang-new as a viable candidate.

I would actually be happy if I was wrong about this and there was an easy way to install flang-new on Linux/Windows, that is also up to date, since it would finally allow me to add support for it in VS Code. Is there something I am missing here?

The new flang is not in many distributions yet. The most prominent ones I know of are MSYS2 (Base Package: mingw-w64-flang - MSYS2 Packages), homebrew (GitHub - carlocab/homebrew-personal: When homebrew/core isn't enough.) and macports (flang-14 | MacPorts). There might be others as well, but I’m not aware of them yet.

Also, don’t get fooled by flang 5.0 on conda-forge: this is actually the classic flang compiler, and its version number is that of the LLVM it was compiled against.

There are binary releases of LLVM now that include flang. The latest version is 14.0.6 with the downloads found here: Release LLVM 14.0.6 · llvm/llvm-project · GitHub

The clang+llvm-* downloads have the flang-new binaries in them.

Can someone who is familiar with the new flang tell me the actual status of the project? I see no information that says it can create an executable binary on its own. Basically, when can I point my Linux Mint systems to a repository or some deb packages and do

sudo apt-get install llvm clang flang

and have a working compiler that creates executables without using gcc etc.?

My understanding is that if you jump through the right hoops you can get it to compile to an executable in some cases. They are focusing on code generation for Fortran 95 right now but I don’t think it’s usable for real work yet.

I’ve been experimenting with the parser (including scanning and preprocessing) and that seems very solid.