Parsing contents of .mod files using Python

While developing the fortls Language Server, I have always been troubled by how to deal with precompiled module files (.mod files). In contrast to normal modules, the source code of an external module is not required to be present, so information such as its public interfaces has to be extracted from the .mod file itself, which is extremely complicated. Further complicating matters, .mod file contents vary between compilers.

In gfortran at least, .mod files are just an optimised, gzipped version of the Abstract Syntax Tree (AST) that the compiler normally generates. Running:

zcat modulefile.mod

will output the AST in plain text. However, gfortran optimises away the PUBLIC/PRIVATE attributes that are normally attached to the AST nodes, so the format of the extracted information (and maybe the information itself) is not very helpful.
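The `zcat` step can be reproduced with Python's standard-library `gzip` module. This is a minimal sketch, assuming the file is gzip-compressed as the `zcat` command implies; the helper names `decode_mod_bytes`, `read_gfortran_mod` and `module_version` are hypothetical, and other compilers use entirely different formats.

```python
import gzip
import re
from pathlib import Path


def decode_mod_bytes(data):
    """Decompress raw .mod bytes into text (what `zcat modulefile.mod` prints)."""
    return gzip.decompress(data).decode("utf-8", errors="replace")


def read_gfortran_mod(path):
    """Read and decompress a gfortran .mod file from disk."""
    return decode_mod_bytes(Path(path).read_bytes())


def module_version(text):
    """Return the format version from the header line, or None if absent.

    Expects a first line such as: GFORTRAN module version '15' created from l.f90
    """
    m = re.match(r"GFORTRAN module version '([^']+)'", text)
    return m.group(1) if m else None
```

For example, `module_version(read_gfortran_mod("modulefile.mod"))` returns the version string from the header, which is worth checking before attempting to parse the rest, since the layout changes between module versions.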

One can generate the normal AST and compare the differences:

gfortran -fdump-fortran-original mysrc.f90
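To get the same dump into Python, the compiler can be driven with `subprocess`. This is a sketch, assuming `gfortran` is on the PATH and writes the dump to standard output; I added `-c` (not in the command above) so that a file containing only a module is not handed to the linker. The function names are hypothetical.

```python
import subprocess


def dump_command(source, compiler="gfortran"):
    """Build the command that prints gfortran's unoptimised parse tree for `source`."""
    # -c stops before linking; note it still writes the .o and .mod files.
    return [compiler, "-c", "-fdump-fortran-original", source]


def dump_fortran_original(source, compiler="gfortran"):
    """Run the compiler and return the dump it writes to stdout (assumed)."""
    result = subprocess.run(
        dump_command(source, compiler),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if compilation fails
    )
    return result.stdout
```

For example, `dump_fortran_original("mysrc.f90")` should return the same text the shell command prints, ready for further processing in Python.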

Making matters worse, AST representations also differ between compilers, so for fortls to be able to parse any .mod file it would need to translate every compiler's AST representation into Python, which is unlikely to happen.

I would be interested in hearing your ideas about how one could parse some of the contents of a .mod file.

  1. Are there any ways of getting the public entities, with their attributes and types, that do not involve parsing the compiler-generated AST?
  2. Are there any Python APIs (GCC, Intel, LLVM, etc.) that could be used to obtain the ASTs in Python?
  3. If not, would it be a good idea to create some?

One solution to the incompatible module file problem, which plagues anyone who wants to write libraries and/or applications usable with multiple compilers without forcing users to compile a separate version for each compiler they choose just to get compatible module files, is for the Fortran community in general, and the standards committee in particular, to define a standard module file format in some markup-like language (maybe XML) that can be read and/or generated as an option alongside the default vendor formats. This could be controlled by compiler flags. A precompile step (hidden from the user) would probably be needed, but I’m willing to sacrifice a little compile time for the flexibility that transportable module files would give.

just my 2 cents

Why?

The above was the only thing I wanted to type, but FD has a minimum character count for posts. So,

Information about a private subprogram is not recorded in a gfortran *.mod file. In fact, a private subprogram might be completely inlined, so you have no way of knowing whether a private subprogram existed at all.

Private components of a public derived type are marked as private. Private derived types are marked as private.

In short, there are no serviceable parts for a user.

Yes, that would be ideal; a strategy similar to C++ headers would be nice. I think the closest thing Fortran has to this is submodules.

As for adding .mod file generation to the standard, I suspect that would be a hard sell to the committee members. Other than allowing compiler-agnostic language servers to provide information about external modules, I don’t see many other applications.

Maybe it would make the lives of compiler devs easier (in the long run) and improve the available tooling for the language, and maybe it wouldn’t. My guess is that it would take a disproportionate amount of work to standardise and implement compared to the perceived benefits.

If any J3 committee members want to share their thoughts on this, that would be great.

My library gfort2py (rjfarmer/gfort2py on GitHub, a library to allow calling Fortran code from Python) parses most of a gfortran .mod file. The in-development version (gfort2py/module_parse.py on the dev2 branch) does a better job of tracking all the data present, even if I don’t do anything with it.

Mod files are definitely not the way to go if you need to support multiple compilers. In that case, all you can do is parse the Fortran source code (as f2py does). Anything else will be entirely compiler-dependent.
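As a rough illustration of the source-parsing route, a module's access statements can be collected with a regular expression. This is a deliberately naive sketch, not how f2py or fortls actually work: it handles only free-form source with one statement per line, and it ignores continuation lines, `!` inside strings, attribute forms such as `integer, private :: x`, and PRIVATE statements inside derived-type definitions. The function name `access_statements` is hypothetical.

```python
import re

# Free-form access statements such as "public bar", "private :: a, b",
# or a bare "private" that changes the module's default accessibility.
ACCESS_RE = re.compile(r"^\s*(public|private)\b\s*(?:::)?\s*(.*)$", re.IGNORECASE)


def access_statements(source):
    """Return (default_access, {name: access}) for a module's source text."""
    default = "public"  # module entities are public unless stated otherwise
    names = {}
    for line in source.splitlines():
        line = line.split("!", 1)[0]  # drop trailing comments (naively)
        m = ACCESS_RE.match(line)
        if not m:
            continue
        access, rest = m.group(1).lower(), m.group(2).strip()
        if not rest:
            default = access  # bare PUBLIC/PRIVATE statement
            continue
        for name in rest.split(","):
            names[name.strip().lower()] = access
    return default, names
```

A real implementation needs a proper Fortran tokenizer, which is exactly the work that source-based tools already do.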

This can’t (I believe) actually be true. It would prevent submodules from working correctly. When a module is compiled, the compiler cannot know that no submodule will be used, and because a submodule has access to private module entities, they must always be included in a *.mod file. I’ll admit that a submodule wouldn’t be of any use without interface blocks, but the presence of an interface block doesn’t necessarily mean there will be a submodule either.

% cat l.f90
module foo
   implicit none
   private bah
   public bar
   contains
      function bar()
         integer bar
         bar = bah()
      end function bar
      function bah()
         integer bah
         bah = 1
      end function bah
end module foo
% gfcx -c l.f90
% zcat foo.mod | grep -i bah
% zcat foo.mod
GFORTRAN module version '15' created from l.f90
(() () () () () () () () () () () () () () () () () () () () () () () ()
() () ())

()

()

()

()

()

(2 'bar' 'foo' '' 1 ((PROCEDURE UNKNOWN-INTENT MODULE-PROC DECL UNKNOWN
0 0 FUNCTION IMPLICIT_PURE) () (INTEGER 4 0 0 0 INTEGER ()) 0 0 () () 2
() () () 0 0)
3 'foo' 'foo' '' 1 ((MODULE UNKNOWN-INTENT UNKNOWN-PROC UNKNOWN UNKNOWN
0 0) () (UNKNOWN 0 0 0 0 UNKNOWN ()) 0 0 () () 0 () () () 0 0)
)

('bar' 0 2 'foo' 0 3)
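For anything beyond `grep`, the decompressed text can be read as nested parenthesised lists. The sketch below is a small stack-based parser plus two helpers for the last two sections. The section layout it relies on (symbols in the second-to-last section, six items per symbol, a name-to-id table in the last section) is inferred only from the dump above, not from gfortran documentation, so treat it as an assumption that may not hold for other module versions or kinds of symbols; gfort2py's parser is far more complete. All names are hypothetical.

```python
import re

# Tokens: parentheses, single-quoted strings ('' is an escaped quote),
# integers, and bare words such as PROCEDURE or UNKNOWN-INTENT.
TOKEN_RE = re.compile(r"\(|\)|'(?:[^']|'')*'|-?\d+|[^\s()']+")
INT_RE = re.compile(r"-?\d+")


def parse_mod_body(text):
    """Parse a decompressed gfortran .mod file (minus its header) into nested lists."""
    body = text.split("\n", 1)[1] if "\n" in text else ""
    stack = [[]]
    for tok in TOKEN_RE.findall(body):
        if tok == "(":
            stack.append([])
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')' in .mod body")
            done = stack.pop()
            stack[-1].append(done)
        elif tok.startswith("'"):
            stack[-1].append(tok[1:-1].replace("''", "'"))
        elif INT_RE.fullmatch(tok):
            stack[-1].append(int(tok))
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("unbalanced '(' in .mod body")
    return stack[0]


def symtree_names(tree):
    """Map each name in the last section to its symbol id.

    Assumes (name, <integer>, symbol-id) triples, as in ('bar' 0 2 'foo' 0 3).
    """
    flat = tree[-1]
    return {flat[i]: flat[i + 2] for i in range(0, len(flat), 3)}


def symbols_by_id(tree):
    """Map symbol id -> (name, module, flavor) from the second-to-last section.

    Assumes six items per symbol, id 'name' 'module' 'label' <int> (details),
    with the attribute list first in the details and the flavor first in that.
    """
    flat = tree[-2]
    table = {}
    for i in range(0, len(flat), 6):
        sym_id, name, module, _label, _ref, details = flat[i:i + 6]
        table[sym_id] = (name, module, details[0][0])
    return table
```

For the module above, `symtree_names` gives `{'bar': 2, 'foo': 3}`, and `bah` appears nowhere in the parsed tree, matching the empty `grep`.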

Can’t possibly be true.

OK, now try to compile the following:

submodule (foo) foo_s
  implicit none
contains
  function baz()
    integer baz
    baz = bah()
  end function
end submodule

It of course wouldn’t be useful for anything, as nothing can see baz, but it’s still valid Fortran.

I think it would help to compare with the unoptimised AST. Basically, some of these UNKNOWN entries correspond to the visibility attribute:

Namespace: A-Z: (UNKNOWN 0)
procedure name = foo
  symtree: 'bah'         || symbol: 'bah'          
    type spec : (INTEGER 4)
    attributes: (PROCEDURE PRIVATE MODULE-PROC  FUNCTION IMPLICIT_PURE)
    result: bah
  symtree: 'bar'         || symbol: 'bar'          
    type spec : (INTEGER 4)
    attributes: (PROCEDURE PUBLIC MODULE-PROC  FUNCTION IMPLICIT_PURE)
    result: bar
  symtree: 'foo'         || symbol: 'foo'          
    type spec : (UNKNOWN 0)
    attributes: (MODULE )

  code:
CONTAINS

  Namespace: A-Z: (UNKNOWN 0)
  procedure name = bah
    symtree: 'bah'         || symbol: 'bah' from namespace 'foo'

    code:
    ASSIGN foo:bah 1
    

CONTAINS

  Namespace: A-Z: (UNKNOWN 0)
  procedure name = bar
    symtree: 'bah'         || symbol: 'bah' from namespace 'foo'
    symtree: 'bar'         || symbol: 'bar' from namespace 'foo'

    code:
    ASSIGN foo:bar bah[[()]]
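To make that comparison mechanical, the PUBLIC/PRIVATE attributes can be scraped from the dump. This is a minimal sketch, assuming the `symtree:` / `attributes:` line layout shown above; the dump is a debugging aid, so its layout may change between gfortran versions. Names are not qualified by namespace, so a same-named entity in a contained procedure could overwrite an entry. The function name is hypothetical.

```python
import re

SYMTREE_RE = re.compile(r"^\s*symtree: '([^']+)'")
ATTR_RE = re.compile(r"^\s*attributes: \((.*)\)")


def visibility_from_dump(dump):
    """Map symbol names to PUBLIC/PRIVATE from -fdump-fortran-original output."""
    result = {}
    current = None  # name from the most recent "symtree:" line
    for line in dump.splitlines():
        m = SYMTREE_RE.match(line)
        if m:
            current = m.group(1)
            continue
        m = ATTR_RE.match(line)
        if m and current is not None:
            words = m.group(1).split()
            for access in ("PUBLIC", "PRIVATE"):
                if access in words:
                    result[current] = access
    return result
```

Applied to the dump above, this yields `{'bah': 'PRIVATE', 'bar': 'PUBLIC'}`; the module symbol `foo` carries neither attribute and is skipped.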