Exploring first class error messages for Fortran

Ondřej @certik recently reworked the error reporting in LFortran to be more in line with rustc. I can’t find whether we had a thread on Discourse about this, but here is the relevant issue in the repo: #600.

I was wondering if there is interest in building the tools for rustc-like error messages and diagnostic reports in Fortran, and maybe even adopting them for all the major I/O libraries we are developing in the Fortran community. I started exploring an implementation of the rustc Diagnostic struct for toml-f in GitHub - awvwgk/pretty-diagnostics: Tools to create pretty diagnostic output.

Here is an example report for a duplicated key error created from a stub example (here without color):

error: duplicated key 'title' found
  --> example.toml:19:2-6
   |
 1 | # This is a TOML document.
 2 | title = "TOML Example"
   | ----- first defined here
 3 | [owner]
   :
18 |   dc = "eqdc10"
19 | [title]
   |  ^^^^^ table 'title' redefined here
20 | data = [ ["gamma", "delta"], [1, 2] ]
   |

Good tools alone don’t make good error messages, of course. It takes some effort to improve the diagnostics of an I/O library, but it might be worth going the extra mile. What do you think?

3 Likes

Yes. The only issue is that I need it in C++. The diagnostic messages are used from the parser, to semantics, to the code generation backend.

One issue is what semantic information to use to represent a diagnostic. Here is how LFortran does it:

src/lfortran/diagnostics.h · 53ded5898225dc9279ae70a2e2537d5cd3a3d1e0 · lfortran / lfortran · GitLab

But the hardest part actually is the rendering:

src/lfortran/diagnostics.cpp · 53ded5898225dc9279ae70a2e2537d5cd3a3d1e0 · lfortran / lfortran · GitLab

One has to render multiple primary and secondary error markers and multiple parts of the error message (parent + children) all in one source code listing. This part is the toughest. If we can collaborate on that, that would be very helpful.
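To make the problem concrete, here is a minimal sketch of the easiest case: several non-overlapping labels on a single source line, with `^` for primary and `-` for secondary markers. It is written in Python only for brevity; the same structure should carry over to C++ or Fortran. The names `InlineLabel`, `marker_line` and `render_line` are made up for this illustration and are not taken from LFortran or rustc, and putting each message on its own row is a simplification of what rustc actually prints.

```python
from dataclasses import dataclass


@dataclass
class InlineLabel:
    """Hypothetical label type: an inclusive, 1-based column range."""
    first: int
    last: int
    primary: bool
    text: str = ""


def marker_line(labels):
    """Build one marker row for non-overlapping labels on the same line."""
    width = max(label.last for label in labels)
    row = [" "] * width
    for label in labels:
        mark = "^" if label.primary else "-"
        for col in range(label.first, label.last + 1):
            row[col - 1] = mark
    return "".join(row)


def render_line(number, source, labels):
    """Render the source line, one marker row, and one row per message."""
    pad = " " * len(str(number))
    ordered = sorted(labels, key=lambda label: label.first)
    out = [f"{number} | {source}", f"{pad} | {marker_line(ordered)}"]
    # Messages go rightmost first, each aligned under the start of its marker.
    for label in reversed(ordered):
        if label.text:
            out.append(f"{pad} | {' ' * (label.first - 1)}{label.text}")
    return "\n".join(out)
```

Calling `render_line(7, "x = y", [...])` with one secondary label on column 1 and one primary label on column 5 gives a marker row with both markers and two message rows below it. Overlapping ranges and labels spanning several lines are exactly what this sketch does not handle.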

And yes, I am very happy with the Rust style error messages with multiple error and warning labels. Here are some examples:

Add Rust style error messages (!1490) · Merge requests · lfortran / lfortran · GitLab

Here is another one:

The only way to improve those messages is to actually open an editor and highlight this in it, or to use some kind of “less”-like environment (as in the Unix utility) that you cannot edit in but can move around in. One can also explore all kinds of interactive error reporting, where you use the arrow keys, or your mouse in VSCode, to move around and it shows all kinds of information about the error. But in a non-interactive way, the above is the best I can think of.

It might be good to also add 2 or 3 context lines above and below, like in “git diff”.
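Selecting the context lines could look roughly like the sketch below (Python for brevity, names are hypothetical). It collects every labelled line plus `context` lines above and below, clips to the file, and marks the gaps so the renderer can print a `:` row for skipped lines, as in the TOML example at the top of the thread.

```python
def lines_to_show(labelled, total, context=2):
    """Return the sorted line numbers to print: every labelled line plus
    `context` lines above and below, clipped to 1..total."""
    shown = set()
    for line in labelled:
        lo = max(1, line - context)
        hi = min(total, line + context)
        shown.update(range(lo, hi + 1))
    return sorted(shown)


def with_gaps(shown):
    """Insert None wherever lines are skipped, so a renderer can print ':'."""
    out = []
    previous = None
    for line in shown:
        if previous is not None and line > previous + 1:
            out.append(None)
        out.append(line)
        previous = line
    return out
```

With labels on lines 2 and 19 of a 20-line file and `context=1`, this selects lines 1 to 3 and 18 to 20 with one gap in between. Context windows that touch or overlap merge automatically because the line numbers are collected in a set.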

2 Likes

I don’t think the value of good error messages can be overstated. It would be great to explore a way to make it easier for developers to produce better error messages.

4 Likes

I did look into LFortran’s implementation initially, but found the way the diagnostics are stored difficult to unwrap in the rendering step. Getting the data structure right to actually make the rendering easy was tricky. My first attempt started top-down, trying to produce an error report from a diagnostic, but this failed horribly.

The linked repo is my second attempt. If you follow the examples, you can see how I’m building up the data structures and the rendering. The strategies for implementing this in C++ or Fortran are not that different, so I can share my strategy if you want to steal an idea or two:

  1. easy to disable color support :wink:
  2. get a multi-line printout of any input with line numbering
  3. add a single label to a line with arbitrary context lines (mind beginning and end!)
    • defines the label type and implements most of the actual inline annotations
  4. extend the render step to insert multiple labels
    • deal with multiple labels on a single line
    • ensure correct order of source line context (almost automatic)
    • skip unneeded lines (tricky)
  5. dispatch different types of labels
    • define primary label with different marker
    • enumerator and color selector for different levels of annotation
  6. Print the source name with line and column / range info

After this step I got the full printout from above, without the first error line. Defining the type which holds the actual diagnostic was straightforward because everything was already in place. Subdiagnostics also come naturally with this setup and render without additional effort.
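As a rough illustration of the simplest of these steps (the color switch, the numbered listing and the location line), here is a sketch in Python. It is not the code from pretty-diagnostics; the function names `paint`, `location` and `listing` are invented for this example, and the Fortran version would be structured along the same lines with string-returning functions.

```python
def paint(text, code, use_color):
    """Wrap text in an ANSI escape sequence, or return it unchanged."""
    if not use_color:
        return text
    return f"\033[{code}m{text}\033[0m"


def location(name, line, first, last):
    """Format 'name:line:first-last', collapsing a one-column range."""
    cols = str(first) if first == last else f"{first}-{last}"
    return f"{name}:{line}:{cols}"


def listing(source, first_line, last_line, use_color=False):
    """Number the requested lines, right-aligning the line-number gutter."""
    lines = source.split("\n")
    width = len(str(last_line))
    out = []
    for number in range(first_line, last_line + 1):
        gutter = paint(f"{number:>{width}} |", "1;34", use_color)
        out.append(f"{gutter} {lines[number - 1]}")
    return out
```

Routing every colored fragment through one helper like `paint` means disabling color is a single flag rather than a change in each rendering routine.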

There are a couple of things I had to look out for:

  • character(len=:), allocatable means strings can be absent; almost every part of the diagnostic type is actually optional, except for the level enumerator
  • functional constructors for building output strings seem to be the only way to stay sane for this task
  • the diagnostic should not own the text / strings it is annotating
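A data layout that respects these three points might look like the following sketch. It is Python rather than Fortran, and all type and field names are assumptions made for illustration, not the actual definitions in pretty-diagnostics. `Optional[...] = None` plays the role of an unallocated `character(len=:), allocatable` component, and the diagnostic stores only a source name, never the annotated text itself.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Level(Enum):
    ERROR = "error"
    WARNING = "warning"
    NOTE = "note"


@dataclass
class Label:
    level: Level
    line: int
    first: int
    last: int
    text: Optional[str] = None   # may be absent
    primary: bool = False


@dataclass
class Diagnostic:
    level: Level                        # the only mandatory part
    message: Optional[str] = None
    source_name: Optional[str] = None   # a name only, not the source text
    labels: List[Label] = field(default_factory=list)
    children: List["Diagnostic"] = field(default_factory=list)


def headline(diag):
    """Functional constructor for the first line of a report."""
    if diag.message is None:
        return diag.level.value
    return f"{diag.level.value}: {diag.message}"
```

Because `headline` returns a new string instead of writing to a unit or mutating a buffer, pieces like this can be composed freely when the full report is assembled.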

I don’t have suggestions like rustc yet, those seem to be part of the diagnostic and not the label. Also, I can’t report error stacks over multiple files (which was what I tried in my first attempt, maybe too ambitious).

3 Likes

I followed the data structure style that rustc uses, although I did simplify them a bit. Your data structures are here:

pretty-diagnostics/type.f90 at 071792f468a014c5f054f6afa091c06f42ac81a2 · awvwgk/pretty-diagnostics · GitHub

It seems quite similar. Can you summarize the differences?

Yes, the workflow you described is how LFortran does it too. The hard part is once you start mixing different labels on the same line and multiple lines and ensuring they don’t cross. I’ve only implemented some basics so that it prints, but Rust does some nice heuristics to make it look good.
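One simple way to keep labels on the same line from crossing is to give each overlapping label its own marker row. The sketch below uses plain greedy interval placement; it is not the heuristic rustc uses, and `assign_rows` is a hypothetical helper written in Python for brevity. Spans are inclusive `(first, last)` column pairs and are assumed to be distinct.

```python
def assign_rows(spans):
    """Greedy placement: each (first, last) column span goes into the
    first row where it does not overlap a span placed earlier."""
    rows = []        # rows[i] holds the spans already placed in row i
    placement = {}   # span -> row index
    for span in sorted(spans):
        for index, row in enumerate(rows):
            # Spans are visited by increasing start column, so checking
            # the last span in the row is enough.
            if row[-1][1] < span[0]:
                row.append(span)
                placement[span] = index
                break
        else:
            rows.append([span])
            placement[span] = len(rows) - 1
    return placement
```

For the spans `(1, 5)`, `(3, 8)` and `(10, 12)`, the first and third share row 0 and the second is pushed to row 1. A renderer would then emit one marker row per row index.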

Would it be possible to adopt these rustc-like warnings and errors in fpm? I know this is a lot of work (maybe we could use LFortran for this) and adds overhead. We could pass warning flags directly to some kind of “dry run” of LFortran which doesn’t produce any binaries but instead just checks the syntax of the code. Another option would be to add a dedicated command to fpm, something like fpm check. Of course this would have to be optional, but I could imagine it being a nice feature if, for example, I have to use ifort or nvfortran but want pretty error messages.
I don’t know if this idea is realistic; I think @certik has the most insight regarding this.
If I remember correctly, one topic of the October monthly call was a language server for Fortran. Maybe fpm could somehow use a language server to produce such output?

1 Like

Once LFortran can reliably compile any (or most) Fortran codes, then one can use it just to check the semantics, but use other Fortran compilers to actually compile. LFortran is actually very fast, so this might be doable. I am hoping all compilers will eventually implement Rust style error reporting. It really is the way to go.

1 Like

I’m guessing what you had in mind primarily were the human-readable mark-up formats like TOML, JSON, XML, YAML, and various custom-formatted input files used in science & engineering codes (things like CFD meshes, molecular structure files, etc)?

Other things that come to mind are things like pkg-config, mathematical expression parsers, command line input parsers, and other OS data-formats/languages that follow some type of defined grammar. E.g. Python has built-in modules supporting INI configuration files, netrc files, Apple .plist files.

1 Like

I’m finally able to put those diagnostics to good use. Here are a few examples of the new error messages produced from my computational chemistry IO library (fpm compatible):

  1. DFTB+ genFormat file with unknown boundary conditions
Error: Invalid input version found
 --> geom.gen:1:4
  |
1 | 2  X
  |    ^ unknown identifier
  2. VASP POSCAR file with switched lines
Error: Cannot read scaling factor
 --> POSCAR:2:1-2
  |
2 | Ti  O 
  | ^^ expected real value
  |
  3. FHI-aims geometry input with malformed translation vectors
Error: Cannot read lattice vectors
 --> geometry.in:3:31-37
  |
3 | lattice_vector     0.00000    *******    2.95812
  |                               ^^^^^^^ expected real value
  |
  4. Turbomole coordinate file with redundant translation vector specification
Error: Conflicting lattice and cell groups
  --> coord:37:1-5
   |
35 | $lattice angs
   | -------- lattice first defined here
   :
37 | $cell angs
   | ^^^^^ conflicting cell group
   |

There are a lot more examples in the test suite. Since the project is fpm-enabled, you can just run it yourself with fpm test if you want to see more.

The projects that invented those geometry formats are all written in Fortran, so maybe it is possible to contribute this kind of error message support to the respective upstream projects at some point.

4 Likes