Exploring first class error messages for Fortran

awvwgk · November 23, 2021, 10:28pm

Ondřej @certik recently reworked the error reporting in LFortran to be more in line with rustc. I can’t find whether we had a thread on Discourse about this, but here is the relevant issue at the repo: #600.

I was wondering whether there is interest in building the tools for rustc-like error messages and diagnostic reports in Fortran, and maybe even adopting them for all the major I/O libraries we are developing in the Fortran community. I started exploring an implementation of the rustc Diagnostic struct for toml-f in awvwgk/pretty-diagnostics on GitHub (tools to create pretty diagnostic output).

Here is an example report for a duplicated key error created from a stub example (here without color):

error: duplicated key 'title' found
  --> example.toml:19:2-6
   |
 1 | # This is a TOML document.
 2 | title = "TOML Example"
   | ----- first defined here
 3 | [owner]
   :
18 |   dc = "eqdc10"
19 | [title]
   |  ^^^^^ table 'title' redefined here
20 | data = [ ["gamma", "delta"], [1, 2] ]
   |

Good tools alone don’t make good error messages, of course. It takes some effort to improve the diagnostics of an I/O library, but it might be worth going the extra mile. What do you think?

certik · November 23, 2021, 10:58pm

Yes. The only issue is that I need it in C++. The diagnostic messages are used from the parser, to semantics, to the code generation backend.

One issue is what semantic information to use to represent it. Here it is in LFortran:

src/lfortran/diagnostics.h · 53ded5898225dc9279ae70a2e2537d5cd3a3d1e0 · lfortran / lfortran · GitLab

But the hardest part actually is the rendering:

src/lfortran/diagnostics.cpp · 53ded5898225dc9279ae70a2e2537d5cd3a3d1e0 · lfortran / lfortran · GitLab

One has to render multiple primary and secondary error markers and multiple parts of the error message (parent + children) all in one source code listing. This part is the toughest. If we can collaborate on that, that would be very helpful.

And yes, I am very happy with the Rust style error messages with multiple error and warning labels. Here are some examples:

Add Rust style error messages (!1490) · Merge requests · lfortran / lfortran · GitLab

Here is another one:

The only way to improve those messages is to actually open an editor and highlight the error in it, or to use some kind of “less”-like (Unix pager) environment that you cannot edit but can move around in. One can also explore all kinds of interactive error reporting, where you use the arrow keys, or your mouse in VSCode, to move around and see all kinds of information about the error. But in a non-interactive way, the above is the best I can think of.

It might be good to also add 2 or 3 context lines above and below, like in “git diff”.
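The context-line idea can be sketched in a few lines. This is only an illustration in Python, not code from LFortran or pretty-diagnostics; the names `lines_to_show` and `with_gaps` are hypothetical. It picks every labelled line plus a fixed number of neighbours, clipped to the file, and marks the places where lines were skipped.

```python
def lines_to_show(labelled, n_lines, context=2):
    """Return the sorted 1-based line numbers to print.

    Every labelled line is kept together with `context` lines above and
    below it, clipped to the range 1..n_lines.
    """
    shown = set()
    for line in labelled:
        first = max(1, line - context)
        last = min(n_lines, line + context)
        shown.update(range(first, last + 1))
    return sorted(shown)


def with_gaps(shown):
    """Insert None wherever lines were skipped (rendered as ':' in a report)."""
    out = []
    prev = None
    for line in shown:
        if prev is not None and line != prev + 1:
            out.append(None)
        out.append(line)
        prev = line
    return out
```

With labels on lines 2 and 19 of a 20-line file and one context line, this selects lines 1 to 3 and 18 to 20 with a single gap in between, which is the shape of the TOML report at the top of the thread.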

everythingfunctional · November 23, 2021, 10:58pm

I don’t think the value of good error messages can be overstated. It would be great to explore a way to make it easier for developers to produce better error messages.

awvwgk · November 23, 2021, 11:28pm

I did look into LFortran’s implementation initially, but found the way the diagnostics are stored difficult to unwrap in the rendering step. Getting the data structure right to actually make the rendering easy was tricky. My first attempt started top-down, trying to produce an error report from a diagnostic, but this failed horribly.

The linked repo is my second attempt. If you follow the examples, you can see how I’m building up the data structures and the rendering. The strategies for implementing this in C++ or Fortran are not that different, so I can share mine if you want to steal an idea or two:

1. easy to disable color support
2. get a multi-line printout of any input with line numbering
3. add a single label to a line with arbitrary context lines (mind beginning and end!)
   - defines the label type and implements most of the actual inline annotations
4. extend the render step to insert multiple labels
   - deal with multiple labels on a single line
   - ensure correct order of source line context (almost automatic)
   - skip unneeded lines (tricky)
5. dispatch different types of labels
   - define primary label with different marker
   - enumerator and color selector for different levels of annotation
6. print the source name with line and column / range info
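The middle steps of this list (line numbering, labels with context lines, skipped lines, and a different marker for the primary label) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the Fortran code in pretty-diagnostics: `Label` and `render` are hypothetical names, color is left out, at least one label is required, and several labels on one line are simply stacked on separate annotation lines rather than merged.

```python
from dataclasses import dataclass


@dataclass
class Label:
    line: int              # 1-based line number
    first: int             # 1-based first column
    last: int              # 1-based last column (inclusive)
    text: str = ""
    primary: bool = False  # primary labels use '^', secondary ones '-'


def render(source, labels, context=1):
    """Render the annotated source listing (assumes at least one label)."""
    lines = source.splitlines()
    # Collect every labelled line plus its context, clipped to the input.
    shown = set()
    for lab in labels:
        lo = max(1, lab.line - context)
        hi = min(len(lines), lab.line + context)
        shown.update(range(lo, hi + 1))
    width = len(str(max(shown)))
    gutter = " " * width
    out = [f"{gutter} |"]
    prev = None
    for num in sorted(shown):
        if prev is not None and num != prev + 1:
            out.append(f"{gutter} :")  # lines were skipped here
        out.append(f"{num:>{width}} | {lines[num - 1]}")
        on_line = sorted((l for l in labels if l.line == num), key=lambda l: l.first)
        for lab in on_line:
            marker = ("^" if lab.primary else "-") * (lab.last - lab.first + 1)
            pad = " " * (lab.first - 1)
            out.append(f"{gutter} | {pad}{marker} {lab.text}".rstrip())
        prev = num
    out.append(f"{gutter} |")
    return "\n".join(out)
```

Iterating over the sorted set of shown lines is what makes the ordering of the source context "almost automatic"; the skipped-line marker only needs the previous line number.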

After this step I got the full printout from above, without the first error line. Defining the type which holds the actual diagnostic was straightforward because everything was already in place. Subdiagnostics also come naturally with this setup and render without additional effort.

There are a couple of things I had to look out for:

- character(len=:), allocatable means strings can be absent, almost every part of the diagnostic type is actually optional, except for the level enumerator
- functional constructors for building output strings seem to be the only way to stay sane for this task
- the diagnostic should not own the text / strings it is annotating
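These three points can be illustrated with a small data-structure sketch. It is written in Python for brevity and is not the actual type in pretty-diagnostics; `Level`, `Label`, `Diagnostic` and `header` are hypothetical names, and `None` stands in for an unallocated `character(len=:), allocatable` component. Only the level is mandatory, the diagnostic stores the name of its source rather than the text itself, and the header string is built by a pure function.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Level(Enum):
    ERROR = "error"
    WARNING = "warning"
    NOTE = "note"
    HELP = "help"


@dataclass(frozen=True)
class Label:
    level: Level
    line: int
    first: int
    last: int
    text: Optional[str] = None   # may be absent
    primary: bool = False


@dataclass(frozen=True)
class Diagnostic:
    level: Level                          # the only mandatory part
    message: Optional[str] = None
    source: Optional[str] = None          # name of the input, not its contents
    labels: Tuple[Label, ...] = ()
    sub: Tuple["Diagnostic", ...] = ()    # subdiagnostics reuse the same type


def header(diag):
    """Build the first lines of a report as a pure function of the diagnostic."""
    head = diag.level.value
    if diag.message is not None:
        head += f": {diag.message}"
    primary = next((lab for lab in diag.labels if lab.primary), None)
    if diag.source is None or primary is None:
        return head
    span = f"{primary.line}:{primary.first}"
    if primary.last != primary.first:
        span += f"-{primary.last}"
    return f"{head}\n --> {diag.source}:{span}"
```

Because the diagnostic holds only positions and a source name, the annotated text is passed in separately at render time and the same diagnostic can be printed against whatever copy of the input is at hand.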

I don’t have suggestions like rustc yet, those seem to be part of the diagnostic and not the label. Also, I can’t report error stacks over multiple files (which was what I tried in my first attempt, maybe too ambitious).

certik · November 24, 2021, 2:18am

I followed the data structure style that rustc uses, although I did simplify them a bit. Your data structures are here:

pretty-diagnostics/type.f90 at 071792f468a014c5f054f6afa091c06f42ac81a2 · awvwgk/pretty-diagnostics · GitHub

It seems quite similar. Can you summarize the differences?

Yes, the workflow you described is how LFortran does it too. The hard part is once you start mixing different labels on the same line and multiple lines and ensuring they don’t cross. I’ve only implemented some basics so that it prints, but Rust does some nice heuristics to make it look good.
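One simple way to keep several labels on the same source line from crossing is to put the rightmost label’s text on the marker line and hang the remaining texts below it on vertical bars, working from right to left. The sketch below is loosely modelled on the layout rustc produces, but it is an assumption-laden illustration, not rustc’s or LFortran’s algorithm: `annotate` is a hypothetical helper, and it assumes non-overlapping, single-line labels given as `(first, last, marker, text)` tuples with 1-based columns.

```python
def annotate(labels):
    """Return the annotation rows to print below one source line."""
    labels = sorted(labels)
    width = max(last for _, last, _, _ in labels)
    # Row 1: all markers, followed by the text of the rightmost label.
    row = [" "] * width
    for first, last, marker, _ in labels:
        for col in range(first - 1, last):
            row[col] = marker
    *rest, rightmost = labels
    out = ["".join(row) + " " + rightmost[3]]
    # Remaining labels, right to left: one row of bars, then the text,
    # keeping the bars of the labels further left in place.
    while rest:
        bars = [" "] * rest[-1][0]
        for first, _, _, _ in rest:
            bars[first - 1] = "|"
        out.append("".join(bars))
        first, _, _, text = rest.pop()
        out.append("".join(bars[: first - 1]) + text)
    return out
```

For example, `annotate([(5, 5, "-", "first"), (8, 8, "^", "second")])` yields the marker row with `second` attached, a row with a single bar under column 5, and a row with `first` starting at column 5. Overlapping and multi-line spans need the extra heuristics mentioned above and are not handled here.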

Carltoffel · November 24, 2021, 1:23pm

Would it be possible to adopt these rustc-like warnings and errors in fpm? I know this is a lot of work (maybe we could use LFortran for this) and produces overhead. We could pass warning flags directly to some kind of “dry run” of LFortran which doesn’t produce any binaries but instead just checks the syntax of the code. Another option to invoke this would be to add a dedicated command to fpm, something like fpm check. Of course this would have to be optional, but I could imagine that it would make a nice feature if, for example, I have to use ifort or nvfortran but want to have pretty error messages.
I don’t know if this idea is realistic; I think @certik has the most insight regarding this.
If I remember correctly, one topic of the October monthly call was a language server for Fortran. Maybe fpm could somehow use the language server to produce such output?

certik · November 24, 2021, 2:38pm

Once LFortran can reliably compile any (or most) Fortran codes, then one can use it just to check the semantics, but use other Fortran compilers to actually compile. LFortran is actually very fast, so this might be doable. I am hoping all compilers will eventually implement Rust style error reporting. It really is the way to go.

ivanpribec · November 24, 2021, 3:29pm

I’m guessing what you had in mind primarily were the human-readable mark-up formats like TOML, JSON, XML, YAML, and various custom-formatted input files used in science & engineering codes (things like CFD meshes, molecular structure files, etc)?

Other things that come to mind are things like pkg-config, mathematical expression parsers, command line input parsers, and other OS data-formats/languages that follow some type of defined grammar. E.g. Python has built-in modules supporting INI configuration files, netrc files, Apple .plist files.

awvwgk · January 21, 2022, 6:55pm

I’m finally able to put those diagnostics to good use. Here are a few examples of the new error messages produced from my computational chemistry IO library (fpm compatible):

DFTB+ genFormat file with unknown boundary conditions

Error: Invalid input version found
 --> geom.gen:1:4
  |
1 | 2  X
  |    ^ unknown identifier

VASP POSCAR file with switched lines

Error: Cannot read scaling factor
 --> POSCAR:2:1-2
  |
2 | Ti  O 
  | ^^ expected real value
  |

FHI-aims geometry input with malformatted translation vectors

Error: Cannot read lattice vectors
 --> geometry.in:3:31-37
  |
3 | lattice_vector     0.00000    *******    2.95812
  |                               ^^^^^^^ expected real value
  |

Turbomole coordinate file with redundant translation vector specification

Error: Conflicting lattice and cell groups
  --> coord:37:1-5
   |
35 | $lattice angs
   | -------- lattice first defined here
   :
37 | $cell angs
   | ^^^^^ conflicting cell group
   |

There are a lot more examples in the test suite. Since it is fpm-enabled, you can just run it yourself with fpm test if you want to see more.

The projects that invented those geometry formats are all written in Fortran, so maybe it is possible to contribute this kind of error message support to the respective upstream projects at some point.
