Writing a linter in Fortran

I just finished a rewrite of the parser infrastructure in TOML Fortran, which allows reporting of error messages with rich context information: Improve lexing and parsing of TOML documents by awvwgk · Pull Request #88 · toml-f/toml-f · GitHub

For this new feature I decided to add a tutorial on writing a linter for TOML configuration files, in this case an fpm plugin to lint the package manifest.

The tutorial is available at: Building a linter — TOML Fortran

Feedback is very welcome, feel free to comment here or at the issue tracker.

A more concise recipe section for the error reporting is available at Reporting errors — TOML Fortran

That is amazing work @awvwgk and very helpful for people just getting started with toml-f and fpm (when it merges). It is a very interesting approach, using Fortran to implement a linter instead of JavaScript or some other language that is slightly more feature-rich when it comes to string manipulation. Ultimately, though, it is the only solution that makes sense, since you don’t want to burden users with having to download node.js to use the linter.

I had a few suggestions and questions that you might want to consider.

  • Adding an option, or slightly restructuring the diagnostic message, so that it is easy to parse via regex. The message structure is already very good, but fetching the error message via regex would be hard. A good example of that is gfortran >= 11 with the flag -fdiagnostics-plain-output. Having such an option would allow code editors to parse the output of the linter into their diagnostics console, e.g. the PROBLEMS tab in VS Code.
  • The other thing I would consider producing is a schema, if at all possible! Currently this can be done for TOML through JSON Schema Everywhere: Schema Validation for TOML | JSON Schema Everywhere. This would allow for passive linting of the TOML file from the code editor, in addition to the existing solution. Again, specifically for VS Code, although that feature exists in all other widely used editors (neovim, emacs), enabling that schema would look something like this and additional info here. AFAIK, TOML validation is only possible through JSON, but it’s been a while since I last looked at ways of doing it.

From my understanding, toml-f allows for an “extended” version of the TOML syntax? Or am I wrong in thinking that toml-f.git does not follow standard naming conventions?

Thanks, I believe Fortran is a very expressive language when it comes to high-level programming. With the right abstraction, even string processing can become beautifully simple in Fortran. In fact, the motivation for this tutorial was to show how effortless it is to create high-quality terminal output with the new context object, and a linter is a perfect application for creating many such outputs.

The format chosen is inspired by the Rust compiler. I think it does a great job of presenting the message to humans and I don’t see a good reason to change it. The intermediary diagnostic object, however, is meant for machine consumption and could be turned into something which is easy to parse for an external plugin.

That’s not really in scope for this tutorial, but it is in scope for TOML Fortran for sure. If there is a preferred format to dump such diagnostics, let me know and we can figure out how to integrate this with the context object.
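As a rough illustration of what a machine-readable dump could look like, here is a minimal Python sketch that renders a diagnostic as a single `file:line:col: severity: message` line, a shape many editor problem matchers can consume. The `Diagnostic` record and its field names are hypothetical and are not TOML Fortran's actual diagnostic type.

```python
from dataclasses import dataclass


@dataclass
class Diagnostic:
    # Hypothetical record for illustration only; not toml-f's API.
    severity: str
    message: str
    filename: str
    line: int
    first_col: int
    last_col: int


def to_plain(diag: Diagnostic) -> str:
    """Render one diagnostic as a single 'file:line:col: severity: message' line."""
    return (
        f"{diag.filename}:{diag.line}:{diag.first_col}: "
        f"{diag.severity}: {diag.message}"
    )


diag = Diagnostic(
    severity="error",
    message="Entry in 'package-name' must be boolean",
    filename="fpm.toml",
    line=4,
    first_col=16,
    last_col=21,
)
plain = to_plain(diag)
print(plain)
```

A one-line format drops the source excerpt and the underline, so it would complement the human-oriented output rather than replace it.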

Indeed, schema support would be useful. But this is certainly out-of-scope for TOML Fortran since the library only provides a low level interface to the TOML data structures as well as a set of convenience wrappers to manipulate the structures (get_value and set_value, I tend to call those build interfaces).

Schema support should be provided by a separate project building on top of TOML Fortran. Especially if JSON Schema is consumed for this purpose, we would first need a JSON Schema implementation in Fortran. I started writing something like this a while ago, but haven’t gotten around to finishing it yet.

TOML Fortran is compliant with the TOML standard version 1.0.0, except for the UTF-8 support, so there are no extensions to the format at all. toml-f.git is a dotted key, which implicitly creates the table toml-f to insert the git key. Personally I think it is more expressive than using an inline table (curly braces).

I was actually thinking about adding a linter pass to flag inline tables with a single key to replace them with a dotted key, but I couldn’t think of a good highlighting for this case yet. Maybe something like this:

❯ fpm run -- fpm.toml
info: Inline table with single key should use dotted key
 --> fpm.toml:4:10-51
  |
4 | toml-f = {git = "https://github.com/toml-f/toml-f"}
  |          ^--- use dotted key instead              ^
  |

+1

The amazing, and also distressing, thing about Fortran is how much better it could be for computing in general, whether string manipulation or any other kind of data processing, with just a handful of additional features in the language standard and compiler support for them! This could open up a variety of domains for Fortran again, with more and more rich and powerful libraries arising in Fortran as a result.

To be honest I thought it was likely my comments would be slightly out-of-scope or tangential to the toml-f tutorial but I wasn’t sure if you were also interested in discussing the linter and its future uses here or in another post.

Any thoughts I have on the schemas I will post in the Schema Support thread.

I was using a quite outdated TOML validation tool v0.4: https://www.toml-lint.com/, apologies, my bad!

I think your proposed solution works, at least I find it reasonable. I would go as far as not permitting the usage of inline {} (if enabled in the linter settings), basically enforcing a formatting style that is human friendly. In JavaScript projects that is quite common with husky and pre-commit hooks: you run your linter before each commit and it automatically applies fixes to your files.
Again, I don’t think that this should be within the scope of the tutorial.

I think that the current format is perfectly fine and easily parsable via regex, e.g.

error: Entry in 'package-name' must be boolean
 --> fpm.toml:4:16-21
  |
4 | package-name = "true"
  |                ^^^^^^ expected boolean value
  |

you can easily capture the “error: Entry in ‘package-name’ must be boolean” and that is good enough, so ignore my comment on this, I was mistaken.
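For reference, a minimal Python sketch of such a capture. The pattern is my own guess based on the two sample outputs in this thread, not an official grammar for the linter output, so it may need adjusting for other message shapes.

```python
import re

# Sample linter output copied from the example above.
OUTPUT = """\
error: Entry in 'package-name' must be boolean
 --> fpm.toml:4:16-21
  |
4 | package-name = "true"
  |                ^^^^^^ expected boolean value
  |
"""

# Header line "severity: message", then the "--> file:line:first-last" line.
# The "-last" part is treated as optional (an assumption).
PATTERN = re.compile(
    r"^(?P<severity>\w+): (?P<message>.+)\n"
    r"\s*--> (?P<file>[^:\n]+):(?P<line>\d+):(?P<first>\d+)(?:-(?P<last>\d+))?",
    re.MULTILINE,
)

found = [
    (
        m["severity"],
        m["message"],
        m["file"],
        int(m["line"]),
        int(m["first"]),
        int(m["last"]) if m["last"] else None,
    )
    for m in PATTERN.finditer(OUTPUT)
]
print(found)
```

An editor extension could map each tuple onto its own diagnostic type (severity, message, file, and a line/column range).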

I am happy to start another thread (if here is not the place) about improving the linter for general purpose usage, editor integration and wide scale adoption.