ANTLR Grammars for Fortran

I found these grammars for the ANTLR parser generator (link above). If you’re wondering what ANTLR is, here is an excerpt from the preface of The Definitive ANTLR 4 Reference:

ANTLR v4 is a powerful parser generator that you can use to read, process,
execute, or translate structured text such as program source code, data, and
configuration files. That means you can use it to build all sorts of useful tools
like legacy code converters, domain-specific language interpreters, Java to
C# translators, wiki renderers, XML/HTML parsers, DNA pattern recognizers,
bytecode assemblers, language pretty printers, and of course full compilers.

In the short biography of the author of ANTLR, Terence Parr, it says

[…] Terence holds a Ph.D. in computer engineering from Purdue University and was a postdoctoral fellow at the Army High-Performance Computing Research Center at the University of Minnesota, where he built parallelizing FORTRAN source-to-source translators.

From the ANTLR FAQ:

What do you think are the problems people will try to solve with ANTLR4?

In my experience, almost no one uses parser generators to build commercial compilers. So, people are using ANTLR for their everyday work, building everything from configuration files to little scripting languages.

Interesting, but I have some trouble getting a clear picture of this tool. Would it be possible to get Fortran code out of it that parses a particular kind of input? I did not find information about the programming languages it can generate parsers in, probably due to the limited amount of time I spent on that question :innocent:

Not Fortran code, but other languages including Java, C#, Python, JavaScript, TypeScript, Go, C++, Swift, PHP and Dart. I suppose C++ and maybe Go would be the easiest to interface from Fortran, if you really wanted to.

For your “particular kind of input” you need to be able to define the grammar according to the ANTLR grammar language. For instance a simple grammar to define a valid Fortran name:

/* FortranName.g4 */ 
grammar FortranName;

/* Parser Rules */

name : NAME ;

/* Lexer Rules */

fragment LETTER : [a-zA-Z] ;
fragment UNDERSCORE : '_' ;
fragment DIGIT : [0-9] ;

fragment ALPHANUMERIC : (LETTER | DIGIT | UNDERSCORE) ;

NAME : LETTER (ALPHANUMERIC)*;

Given the grammar file, ANTLR can generate a lexer and a parser (this is a trivial example, because there is nothing really worth parsing):

$ antlr4 -Dlanguage=Cpp FortranName.g4

This produces a bunch of .cpp and .h files, which can be used to build some new tool (say a compiler, syntax checker, …). You can also include predicates and other actions in the rules if needed. For instance, the maximum length of a Fortran name is 63 characters, which could be checked with a predicate of the form {getText().length() <= 63}?.
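To make concrete what the NAME rule plus the length predicate accept, here is a minimal Python sketch that uses a regular expression instead of ANTLR. It is not what ANTLR generates; it is only a hand-written check that should accept the same strings. The names `is_fortran_name` and `MAX_NAME_LENGTH` are made up for this illustration.

```python
import re

# Mirrors the lexer rule  NAME : LETTER (ALPHANUMERIC)* ;  from FortranName.g4:
# one letter, followed by any number of letters, digits, or underscores.
_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

# Maximum name length mentioned above (hypothetical constant name).
MAX_NAME_LENGTH = 63


def is_fortran_name(text: str) -> bool:
    """Return True if text matches the NAME rule and the 63-character limit."""
    return len(text) <= MAX_NAME_LENGTH and _NAME_RE.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ["x", "my_var2", "_hidden", "2fast", "a" * 64]:
        print(repr(candidate), is_fortran_name(candidate))
```

A regex is enough here only because the example grammar is a single token; for anything with nested structure you would want the generated parser.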

ANTLR is not the only such tool; there are others, like the classic Lex and Yacc. Another one is BNFC. ANTLR is meant to be friendlier than other tools.

Yes, I am more familiar with yacc and lex. And with regular expressions. I have come across ANTLR before, so I was triggered by it appearing here. It might be a useful tool for dealing with some of the modern file formats. Anyway, it has got me thinking again about parsing in general.

The grammars in the grammars-v4 repo for Fortran77 and Fortran90 are a little old but should be ok. Either of the current Fortran grammars should work in a pinch in an LSP server. The Trash Toolkit can generate a client/server for VSCode from an Antlr grammar (Trash/src/trgenvsc at main · kaby76/Trash · GitHub), with settings for syntactic highlighting; the settings can also be adjusted for semantic highlighting based on XPath patterns.

I am currently working on a new Antlr grammar for ISO Fortran 2023 (GitHub - kaby76/fortran). It is scraped and refactored directly from the PDF of the spec. It may be some time before it is complete because the grammar in the spec is ambiguous. Note that Antlr does not have a Fortran target, i.e., it doesn’t generate a parser in Fortran.


The Flang developers have put together a grammar: Fortran 2018 Grammar — The Flang Compiler
F2023 introduced some new syntax (.nil., ternary if, @, …); I think the Intel compiler might already support a few minor elements of it.

One complication with Fortran is the existence of two source forms, free and fixed. While discouraged for new development, fixed form remains valid, and if you want to have complete tools you essentially need to have two separate parsers. Compilers decide which parser to use based on the file extension, or on a flag specified by the user.
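A tool with two parsers needs a similar dispatch step. The sketch below shows one way it could look in Python. The suffix sets are an assumption modelled on common compiler conventions (individual compilers differ in the details), and `source_form` and its `override` parameter are hypothetical names standing in for a user-supplied flag.

```python
from pathlib import Path
from typing import Optional

# Assumed suffix conventions; real compilers each have their own list.
FIXED_FORM_SUFFIXES = {".f", ".for", ".ftn", ".f77"}
FREE_FORM_SUFFIXES = {".f90", ".f95", ".f03", ".f08", ".f18"}


def source_form(filename: str, override: Optional[str] = None) -> str:
    """Return "fixed" or "free" for a Fortran source file.

    An explicit override (the equivalent of a command-line flag) wins;
    otherwise the decision is based on the file suffix, ignoring case.
    """
    if override is not None:
        if override not in ("fixed", "free"):
            raise ValueError(f"unknown source form: {override!r}")
        return override

    suffix = Path(filename).suffix.lower()
    if suffix in FIXED_FORM_SUFFIXES:
        return "fixed"
    if suffix in FREE_FORM_SUFFIXES:
        return "free"
    raise ValueError(f"cannot infer source form from {filename!r}")


if __name__ == "__main__":
    print(source_form("legacy.f"))
    print(source_form("solver.F90"))
    print(source_form("legacy.f", override="free"))
```

The returned string would then select which of the two grammars (or generated parsers) to run on the file.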

I believe GitHub, and maybe also this Discourse, use the following TextMate grammar: GitHub - textmate/fortran.tmbundle: TextMate support for Fortran. Most IDEs have their own regex-based parsers, which tend to be incomplete.

Thanks for the note. Yes, the Antlr4 grammars for Fortran in grammars-v4 are for parsing free form. I thought I had opened an issue on GitHub to keep track of implementing this, but apparently not. I don’t plan to write an Antlr grammar for the other standards or compiler-specific extensions until after Fortran2023 is complete.