LFortran can now parse all of SciPy to AST

certik · October 7, 2022, 7:39pm

Just wanted to share a big milestone for LFortran, we can now parse all of SciPy’s Fortran code, that is in fixed-form, into AST (Abstract Syntax Tree): https://twitter.com/lfortranorg/status/1578453296183791618.

We have a free-form parser that seems quite complete for all of Fortran. But we now have a dedicated fixed-form tokenizer that is by itself a recursive descent parser, which parses a prescanned fixed-form code (white space removed, continuation character & handled) and skips over things that are not relevant (such as details of expressions), but it parses the whole code and emits correct tokens. The tokens then go into our existing Bison parser which is shared with the free-form tokenizer. Here are all the files:

Free-form tokenizer: lfortran/tokenizer.re at 5e2d4dc4ba7d70ecae9c54a7d1b4099b510553be · lfortran/lfortran · GitHub
Fixed-form tokenizer: lfortran/fixedform_tokenizer.cpp at 5e2d4dc4ba7d70ecae9c54a7d1b4099b510553be · lfortran/lfortran · GitHub
Bison parser: lfortran/parser.yy at 5e2d4dc4ba7d70ecae9c54a7d1b4099b510553be · lfortran/lfortran · GitHub

The fixed-form tokenizer cannot handle all the corner cases yet, but it can handle all of SciPy. If you discover anything that does not work, please report an issue and we’ll fix it.

The free-form tokenizer as well as the (shared) Bison parser should be able to parse all of Fortran 2018.

What remains now is finishing up semantics and code generation, which we are focusing on now. We will post an update in a few weeks (we need some time to figure out and document what is missing and how long it will take us to finish it, but we are quite far, for example the semantics already works for all of minpack).

rgba · October 7, 2022, 10:20pm

I’m hyping!

jacobwilliams · October 7, 2022, 10:49pm

Interesting comment on Twitter from a SciPy core maintainer about how this (LFortran C++ backend) could mean SciPy could “finally remove Fortran”.

certik · October 7, 2022, 10:58pm

@jacobwilliams here is my reply to that: https://twitter.com/lfortranorg/status/1578471409268903936

gnikit · October 7, 2022, 11:04pm

I’m so excited to see this materialise. LFortran is really coming into a generally functional compiler. Congratulations!

certik · October 7, 2022, 11:17pm

Thank you. And I have not forgot about your PRs, I was just busy with the parsing stuff, wanted to get it done.

I can now shift my attention to just semantics and harden the ASR (Abstract Semantic Representation) and the LLVM backend. If you or anyone else want to help, here is a minpack repository:

That branch has identical code to SciPy. It compiles to ASR already. We “just” need to fix some bugs in the LLVM backend and ensure everything links together and runs. This is not far from fully working!

Obviously after minpack there are still a few things we need to implement for SciPy (I know about common blocks and equivalence), but old original minpack will be a beautiful milestone.

oscardssmith · October 7, 2022, 11:54pm

There might be an obvious answer I’m missing, but if you don’t have semantics and codegen, how do you confirm that it’s parsing correctly? Manual inspection?

certik · October 8, 2022, 12:27am

Excellent question.

The short answer is that in general indeed you can only be sure everything is parsed correctly by doing semantics, then code generation and then running the test suite, but because the SciPy test suite also is not 100% comprehensive, the only way is to actually use LFortran by a lot of people for their projects and over time build confidence that things actually work. And then you can take let’s say just the parser and trust it. If you only write a parser but not a widely used compiler, it is generally hard to ensure there are no bugs.

That being said, we used our formatter lfortran fmt a.f90 to print the AST back to free-form source code, and then we used gfortran to compile all of it. That caught quite a few bugs. Then we wrote a wrapper “gfortran” that the SciPy Meson build system calls, and we “preprocess” using LFortran, convert the .f to .f90 and compile with gfortran and everything builds and links. There are some test failures that were not there before, so there are still some bugs, but because building and running the full SciPy test suite takes about 10 minutes, we have not investigated beyond that. We then tried this .f → .f90 → gfortran on just the Minpack subset of SciPy, and that works fully and all SciPy tests work. Also the Bison parser is untouched, we just give it the tokens from a new fixed-form tokenizer, so the parser itself catches many bugs if we give it tokens in incorrect form. All in all, we’ve done everything that can be reasonably done, the rest we’ll catch and fix once we do the full compilation. Also, as mentioned above, the semantics works for Minpack, and even some (most) code generation, so the hard part of parsing is behind us and it either already works, or will be easy to fix.

oscardssmith · October 8, 2022, 12:31am

I’m glad I asked! That’s really cool. (It also nicely parallels the work going on at GitHub - JuliaLang/JuliaSyntax.jl: A Julia frontend, written in Julia where we’re in progress replacing Julia’s parser which is currently written in a lisp, with one written in Julia).

rgba · October 8, 2022, 1:07am

Great way to handle that, well done!

Topic		Replies	Views
Testing LFortran’s parser and formatter (Round 2)	24	1767	March 6, 2023
Let's get LFortran to compile Minpack	8	2244	January 30, 2023
Updates for Fortran-lang/LFortran for the J3 Fortran committee	4	546	October 17, 2022
Looking for parsers for fortran77 Help	8	969	October 2, 2024
Testing LFortran's parser and formatter	18	1367	August 16, 2021

LFortran can now parse all of SciPy to AST

Related topics