LFortran can now parse all of SciPy to AST

Just wanted to share a big milestone for LFortran, we can now parse all of SciPy’s Fortran code, that is in fixed-form, into AST (Abstract Syntax Tree): https://twitter.com/lfortranorg/status/1578453296183791618.

We have a free-form parser that seems quite complete for all of Fortran. But we now have a dedicated fixed-form tokenizer that is by itself a recursive descent parser, which parses a prescanned fixed-form code (white space removed, continuation character & handled) and skips over things that are not relevant (such as details of expressions), but it parses the whole code and emits correct tokens. The tokens then go into our existing Bison parser which is shared with the free-form tokenizer. Here are all the files:

The fixed-form tokenizer cannot handle all the corner cases yet, but it can handle all of SciPy. If you discover anything that does not work, please report an issue and we’ll fix it.

The free-form tokenizer as well as the (shared) Bison parser should be able to parse all of Fortran 2018.

What remains now is finishing up semantics and code generation, which we are focusing on now. We will post an update in a few weeks (we need some time to figure out and document what is missing and how long it will take us to finish it, but we are quite far, for example the semantics already works for all of minpack).

20 Likes

I’m hyping! :grin:

1 Like

Interesting comment on Twitter from a SciPy core maintainer about how this (LFortran C++ backend) could mean SciPy could “finally remove Fortran”. :man_facepalming:

3 Likes

@jacobwilliams here is my reply to that: https://twitter.com/lfortranorg/status/1578471409268903936

3 Likes

I’m so excited to see this materialise. LFortran is really coming into a generally functional compiler. Congratulations!

3 Likes

Thank you. And I have not forgot about your PRs, I was just busy with the parsing stuff, wanted to get it done.

I can now shift my attention to just semantics and harden the ASR (Abstract Semantic Representation) and the LLVM backend. If you or anyone else want to help, here is a minpack repository:

That branch has identical code to SciPy. It compiles to ASR already. We “just” need to fix some bugs in the LLVM backend and ensure everything links together and runs. This is not far from fully working!

Obviously after minpack there are still a few things we need to implement for SciPy (I know about common blocks and equivalence), but old original minpack will be a beautiful milestone.

2 Likes

There might be an obvious answer I’m missing, but if you don’t have semantics and codegen, how do you confirm that it’s parsing correctly? Manual inspection?

1 Like

Excellent question.

The short answer is that in general indeed you can only be sure everything is parsed correctly by doing semantics, then code generation and then running the test suite, but because the SciPy test suite also is not 100% comprehensive, the only way is to actually use LFortran by a lot of people for their projects and over time build confidence that things actually work. And then you can take let’s say just the parser and trust it. If you only write a parser but not a widely used compiler, it is generally hard to ensure there are no bugs.

That being said, we used our formatter lfortran fmt a.f90 to print the AST back to free-form source code, and then we used gfortran to compile all of it. That caught quite a few bugs. Then we wrote a wrapper “gfortran” that the SciPy Meson build system calls, and we “preprocess” using LFortran, convert the .f to .f90 and compile with gfortran and everything builds and links. There are some test failures that were not there before, so there are still some bugs, but because building and running the full SciPy test suite takes about 10 minutes, we have not investigated beyond that. We then tried this .f.f90 → gfortran on just the Minpack subset of SciPy, and that works fully and all SciPy tests work. Also the Bison parser is untouched, we just give it the tokens from a new fixed-form tokenizer, so the parser itself catches many bugs if we give it tokens in incorrect form. All in all, we’ve done everything that can be reasonably done, the rest we’ll catch and fix once we do the full compilation. Also, as mentioned above, the semantics works for Minpack, and even some (most) code generation, so the hard part of parsing is behind us and it either already works, or will be easy to fix.

5 Likes

I’m glad I asked! That’s really cool. (It also nicely parallels the work going on at GitHub - JuliaLang/JuliaSyntax.jl: A Julia frontend, written in Julia where we’re in progress replacing Julia’s parser which is currently written in a lisp, with one written in Julia).

1 Like

Great way to handle that, well done!

1 Like