Thanks @cmacmackin for the feedback. Here are my comments:
Yes, the prescanner can map the continuous stream to the correct line/column, but we have not yet hooked it into error reporting or a nice API that tools like FORD could use to get the correct info. We’ll get it done soon.
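To make the line/column point concrete, here is a minimal sketch of mapping an offset in a continuous character stream back to a line and column. It is not LFortran's prescanner: the function names `line_starts` and `offset_to_line_col` are hypothetical, and it ignores continuation lines and anything else the real prescanner strips.

```python
import bisect


def line_starts(source):
    """Return the character offset at which each line of `source` begins."""
    starts = [0]
    for i, ch in enumerate(source):
        if ch == "\n":
            starts.append(i + 1)
    return starts


def offset_to_line_col(starts, offset):
    """Map a 0-based character offset to a 1-based (line, column) pair."""
    # Index of the last line start that is <= offset.
    line = bisect.bisect_right(starts, offset) - 1
    return line + 1, offset - starts[line] + 1


src = "program p\ninteger :: x\nend program\n"
starts = line_starts(src)
position_of_x = offset_to_line_col(starts, src.index("x"))
```

A tool like FORD would only need the `(line, column)` pair; the table of line starts is built once per file.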
I have a couple of thoughts on this.
When you point FORD to a single module that depends on other modules, in theory you would have to do semantic analysis on all the other modules first, for example so that you know whether dp (as used) means double precision or something else. In practice, one can use all kinds of heuristics to avoid semantic analysis of the other modules and deal with just the single module by making “educated guesses”: dp probably means double precision (who would use it for single precision?). Similarly, if a pure function is used in a declaration to determine the size of the return array, you just guess its declaration from the way it is used and create the proper ASR nodes for it, on a best effort basis. Yes, this can fail in theory, but it would probably work really well in practice.
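The “educated guess” idea could look roughly like the sketch below. Everything in it is an assumption for illustration: the `KIND_GUESSES` table, the `guess_kind` function, and the byte sizes are hypothetical and are not taken from LFortran or FORD.

```python
# Hypothetical table of conventional kind-parameter names and what they
# usually mean. These are guesses, not guarantees.
KIND_GUESSES = {
    "sp": ("real", 4),
    "dp": ("real", 8),
    "qp": ("real", 16),
}


def guess_kind(name, known_symbols):
    """Resolve a kind parameter, falling back to a best-effort guess.

    `known_symbols` holds kinds that were resolved by real semantic
    analysis; it is consulted first so that a guess never overrides
    information we actually have.
    """
    key = name.lower()
    if key in known_symbols:
        return known_symbols[key], "resolved"
    if key in KIND_GUESSES:
        return KIND_GUESSES[key], "guessed"
    return None, "unknown"


print(guess_kind("dp", {}))  # the dependency module was not analyzed
```

Returning the `"guessed"` tag alongside the result lets a caller mark the information as uncertain instead of silently trusting it.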
How exactly do Flang’s symbol tables handle this?
In LFortran, the AST->ASR conversion goes in two passes: first a symbol table structure is created, then it is filled in with function implementations (bodies). Most of our current work is on the bodies part, as most Fortran features concern those. For FORD, it seems you don’t care about the bodies, only about the symbol tables.
If this is the main blocker, what I could do is add a mode to LFortran that only deals with the symbol tables, but ignores function bodies. The ASR would have empty “bodies”, but FORD would not care about it. And then ensure all of Fortran works in this mode. I think we are actually quite close.
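As a rough illustration of the two passes and of what a symbol-table-only mode would skip, here is a small sketch. It is not LFortran code: the `Function` class, the `build_asr` function, the `symbol_table_only` flag, and the dictionary-based input are all hypothetical stand-ins.

```python
from dataclasses import dataclass, field


@dataclass
class Function:
    name: str
    args: list
    body: list = field(default_factory=list)  # stays empty in symbol-table-only mode


def build_asr(ast_functions, symbol_table_only=False):
    """Two-pass conversion over a toy AST given as a list of dicts."""
    # Pass 1: create the symbol table from signatures only.
    symtab = {}
    for f in ast_functions:
        symtab[f["name"]] = Function(f["name"], list(f["args"]))
    if symbol_table_only:
        return symtab  # bodies are never looked at
    # Pass 2: fill in the bodies.
    for f in ast_functions:
        symtab[f["name"]].body = list(f["body"])
    return symtab


toy_ast = [
    {"name": "f", "args": ["x"], "body": ["y = x + 1", "return y"]},
    {"name": "g", "args": [], "body": ["call f(1)"]},
]
tables_only = build_asr(toy_ast, symbol_table_only=True)
full = build_asr(toy_ast)
```

Because pass 1 never reads the `body` entries, an error inside a body cannot affect the result in symbol-table-only mode.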
Regarding errors, such as invalid Fortran code — what would you like to happen: return ASR anyway, on a best effort basis, trying to recover from the errors?
I think the “symbol table only mode” might greatly help with errors: as long as the overall structure is ok, it would never do semantic analysis of procedure bodies, where (I assume) most of the semantic errors would be.
This would also help with our Python wrapper backend, where I think we also only care about the “symbol table only mode”.
CC @hsnyder.
Yes. It is a bit tricky to decide how best to parse and represent these in the AST, as they currently get thrown away by the prescanner. We have to figure out a good solution for this.
How do you plan to do this with Flang? My understanding is that Flang throws away all comments. That seems like a pretty big blocker.
Yes, we are planning to implement one also.
Right. Have you discovered any issues with parsing free form to AST? I am not aware of any bugs.
The fixed form has been lower priority, since we concentrated on modern Fortran first. I just talked with @ThirumalaiShaktivel today and we’ll write a proper tokenizer for fixed-form so that we can parse all of it.
We will be happy to add more comments. I have written a little documentation on how it works here:
- Page Redirection
- grammar/AST.asdl · 94bf06129614f418cbccbca71f1a07eafbb1d77b · lfortran / lfortran · GitLab
- grammar/ASR.asdl · 94bf06129614f418cbccbca71f1a07eafbb1d77b · lfortran / lfortran · GitLab
But it is not yet very detailed for developers; it is more of a general overview. The generated files are indeed C-like, because that gave the best performance (of LFortran itself) that I was able to get (I tried C++ style inheritance as well as std::variant, etc.). However, as a developer, you do not touch the C-like files. You write a C++ style visitor pattern to operate on the AST or ASR, such as here:
You just add a method for each AST or ASR node that you want to visit. It seems as simple as it can get. You can consult the AST.asdl files to see what member variables each node contains.
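For readers unfamiliar with the pattern, the “one method per node type” idea can be sketched as follows. This is a generic illustration in Python, not LFortran's C++ visitor: the node classes `Num` and `BinOp` and the `Evaluator` class are hypothetical and do not correspond to entries in AST.asdl.

```python
from dataclasses import dataclass


@dataclass
class Num:
    value: int


@dataclass
class BinOp:
    left: object
    op: str
    right: object


class Visitor:
    """Dispatch to a visit_<NodeName> method based on the node's type."""

    def visit(self, node):
        method = getattr(self, "visit_" + type(node).__name__, self.generic_visit)
        return method(node)

    def generic_visit(self, node):
        raise NotImplementedError("no visitor for " + type(node).__name__)


class Evaluator(Visitor):
    # One method per node type that we care about.
    def visit_Num(self, node):
        return node.value

    def visit_BinOp(self, node):
        left = self.visit(node.left)
        right = self.visit(node.right)
        if node.op == "+":
            return left + right
        if node.op == "*":
            return left * right
        raise ValueError("unsupported operator: " + node.op)


tree = BinOp(Num(2), "+", BinOp(Num(3), "*", Num(4)))
result = Evaluator().visit(tree)
```

Adding support for a new node type means adding one `visit_...` method; the member variables each method reads are the ones listed for that node in the grammar description.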
We try to limit a lot of modern features as well as heavy use of templates (we do use some) in order to ensure the whole project compiles quickly with any C++ compiler. That is very important for a good developer experience.
If you see something that is just not well designed, please definitely let us know. I know that a lot of these decisions are more a matter of “taste” and a choice of a particular C++ development style, and I get that. I personally like the simple, direct LFortran style. But if you prefer Flang’s C++ style, then you should use Flang.
However, if the above technical reasons are more important, and you would use LFortran if they were fixed, I will prioritize fixing them soon.
Let me know.