Testing LFortran’s parser and formatter (Round 2)

Thank you everybody who tested the LFortran’s parser in Testing LFortran's parser and formatter. We have now fixed every bug that was reported, so we would appreciate round 2 of testing using the newly released version of LFortran 0.12.0. You can install it using Conda:

conda create -n test1 -c conda-forge lfortran=0.12.0
conda activate test1

Or you can compile from source. If you discover any bugs, please report them here or create an issue at: Issues · lfortran / lfortran · GitLab .

How to test:

lfortran fmt some_file.f90

If it prints the code back (in color), it works. If you get any kind of errors, it’s a bug in the compiler and we would appreciate it if you let us know about it. You can also replace the original file by lfortran fmt -i some_file.f90 (be careful, only do this if the original file is checked into git). If you compile the reformatted code with another Fortran compiler, it should compile and run correctly.

Besides bugs that we fixed, we also implemented:

  • comments, empty lines and semicolons into AST, so the formatter will now preserve those.
  • fixed form: this is only preliminary so far and still has bugs, but you can test it using lfortran fmt --fixed-form some_file.f

If something does not work, please let us know.

You can browse existing issues here:

5 Likes

You mean now instead of not?

Yes. Thanks for spotting it, I fixed it above.

Not sure if this is bug or not, but will post in case: I tried this on a file that has user-defined operators in it, and I get a syntax error because of this. An example would be this:

call ERROR_HANDLER%trigger(errors=.errors.rslt)

…which gives me the error:

input:467:43 syntax error: token type 'defined operator' is unexpected here
        call ERROR_HANDLER%trigger(errors=.errors.rslt)
                                          ^~~~~~~~
Parsing error: syntax error

Of course, I might just be doing something syntactically incorrect that GFortran/Intel doesn’t complain about! It’s always made me a bit nervous that I can get away with this syntax…

1 Like

Thanks! I think that is a bug. I am fixing it.

Update: fixed in !1206.

1 Like

One thing i notice. On Windows if the file has CRLF line endings, contiguous comments lines are getting an extra line break. So:

    integer :: i ! this is
                 ! a comment

becomes:

integer :: i ! this is

! a comment
1 Like

Thanks @jacobwilliams, I opened up lfortran#523 for this. I need to buy a Windows machine and do some more thorough testing there besides what our CI provides. We need to fix it.

1 Like

Speedy work! Cheers.

2 Likes

Using a kludged script (wget http://www.urbanjost.altervista.org/REMOVE/lfortran.sh) I found many cases where lines became over 132 characters, which I think I see mentioned in the known issues, but not quite sure it is talking about the same thing; even adding -ffree-line-length-none on the output still produced about 45 files mostly to do with requoting strings, issues with specifying KIND on intrinsics when using COMPLEX values, the block name being removed from named SELECT CASE statements at the end but not at the beginning. The script is a hack but I put it where someone can fetch it if they like. The little “lf” script it uses could be changed to also go through some fixed-form files; will take a while to sift though it but if someone wants to duplicate it I can explain the script as it would need customized to anyone’s environment and puts the K in Kluge. In the mean time I will try to reduce these down to real sample cases if I get a chance but at first glance I think the 45 is really probably about 10 issues. Since the reformatting is extensive enough a plain diff(1) is useless, I am thinking about cleaning up the script and having it use the error messages to try and find the original lines that were changed. I know line length limits are scheduled to be eliminated/greatly increased in the Fortran standard, but if you take off the switch so 132 columns is enforced a lot of lines will show up.

1 Like
 lfortran fmt xx.f90
subroutine bug()
NAME: select case (a)
end select
end subroutine bug

(test1) urbanjs@venus:~/github/_scratch$ cat xx.f90
subroutine bug()
NAME: select case(a)
end select NAME
end subroutine bug

So in this one notice the END SELECT no longer has a label.

1 Like
program bug
character(len=*),parameter :: chars(2)= [ '"   ', '''   ']
write(*,*)chars
end program bug

(test1) urbanjs@venus:~/github/_scratch$ lfortran fmt xx.f90
program bug
character(len=*), parameter :: chars(2) = ['"   ', "''   "]
write(*, *) chars
end program bug 

in this one it is changing single quote to double quote delimiters for strings but not taking into account that a single quote doubled inside of single quotes should just become a single quote if placed in double quotes.


There are some subtle things about complex numbers. I have to edit this post because Discourse is not allowing more than two posts without an answer in-between, so this is
a third one.

program bug
COMPLEX*16         C
DOUBLE PRECISION   ABS1
   !     .. Statement Function definitions ..
    ABS1( C ) = ABS( DBLE( C ) ) + ABS( DIMAG( C ) )
   write(*,*)abs1((10.0d0,20.0d0))
end program bug
program bug
complex(16) :: C
double precision :: ABS1 
!     .. Statement Function definitions ..
ABS1(C) = ABS(DBLE(C)) + ABS(DIMAG(C))
write(*, *) abs1((10.0d0, 20.0d0))
end program bug

besides the obvious issue, converting the non-standard COMPLEX*16 to COMPLEX(16) is not a sure bet

If these ones are fixed it would fix multiple cases that I found. Haven’t verified it yet, but if the issue with the label at the end of the case applies to other named structures and blocks that will be a bunch of the remaining ones.



As far as the comments now being included it would be nice if they were colored to distinguish them from the code. Typically I think blue is used for the comments.


Something more feasible for an actual AST interpretation of the code and a compiler put that a basic colorizer could not do would be to have an option to add INTENT and/or PURE and ELEMENTAL and maybe automatic interfaces; even if just routines with no externals for something down the road.

________________________________________________

function base2_fdo(x) result(str)
!  return string representing number as a binary number.  Fixed-length string: do concurrent
integer, intent(in) :: x
character(len=max(1,bit_size(x)-leadz(x))) :: str

integer :: n, i

    if (x == 0) then
      str(1:1) = '0'
      return
    endif
    n = len(str)
    str = repeat('0',n)
    do concurrent (i = 0:n-1, btest(x,i))
      str(n-i:n-i) = '1'
    end do
end function base2_fdo
function base2_fdo(x) result(str) 

!  return string representing number as a binary number.  Fixed-length string: do concurrent
integer, intent(in) :: x
character(len=max(1, bit_size(x) - leadz(x))) :: str 

integer :: n, i 

if (x == 0) then
    str(1:1) = "0"
    return
end if
n = len(str)
str = repeat("0", n)
do concurrent (i = 0:n - 1)btest(x, i)
    str(n - i:n - i) = "1"
end do
end function base2_fdo

----------------------------------------------------------------------------------------

just to confirm the ending label is removed from IF and not just SELECT

program bug
LABEL: if(0.eq.0)then
endif LABEL
end program bug
program bug
LABEL: if (0 == 0) then
end if
end program bug

This may be related to the previous issue with quotes, but is distinct enough to mention it

program bug
use,intrinsic ::  iso_c_binding
implicit none
integer(C_INT),parameter ::  KEY_C3        = INT(O'540',C_INT) ! lower right of keypad
end program bug
program bug
use, intrinsic :: iso_c_binding
implicit none
integer(C_INT), parameter :: KEY_C3 = INT("O'540'", C_INT) ! lower right of keypad
end program bug

_______________________________________________

So that is from over 690 000, source lines. I do not currently have time available for more but pretty sure that is over 90/100 of the issues I could find so that is very encouraging. A compiler vendor gave me a “Hulk Smash” tee-shirt once for having the most bug submittals from any customer, makes me feel like I still have the touch :> So from experience this is looking exciting, I hope this list does not appear to come across negatively.

1 Like

Is it expected to fail in codes that use preprocessing, like in this example?

The code here enables optional timing via the preprocessor variables TIMER_START and TIMER_STOP, which are defined as:


! Compile with -DTIMER to add timing to the code 
#ifdef TIMER
#   define TIMER_START(tname) call domain%timer%timer_start(tname)
#   define TIMER_STOP(tname)  call domain%timer%timer_end(tname)
#else
#   define TIMER_START(tname)
#   define TIMER_STOP(tname)
#endif

Then in the code itself, there are lines like this:


TIMER_START('fluxes')

call domain%compute_fluxes()

TIMER_STOP('fluxes')

to time various parts of the code within each domain.

It compiles with ifort, gfortran, nvfortran, so long as we invoke the preprocessor (e.g. with the -cpp option for gfortran).

Using lfortran fmt on the original file I get errors like this:

>  lfortran fmt domain_mod.f90 
input:556:30 syntax error: token type 'newline' is unexpected here
TIMER_START('printing_stats')
                             ^
Parsing error: syntax error
1 Like

It doesn’t support preprocessor usage yet, as far as I know. See

@gareth you have to pre-process it first:

$ gfortran -E -cpp a.f90 -o b.f90
$ lfortran fmt b.f90             

The LFortran’s tokenizer currently ignores preprocessing directives. As @awvwgk wrote, we have an open issue for that.

I have some ideas how to incorporate formatting of the original code. If you want to help us design it, we are always interested.

The best idea so far I have is this:

  • We implement the C preprocessor int the prescanning phase and map each character of the result into the original file
  • We parse to AST, each AST node has location information into the preprocessed file
  • We format the result using lfortran fmt (effectively), but for each AST node that gets formatted, we determine if the AST node comes from a macro expansion or not:
    • If it fully does not, we simply write it out
    • If it fully or partially comes from a macro expansion, we instead output the original code

If the macro expands to just nothing, we’ll have to figure out how to insert it back

It’s quite non-trivial, but I think there might be a way.

The alternative is to write a completely new tool, I believe that’s how clang-format does it, but I don’t want to do that, because I want to reuse as much code as possible to take advantage of the fact that LFortran’s parser will always work, since it is part of the compiler.

If I understand correctly, that’s what LLVM Flang does. If you could share the implementation, or even use it as a starting point, it might save you some work. More importantly, it would mean both compilers support the same preprocessing features.

It looks like they have put a lot of thought into preprocessing in Flang and I hope you implement the same preprocessing language. That would be a step toward a de facto standard (The preprocessor should be standardized · Issue #65 · j3-fortran/fortran_proposals · GitHub).

Note that we are talking about formatting here. Flang cannot return the original code with macros nicely formatted, as far as I know.

What you are talking is to just support a pre-processor for compilation and LFortran will support a pre-processor for compiling.

It is not hard to support a preprocessor. What is hard is to generate the original code back from AST, and “undo” preprocessing.

No, but it has code that can preprocess Fortran source and map each character of the output back to the location in the source file it came from. That’s what you listed as the first step to formatting source that has preprocessing directives.

1 Like

It seems the formatter can’t recognize end associate and type-bound procedures.

The original code

module test
implicit none

type :: example_type
real :: x, y
contains
procedure :: type_bound_func
generic :: type_bound => type_bound_func
end type example_type

contains

!> type bound function
function type_bound_func(this) result(ret)
class(example_type), intent(in) :: this
real :: ret
associate (x => this%x, y => this%y)
    ret = sqrt(sin(x)**2 + cos(y)**2)
end associate
end function type_bound_func

end module test

Afterwards

With lfortran fmt --indent-unit test.f90

$ lfortran fmt --indent-unit test.f90
module test
    implicit none

    type :: example_type
        real :: x, y

contains

procedure :: type_bound_func
generic :: type_bound => type_bound_func
    end type example_type


contains

    function type_bound_func(this) result(ret)
        class(example_type), intent(in) :: this
        real :: ret
        associate (x => this%x, y => this%y)
            ret = sqrt(sin(x)**2 + cos(y)**2)
end associate
    end function type_bound_func


end module test

I am not sure if it is a bug so I post it here.

1 Like

It seems it’s indented incorrectly right? That’s definitely a bug, thanks for reporting it!

1 Like

Yes. LFortran also has code to do prescanning and mapping characters.

One issue is that the Flang prescanning throws away comments (based on my understanding), and we keep comments in LFortran, because they are part of the AST (and that’s why they also appear in the reformatted file). We also keep spaces and new lines, I don’t know if the Flang’s version does that.

This code is not long, so it’s probably not worth the hassle to figure out how to design code that can be used by both projects.

We could collaborate on the preprocessor if there is interest, assuming it could be designed in a way so that one can use it independently.