How to contribute to LFortran's runtime library

Suppose one would like to contribute. Where to start? I cannot see any “ToDo” document. I looked at impure/lfortran_intrinsic_math.f90 - there are a few dotproductXXXX functions with no body - is this to be done? Also, many math functions there are just wrappers to C functions - where is that C code placed?

2 Likes

Thanks for your interest @msz59 !

The C file is here:

lfortran/lfortran_intrinsic_sin_c.c at 6cb62b1e0e96c6e4c0ba6ddb201bfc4e5c1ab670 · lfortran/lfortran · GitHub

It is being called from the Fortran files. We implement things in C that require calling into system functions and we also currently use libm for math functions implementations. We have a prototype pure Fortran implementation of sin(x) and we should do the other functions too and add a compiler option to choose between libm and pure Fortran.

There is a wiki page with the current status of intrinsic function implementation: Implementation Status of Intrinsic Functions · Wiki · lfortran / lfortran · GitLab

Beyond that, I am happy to answer any questions.

3 Likes

Thanks for making this a separate thread. And for explanations.

Still, I am confused. The lfortran_intrinsic_sin_c.c file that you mention includes only a few functions computing two-over-pi. The impure/xxx_math.f90 that I asked about contains a lot of wrappers to C functions using bind(c, name="_lfortran_XXX"). I could not find any of those lfortran_XXX functions

1 Like

My bad, I pointed you to a wrong file. This is the correct C file: lfortran/lfortran_intrinsics.c at 6cb62b1e0e96c6e4c0ba6ddb201bfc4e5c1ab670 · lfortran/lfortran · GitHub, it’s in the src/libasr directory because the file is shared with LPython.

1 Like

Sorry for naive questions but I know nothing about writing compilers. Maybe it should move to private communication but I thought maybe somebody else would like to try and encounters similar doubts/questions.

  1. What is the general idea - to write the runtime preferably in pure Fortran or to write fast code, possibly using C? From what I see on first glance, there is a mix of these. EXP(), LOG() and many others are implemented as wrappers to C library but, to my surprise, SQRT() (except complex) is written as x**(1.0_sp/2) which takes 8-15 times longer than sqrt implemented in gfortran-12 on an Intel i5-3330S machine depending on options used.
  2. What is/are the planned target architectures for LFortran? Can one assume e.g. IEEE FP number model/arithmetic?
  3. In the Implementation Status table, what is the difference between green tick and red cross? The explanation of having/not having llvm in integration_tests/CMakeLists.txt is a black magic to me. Is there anything to do for a runtime contributor for the latter group (red crosses)? If I notice properly, the Basic Impl(ementation) column has a green tick if at least some (but not necesserily all) tests have green ticks.
  4. If the column Tested in, llvm status is empty, does it mean that one has to write test suite for that intrinsic? What is the notion of comprehensive tests?
2 Likes

Thanks for the questions @msz59. Here are my answers.

  1. The idea is to write the fastest code possible. Since LFortran can generate such a code (as fast or faster than other C compilers), we can do this in Fortran and gain those speed advantages. And using Fortran has the advantages that other Fortran programmers can help with it. If x**(1.0_sp/2) currently takes longer than sqrt in libc, then we have to optimize it (we have to do this optimization anyway). Go ahead and report an issue with your benchmark codes and we can work on that. If you think it is not possible to do such an optimization then we should discuss and implement it differently.

  2. We will have two modes, a Debug and Release mode. In Debug mode you can assume IEEE FP and we have to be careful to do things properly so that programs do not break. The Release mode options will allow to relax some of the requirements to gain speed, and users can choose to use those if it works for them, and will not use those if it doesn’t work for them.

  3. You would have to read the compiler sources and see the actual latest status. If you want to help, you can simply find a function that doesn’t work yet and work on the implementation. I am happy to meet over video to get you started. You can also help us maintain the wiki page to be current, but it’s quite a lot of work and it seems the time is better spent actually implementing functions (definitely for me).

  4. If LLVM is empty, then we do not have a test in integration_tests for this function, thus not being tested with LLVM.

I recommend you go here: lfortran/integration_tests at main · lfortran/lfortran · GitHub and see if you see a test for a given function. The test must be registered here: lfortran/CMakeLists.txt at main · lfortran/lfortran · GitHub and you must see the “llvm” flag. If you do, then this test is compiled to machine code (via the LLVM backend) and run and it passes (on our CI). Those are the ultimate tests. So if you don’t see it there, you can go ahead and implement those functions. I can help you if you have any technical difficulties or are stuck.

You won’t be able to make x**(1.0_sp/2) fast without making ** slower, making sqrt less accurate or making ** non-deterministic.

1 Like

We can always follow the same design as for sin(x), which is: in Debug mode call the libm version via C, which is accurate but not as fast as possible; and in Release mode we can use our own very fast pure Fortran implementation which is very fast, but only 1e-15 accurate or so.

But let me first understand the issue: what is stopping us to notice x**(1.0/2) and substitute a fast implementation? It would not make ** slower.

the problem is that you don’t want x**y to give different values depending on whether the compiler is able to figure out that y==1.0/2. As such if you want ** to be deterministic, you need it to insert a runtime check for the value of y.

We don’t want runtime checks, that would indeed slow down **. We want to do this at compile time. Indeed, at compile time x**(1.0/2) would (if optimizations are enabled) use a faster algorithm than the general power algorithm for x**y. The faster algorithm should ideally give the same values as the general power algorithm, but as with the sin(x) case, it could be that it is not the same to the last bit, a famous example is 1/sqrt(x), where one can even use a dedicated single precision instruction and one or two newton iterations, depending on how much accuracy you want.

In my mind sqrt(x) is exactly equivalent to x**(1.0/2), but if I understand you correctly, you are saying that sqrt(x) should possibly return a different value than x**(1.0/2), because the former would use a faster (less accurate?) algorithm, while the latter would use the slower (but more accurate?) general power algorithm, to be consistent with powers? Is the choice in being consistent between x**y and x**(1.0/2) versus being consistent between x*(1.0/2) and sqrt(x)?

I don’t have an opinion on this yet, until I understand the issues better.

1 Like

No. sqrt (on x86) is just a call to vsqrtsd which is 0.5 ULP accurate, so unless your floating point power is also accurate to 0.5 ULP (which most aren’t), x**y will be less accurate than sqrt(x)

On the technical level, the backends (such as LLVM) still see the sqrt function and know it is intrinsic, so they can generate the vsqrtsd instruction directly and ignore the pure Fortran implementation. So we can still do it this way.

We can also provide a pure Fortran implementation along the lines of: openlibm/e_sqrt.c at 1d2c5e3bf52e050c994510ede91e0d2f0717173a · JuliaMath/openlibm · GitHub or even just call the libm (or openlibm) function directly.

On the design level: indeed if the vsqrtsd is 0.5 ULP accurate and fastest, then that seems to be the best thing to call for sqrt(x). Indeed, the general algorithm for x**y will most likely be slower and less accurate. So the question remains, if x**(1.0/2) should behave like sqrt(x) or like x**y. I assumed it would behave like x**(1.0/2), to get a faster and more accurate result, but you assume it should behave like x**y in order to be consistent with it. We can always provide a compiler option if we can’t agree on a good path forward.

Isn’t this similar to the various -ffast-math style optimizations that are faster but change the result, and so it can break some codes. So one simply leaves this to the user to choose.

1 Like

I think it’s reasonable to make it behave like sqrt, but then unless you also make x**y behave like sqrt you can end up with very weird errors. It would be reasonable to enable with -ffast-math

Can you give an example of such a weird error?

by weird error, I just mean that (for example) -O1 and -O3 could give different results because at -O3 the compiler was able to figure out y was always .5 while with -O1 it didn’t figure that out.

1 Like

Yes, indeed. We probably just need to make these things configurable.

Btw, currently we simply generate llvm.pow.f32/64 for both compile time and runtime exponent:

lfortran/asr_to_llvm.cpp at 1356463ea372f21ac3be3e3feb8cfae03ade45f9 · lfortran/lfortran · GitHub

I assume LLVM doesn’t optimize it out based on @msz59 result above, but I would need to see the details. @msz59 make sure you use --fast when you invoke lfortran. If you post your code, I can analyze it and figure out exactly what is happening.

2 Likes

If your cpu (and programming language) conforms to IEEE 754, sqrt is guaranteed to be correctly rounded in all round modes. Your fast implementation will be incompatible with the IEEE modules (if lfortran implement those). The error is guaranteed to no greater than 0.5 ULP. Oh, IEEE also requires sqrt(-0.) = -0.

Edited:
I just looked at the openlibm source code you pointed to. At the end of the file is a large comment on how to implement sqrt correctly. I used that algorithm to implement sqrtl for FreeBSD libm. It leads to a slowwwwwwww software implementation. If you have FPU support, then you want to use it.

4 Likes

This isn’t the only place these kinds of things happen. For example, I’ve seen things like

real, parameter :: x = sin(1.0)
real :: y, z, one

one = 1.0
y = sin(1.0)
z = sin(one)
print *, x, y, z

print 3 different values before. This is because x is calculated by the compiler’s linked math library, y is calculated by an special case version of the run-time library function as it notices it is a special case, and z is calculated by the regular run-time library function, and each of these can be different. If you want to do optimizations, sometimes you have to be willing to give up some “determinism”.

2 Likes

no, you just have to have a competent compiler.

I tried the following and found the same results (with gfortran and ifort)

implicit none
real, parameter :: x = sin(1.0)
real :: y, z, one

one = 1.0
y = sin(1.0)
z = sin(one)
print *, x, y, z
end

With ifort (ifort test.f90 && ./a.out):

0.8414710  0.8414710 0.8414710

With gfortran (gfortran test.f90 && ./a.out):

0.841470977 0.841470977 0.841470977

So, i think compilers are very good at these calculations.

1 Like