Suppose one would like to contribute. Where should one start? I cannot see any “ToDo” document. I looked at impure/lfortran_intrinsic_math.f90: there are a few dotproductXXXX functions with no body. Is this something to be done? Also, many math functions there are just wrappers to C functions. Where is that C code placed?
Thanks for your interest @msz59 !
The C file is here:
It is being called from the Fortran files. We implement things in C that require calling into system functions, and we also currently use libm for the math function implementations. We have a prototype pure Fortran implementation of sin(x), and we should do the other functions too and add a compiler option to choose between libm and pure Fortran.
There is a wiki page with the current status of intrinsic function implementation: Implementation Status of Intrinsic Functions · Wiki · lfortran / lfortran · GitLab
Beyond that, I am happy to answer any questions.
Thanks for making this a separate thread. And for explanations.
Still, I am confused. The lfortran_intrinsic_sin_c.c file that you mention includes only a few functions computing two over pi. The impure/xxx_math.f90 file that I asked about contains a lot of wrappers to C functions using bind(c, name="_lfortran_XXX"). I could not find any of those _lfortran_XXX functions.
My bad, I pointed you to the wrong file. This is the correct C file: lfortran/lfortran_intrinsics.c at 6cb62b1e0e96c6e4c0ba6ddb201bfc4e5c1ab670 · lfortran/lfortran · GitHub. It’s in the src/libasr directory because the file is shared with LPython.
Sorry for the naive questions, but I know nothing about writing compilers. Maybe this should move to private communication, but I thought somebody else might want to try contributing and encounter similar doubts/questions.
1. What is the general idea: to write the runtime preferably in pure Fortran, or to write fast code, possibly using C? From what I see at first glance, there is a mix of these. EXP(), LOG() and many others are implemented as wrappers to the C library but, to my surprise, SQRT() (except the complex version) is written as x**(1.0_sp/2), which takes 8-15 times longer than the sqrt implemented in gfortran 12 on an Intel i5-3330S machine, depending on the options used.
2. What is/are the planned target architecture(s) for LFortran? Can one assume e.g. the IEEE FP number model/arithmetic?
3. In the Implementation Status table, what is the difference between a green tick and a red cross? The explanation in terms of having/not having llvm in integration_tests/CMakeLists.txt is black magic to me. Is there anything for a runtime contributor to do for the latter group (red crosses)? If I see it correctly, the Basic Impl(ementation) column has a green tick if at least some (but not necessarily all) tests have green ticks.
4. If the “Tested in, llvm” status is empty, does it mean that one has to write a test suite for that intrinsic? What is the notion of comprehensive tests?
Thanks for the questions @msz59. Here are my answers.

The idea is to write the fastest code possible. Since LFortran can generate such code (as fast as or faster than other C compilers), we can do this in Fortran and gain those speed advantages. And using Fortran has the advantage that other Fortran programmers can help with it. If x**(1.0_sp/2) currently takes longer than sqrt in libc, then we have to optimize it (we have to do this optimization anyway). Go ahead and report an issue with your benchmark codes and we can work on that. If you think it is not possible to do such an optimization, then we should discuss and implement it differently.
We will have two modes, a Debug and a Release mode. In Debug mode you can assume IEEE FP, and we have to be careful to do things properly so that programs do not break. The Release mode options will allow relaxing some of the requirements to gain speed; users can choose to use those if it works for them, and not use them if it doesn’t.

You would have to read the compiler sources and see the actual latest status. If you want to help, you can simply find a function that doesn’t work yet and work on the implementation. I am happy to meet over video to get you started. You can also help us maintain the wiki page to be current, but it’s quite a lot of work and it seems the time is better spent actually implementing functions (definitely for me).

If the LLVM column is empty, then we do not have a test in integration_tests for this function, so it is not being tested with LLVM.
I recommend you go here: lfortran/integration_tests at main · lfortran/lfortran · GitHub and see if you see a test for a given function. The test must be registered here: lfortran/CMakeLists.txt at main · lfortran/lfortran · GitHub and you must see the “llvm” flag. If you do, then this test is compiled to machine code (via the LLVM backend) and run and it passes (on our CI). Those are the ultimate tests. So if you don’t see it there, you can go ahead and implement those functions. I can help you if you have any technical difficulties or are stuck.
You won’t be able to make x**(1.0_sp/2) fast without making ** slower, making sqrt less accurate, or making ** nondeterministic.
We can always follow the same design as for sin(x), which is: in Debug mode, call the libm version via C, which is accurate but not as fast as possible; and in Release mode, use our own pure Fortran implementation, which is very fast but only accurate to about 1e-15.
But let me first understand the issue: what is stopping us from noticing x**(1.0/2) and substituting a fast implementation? It would not make ** slower.
The problem is that you don’t want x**y to give different values depending on whether the compiler is able to figure out that y == 1.0/2. As such, if you want ** to be deterministic, you need it to insert a runtime check on the value of y.
We don’t want runtime checks; that would indeed slow down **. We want to do this at compile time. Indeed, at compile time x**(1.0/2) would (if optimizations are enabled) use a faster algorithm than the general power algorithm for x**y. The faster algorithm should ideally give the same values as the general power algorithm, but as with the sin(x) case, it might not agree to the last bit. A famous example is 1/sqrt(x), where one can even use a dedicated single-precision instruction and one or two Newton iterations, depending on how much accuracy you want.
In my mind sqrt(x) is exactly equivalent to x**(1.0/2), but if I understand you correctly, you are saying that sqrt(x) could return a different value than x**(1.0/2): the former would use a faster (less accurate?) algorithm, while the latter would use the slower (but more accurate?) general power algorithm, to be consistent with powers. Is the choice between being consistent between x**y and x**(1.0/2), versus being consistent between x**(1.0/2) and sqrt(x)?
I don’t have an opinion on this yet, until I understand the issues better.
No. sqrt (on x86) is just a call to vsqrtsd, which is 0.5 ULP accurate, so unless your floating-point power is also accurate to 0.5 ULP (which most aren’t), x**y will be less accurate than sqrt(x).
On the technical level, the backends (such as LLVM) still see the sqrt function and know it is an intrinsic, so they can generate the vsqrtsd instruction directly and ignore the pure Fortran implementation. So we can still do it this way.
We can also provide a pure Fortran implementation along the lines of openlibm/e_sqrt.c at 1d2c5e3bf52e050c994510ede91e0d2f0717173a · JuliaMath/openlibm · GitHub, or even just call the libm (or openlibm) function directly.
On the design level: indeed, if vsqrtsd is 0.5 ULP accurate and fastest, then that seems to be the best thing to call for sqrt(x). The general algorithm for x**y will most likely be slower and less accurate. So the question remains whether x**(1.0/2) should behave like sqrt(x) or like x**y. I assumed it would behave like sqrt(x), to get a faster and more accurate result, but you assume it should behave like x**y in order to be consistent with it. We can always provide a compiler option if we can’t agree on a good path forward.
Isn’t this similar to the various -ffast-math style optimizations that are faster but change the result, and so can break some codes? One simply leaves this to the user to choose.
I think it’s reasonable to make it behave like sqrt, but then unless you also make x**y behave like sqrt, you can end up with very weird errors. It would be reasonable to enable this with -ffast-math.
Can you give an example of such a weird error?
By weird error, I just mean that (for example) -O1 and -O3 could give different results, because at -O3 the compiler was able to figure out that y was always 0.5, while at -O1 it didn’t figure that out.
Yes, indeed. We probably just need to make these things configurable.
Btw, currently we simply generate llvm.pow.f32/64 for both compile-time and runtime exponents:
lfortran/asr_to_llvm.cpp at 1356463ea372f21ac3be3e3feb8cfae03ade45f9 · lfortran/lfortran · GitHub
I assume LLVM doesn’t optimize it out, based on @msz59’s result above, but I would need to see the details. @msz59, make sure you use --fast when you invoke lfortran. If you post your code, I can analyze it and figure out exactly what is happening.
If your CPU (and programming language) conforms to IEEE 754, sqrt is guaranteed to be correctly rounded in all rounding modes. Your fast implementation will be incompatible with the IEEE modules (if LFortran implements those). The error is guaranteed to be no greater than 0.5 ULP. Oh, IEEE also requires sqrt(-0.) = -0.
Edited:
I just looked at the openlibm source code you pointed to. At the end of the file is a large comment on how to implement sqrt correctly. I used that algorithm to implement sqrtl for FreeBSD libm. It leads to a slowwwwwwww software implementation. If you have FPU support, then you want to use it.
This isn’t the only place these kinds of things happen. For example, I’ve seen things like
real, parameter :: x = sin(1.0)
real :: y, z, one
one = 1.0
y = sin(1.0)
z = sin(one)
print *, x, y, z
prints 3 different values. This is because x is calculated by the compiler’s linked math library, y is calculated by a special-case version of the runtime library function (as it notices it is a special case), and z is calculated by the regular runtime library function, and each of these can be different. If you want to do optimizations, sometimes you have to be willing to give up some “determinism”.
No, you just have to have a competent compiler.
I tried the following and found the same results (with gfortran and ifort)
implicit none
real, parameter :: x = sin(1.0)
real :: y, z, one
one = 1.0
y = sin(1.0)
z = sin(one)
print *, x, y, z
end
With ifort (ifort test.f90 && ./a.out):
0.8414710 0.8414710 0.8414710
With gfortran (gfortran test.f90 && ./a.out):
0.841470977 0.841470977 0.841470977
So, I think compilers are very good at these calculations.