Fast Math library for modern fortran

The recent discussion on this forum about fortran-not-so-fast-as-they-claim motivated me to find some more time to cleanup and release more of my codes and make them available to the broader Fortran community.

So there we go, I’m releasing a library of fast math functions I’ve been creating over time:

perazz/fastmath: A Modern Fortran library for fast, approximate math functions (github.com)

This library contains approximated versions of exp(x), log(x), 1/sqrt(x) and sincos(x).
I’m pretty happy with both their accuracy and speed, they’re more accurate than needed for most of my cases, but I’m sure their performance will vary wildly with compiler/cpu/architecture.

Here’s some runtimes of the testcase on my EPYC workstation + gcc-12 + -O3:

 *** fast exp(x) test ***
 quintic spline : time =   12.5000 ns/eval, speed-up= 1.79X, relerr=   1.10814E-006
   cubic spline : time =   11.1979 ns/eval, speed-up= 2.00X, relerr=   2.39339E-004
       intrinsic: time =   22.3958 ns/eval, speed-up= 1.00X, relerr=   0.00000E+000
       linear   : time =   13.5417 ns/eval, speed-up= 1.65X, relerr=   1.90068E-002
       degree 2 : time =   14.5833 ns/eval, speed-up= 1.54X, relerr=   1.78246E-003
       degree 3 : time =   14.0625 ns/eval, speed-up= 1.59X, relerr=   1.23874E-004
       degree 4 : time =   14.8438 ns/eval, speed-up= 1.51X, relerr=   7.19199E-006
       degree 5 : time =   15.8854 ns/eval, speed-up= 1.41X, relerr=   3.56917E-007
 *** fast log(x) test ***
       spline(5): time =    3.1250 ns/eval, speed-up=10.00X, relerr=   2.72011E-006
       spline(3): time =    3.1250 ns/eval, speed-up=10.00X, relerr=   3.93305E-005
       intrinsic: time =   32.0312 ns/eval, speed-up= 0.98X, relerr=   0.00000E+000
 *** fast 1/sqrt(x) test ***
       intrinsic: time =    2.0312 ns/eval, speed-up= 1.05X, relerr=   0.00000E+000
          quake3: time =    1.5625 ns/eval, speed-up= 1.36X, relerr=   9.36080E-004
22 Likes

Nice! Is the relative accuracy of sin/cos roughly 1e-5? So it’s good for single precision?

I don’t have time right now, but we should benchmark against:

You should be able to significantly beat your quake3 invsqrt time in speed and accuracy. Modern CPUs have inverse sqrt instructions that are really fast and give 10 to 15 bits.

1 Like

I think you’re right: the test I pasted here was run on a v1 EPYC processor that maybe doesnt have specific instruction; on a Mac M1, the intrinsic is about as fast as the approximate version.

Yeah some 10^-5 accuracy was acceptable for my use cases (3d geometry). I see the lFortran implementation use the bit shifts to move to the [0,2pi] quadrants which my implementation does not have (I use floor which should however be based on bit manipulations I guess?); maybe merging the two approaches would tell us if we can speed that up even further!