Fast Math library for modern fortran

FedericoPerini · October 25, 2022, 9:27am

The recent discussion on this forum about fortran-not-so-fast-as-they-claim motivated me to find some more time to cleanup and release more of my codes and make them available to the broader Fortran community.

So there we go, I’m releasing a library of fast math functions I’ve been creating over time:

perazz/fastmath: A Modern Fortran library for fast, approximate math functions (github.com)

This library contains approximated versions of exp(x), log(x), 1/sqrt(x) and sincos(x).
I’m pretty happy with both their accuracy and speed, they’re more accurate than needed for most of my cases, but I’m sure their performance will vary wildly with compiler/cpu/architecture.

Here’s some runtimes of the testcase on my EPYC workstation + gcc-12 + -O3:

 *** fast exp(x) test ***
 quintic spline : time =   12.5000 ns/eval, speed-up= 1.79X, relerr=   1.10814E-006
   cubic spline : time =   11.1979 ns/eval, speed-up= 2.00X, relerr=   2.39339E-004
       intrinsic: time =   22.3958 ns/eval, speed-up= 1.00X, relerr=   0.00000E+000
       linear   : time =   13.5417 ns/eval, speed-up= 1.65X, relerr=   1.90068E-002
       degree 2 : time =   14.5833 ns/eval, speed-up= 1.54X, relerr=   1.78246E-003
       degree 3 : time =   14.0625 ns/eval, speed-up= 1.59X, relerr=   1.23874E-004
       degree 4 : time =   14.8438 ns/eval, speed-up= 1.51X, relerr=   7.19199E-006
       degree 5 : time =   15.8854 ns/eval, speed-up= 1.41X, relerr=   3.56917E-007
 *** fast log(x) test ***
       spline(5): time =    3.1250 ns/eval, speed-up=10.00X, relerr=   2.72011E-006
       spline(3): time =    3.1250 ns/eval, speed-up=10.00X, relerr=   3.93305E-005
       intrinsic: time =   32.0312 ns/eval, speed-up= 0.98X, relerr=   0.00000E+000
 *** fast 1/sqrt(x) test ***
       intrinsic: time =    2.0312 ns/eval, speed-up= 1.05X, relerr=   0.00000E+000
          quake3: time =    1.5625 ns/eval, speed-up= 1.36X, relerr=   9.36080E-004

certik · November 11, 2022, 4:40am

Nice! Is the relative accuracy of sin/cos roughly 1e-5? So it’s good for single precision?

I don’t have time right now, but we should benchmark against:

oscardssmith · November 11, 2022, 4:49am

You should be able to significantly beat your quake3 invsqrt time in speed and accuracy. Modern CPUs have inverse sqrt instructions that are really fast and give 10 to 15 bits.

FedericoPerini · November 11, 2022, 6:13am

I think you’re right: the test I pasted here was run on a v1 EPYC processor that maybe doesnt have specific instruction; on a Mac M1, the intrinsic is about as fast as the approximate version.

FedericoPerini · November 11, 2022, 6:17am

Yeah some 10^-5 accuracy was acceptable for my use cases (3d geometry). I see the lFortran implementation use the bit shifts to move to the [0,2pi] quadrants which my implementation does not have (I use floor which should however be based on bit manipulations I guess?); maybe merging the two approaches would tell us if we can speed that up even further!

Topic		Replies	Views
Fast_math: A collection of functions for fast number crunching using Fortran Announcements	12	1226	October 3, 2023
How to contribute to LFortran's runtime library	28	1085	November 25, 2022
Fastmath library	8	438	September 10, 2024
A (mis)adventure in optimization	2	1380	January 6, 2021
Fortran runtime math library	15	2890	March 10, 2021

Fast Math library for modern fortran

Related topics