IEEE_FMA (https://www.intel.com/content/www/us/en/docs/fortran-compiler/developer-guide-reference/2023-1/ieee-fma.html) is now available in gfortran. The fused multiply-add performs an operation of the form `a = x + y*z` in a single instruction, i.e. faster and more accurately than a naive combination of a multiply and an add (see the Wikipedia articles on the multiply–accumulate operation and the FMA instruction set).
Is it good practice to use it, or should one rather rely on the compiler figuring out where to use FMA?
I always see huge performance gains with gfortran when using -march flags that turn on AVX/AVX2 instructions, which to me suggests FMA is routinely used wherever possible. Forcing its usage may be beneficial in edge cases where the compiler cannot figure it out by itself, but of course it's a less "fortrannic" approach, if that means "you do the science; the compiler will do the rest".
See this example for `axpy = a*x + y`. With `-O3 -march=core-avx2` we get:

```
vfmadd132sd xmm0,xmm1,QWORD PTR [rsi]
```

With `-O3 -mtune=generic` we get:

```
movsd xmm0,QWORD PTR [rdi]
mulsd xmm0,QWORD PTR [rsi]
addsd xmm0,QWORD PTR [rdx]
```
Note that while automatic FMA generation is usually beneficial for accuracy, it can also introduce bugs. The classic example is that it can make `a*a - b*b` evaluate to nonzero even when `a == b`.
I agree with the "faster" part of that statement, but I think the "more accurate" part is historically incorrect. The FMA multiply might have different rounding conventions than the normal multiply, or the two instructions might treat denormals differently. This can cause an expression like `a*a - b*b` to evaluate to nonzero even when `a == b`: the compiler uses the normal multiply operation (with whatever rounding conventions are in effect, which the programmer might set at run time) for one of the multiplies, and then uses the FMA instruction for the second multiply and the addition. If the rounding conventions, or the treatment of denormal numbers, differ between those two multiplications, you get that surprising, but understandable, result. Another historical oddity arises when intermediates are evaluated in extended (e.g. 80-bit) precision by the normal multiply, while FMA uses a 64-bit multiply convention.