@RonShepard That’s right: summation is a great example, and I also discuss it in my document above. If you require a strict order, then it’s hard (impossible) for the compiler to vectorize differently without changing the answer. On the other hand, if you relax your requirements (as in my “rules”), then the compiler is free to rearrange, and you just have to ensure your code is robust against the resulting numerical changes in the answer.
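To make the reordering concrete, here is a minimal sketch (illustrative code of my own, not from the document above) that sums the same single-precision array in strict left-to-right order and in four interleaved partial sums, which is roughly the reassociation a vectorizing compiler is permitted under relaxed rules:

```fortran
program sum_order
   implicit none
   integer, parameter :: n = 100000
   real, allocatable :: x(:)
   real :: s_strict, s_partial(4)
   integer :: i

   allocate(x(n))
   call random_number(x)
   x = x + 1000.0          ! large-ish values so single precision drops low-order bits

   ! strict left-to-right order, as the source code specifies
   s_strict = 0.0
   do i = 1, n
      s_strict = s_strict + x(i)
   end do

   ! reassociated order: four interleaved partial sums, roughly what a
   ! vectorizing compiler is allowed to generate when reordering is permitted
   s_partial = 0.0
   do i = 1, n, 4
      s_partial = s_partial + x(i:i+3)
   end do

   print *, 'strict       :', s_strict
   print *, 'reassociated :', sum(s_partial)
end program sum_order
```

The two results typically differ by a few ULPs; robust code should tolerate that level of change.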
@septc If you can reduce your calculation time from 3 days to 2 days and get results that are not significantly different, then -ffast-math is the option to use.
There are many here who claim that -ffast-math produces the “wrong” result and should not be used, but for my structural finite element calculations I cannot show that the results are wrong, only different.
What is a significant error?
My FE analysis basically solves for x in the linear equation f = [K].x, where K is a large, typically well-behaved matrix. This is repeated for many “f” values over many steps.
Once I have each solution x, I can easily calculate e = f - [K].x and test the maximum error in e. With or without -ffast-math, I don’t get a significant difference in max(e), so I continue to use -ffast-math.
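As a minimal sketch of that residual check (illustrative matrix and values only, not my production code):

```fortran
program residual_check
   use, intrinsic :: iso_fortran_env, only: dp => real64
   implicit none
   integer, parameter :: n = 3
   real(dp) :: K(n,n), x(n), x_solved(n), f(n), e(n)

   ! small, well-behaved test matrix (illustrative values only)
   K = reshape([4.0_dp, 1.0_dp, 0.0_dp, &
                1.0_dp, 3.0_dp, 1.0_dp, &
                0.0_dp, 1.0_dp, 2.0_dp], [n, n])
   x = [1.0_dp, 2.0_dp, 3.0_dp]
   f = matmul(K, x)                 ! manufacture a right-hand side with a known solution

   ! stand-in for the solver output, perturbed slightly to mimic round-off
   x_solved = x * (1.0_dp + 1.0e-14_dp)

   e = f - matmul(K, x_solved)      ! residual vector
   print *, 'max residual:', maxval(abs(e))
end program residual_check
```

The same check applies unchanged whether the solve was compiled with or without -ffast-math, which is what makes it a useful yardstick.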
It would be good to identify what characteristics may increase the likelihood of fast-math error. (My guess is that it is related to round-off and underflow, which do not grow in a well-conditioned set of equations.)
Basically, the equations I have are well behaved, and the errors due to -ffast-math are orders of magnitude smaller than those from other system assumptions.
It is interesting to look at the referenced example `a = 1e9+1; b = -1e9; c = 0.1; e = ((a+b)+c) - (a+(b+c))`.
Although -ffast-math may produce an error that appears significantly different, you have to ask what the accuracy of the values “a” and “b” actually is. In comparison to the accuracy of a or b, the different estimates of e are not that significantly different after all. This example reduces to deciding what error is acceptable, given the accuracy of the inputs, and choosing an appropriate floating-point precision.
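For reference, here is a direct double-precision transcription of that example (my own sketch); the point is that e, while nonzero, is tiny relative to the magnitude of a and b:

```fortran
program reassoc_example
   use, intrinsic :: iso_fortran_env, only: dp => real64
   implicit none
   real(dp) :: a, b, c, e

   a = 1.0e9_dp + 1.0_dp
   b = -1.0e9_dp
   c = 0.1_dp

   ! difference between the two groupings; the explicit parentheses keep
   ! each evaluation order as written under default compiler settings
   e = ((a + b) + c) - (a + (b + c))
   print *, 'e =', e
   print *, 'relative to |a|:', abs(e) / abs(a)
end program reassoc_example
```

(With -ffast-math itself, of course, the compiler may simplify this whole expression to zero, since it is free to reassociate.)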
Also, regarding the referenced Kahan sum examples, a simpler alternative can be to use a higher-precision accumulator. The 8087’s 80-bit accumulator register hid a lot of numerical precision problems, which became more evident when AVX registers were used.
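A minimal sketch of that comparison (my own illustration, not from the referenced examples): a naive single-precision sum, a Kahan compensated sum, and a plain sum into a double-precision accumulator.

```fortran
program kahan_vs_wide
   use, intrinsic :: iso_fortran_env, only: sp => real32, dp => real64
   implicit none
   integer, parameter :: n = 100000
   real(sp), allocatable :: x(:)
   real(sp) :: s_naive, s_kahan, c, y, t
   real(dp) :: s_wide
   integer :: i

   allocate(x(n))
   call random_number(x)
   x = x + 1.0e4_sp                 ! large values so single precision loses digits

   ! naive single-precision sum
   s_naive = 0.0_sp
   do i = 1, n
      s_naive = s_naive + x(i)
   end do

   ! Kahan (compensated) sum, still in single precision
   s_kahan = 0.0_sp
   c = 0.0_sp
   do i = 1, n
      y = x(i) - c
      t = s_kahan + y
      c = (t - s_kahan) - y
      s_kahan = t
   end do

   ! plain sum with a wider accumulator
   s_wide = 0.0_dp
   do i = 1, n
      s_wide = s_wide + real(x(i), dp)
   end do

   print *, 'naive single  :', s_naive
   print *, 'Kahan single  :', s_kahan
   print *, 'double accum. :', s_wide
end program kahan_vs_wide
```

Note that the Kahan loop only works if the compiler honours the written evaluation order; under -ffast-math the compensation term can be simplified away, which is one more argument for the wider accumulator.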
I continue to use -ffast-math
@JohnCampbell Exactly; for most of my applications it is the same as you describe, which mostly involves measured data in one way or another.
When the error of the input data is significantly larger than any re-ordering error, the speed-up could in many cases be well worth it.
@JohnCampbell Great points. One idea I have is that there might be a way for the compiler to help pinpoint a place in the program where the “fast-math” and strict standards-conforming results start to diverge, to help with debugging.
I don’t think “starting to diverge” is especially meaningful. If you’re using fast-math, you almost certainly expect some level of divergence, and the point where your program goes from a few ULPs of difference to O(1) difference may be in a place where nothing fancy is happening.