We are having a discussion on LLVM Discourse (RFC: The meaning of -Ofast - Flang - LLVM Discussion Forums) about what -Ofast should mean for llvm/flang. One way to model it would be to do the same as Clang (the LLVM C/C++ compiler), i.e. enable the -O3 optimisations plus -ffast-math, where the latter allows transformations that might break strict IEEE conformance for floats, among other transformations. There are also suggestions to model it as what Fortran developers would naturally expect from the flag. It would be great if we could get some input on this topic in the LLVM Discourse thread, or, if that is not possible, in this thread. Thanks in advance.
Just a thought: you could try a small program like:

program ofast
use, intrinsic :: iso_fortran_env, only: compiler_options
print *, compiler_options()
end program ofast

and run it through various compilers with various options. The compiler_options() intrinsic returns the actual compiler options that were used to build the program.
-Ofast should not be used because it sets flags that truncate subnormal numbers to zero, which can break linked programs that were relying on IEEE-compliant behavior.
The use of SSE/AVX/GPU hardware can also break IEEE-compliant behavior, so if a programmer is aware of this and still wants to take advantage of the increased performance, there should be some way to tell the compiler to do so.
To some extent, options like -Ofast depend on the underlying hardware. Yes, there may be some high-level transformations that are common to all hardware, but the real impact more likely comes from using some low-level hardware feature, such as a GPU or some vector hardware. Another possibility is the option to link different math libraries: one for speed, where some documented compromises have been made regarding accuracy, and another for accuracy, where documented compromises have been made regarding performance. A single option like -Ofast would be expected to make some overall selection among these various high- and low-level choices.
I think there is no general agreement in the Fortran community on these optimization flags. I personally recommend writing numerical codes in such a way that you can use -ffast-math, but others recommend not to use this flag.
The problem with "-Ofast" is that it suggests there is no trade-off. There is always a trade-off.
The difference between -Ofast and instructions like AVX is that using an AVX instruction won't break other programs that aren't using it. If you ever link any program built with -Ofast to anything else, it will disable IEEE semantics for all of the linked programs, even if they don't use it themselves.
-ffast-math is problematic for a library. I had to disable it by default because it messes up end applications. But for an end application that you control and where you want the best performance, I recommend using the flag.
In general, I expect -Ofast to enable at least -ffast-math, but depending on the compiler it might enable more than that. As far as I know, there is no standard for what -Ofast does, so I guess the best way to implement -Ofast is the same as in Clang. At least that way we avoid the mess of the same flag doing one thing in Clang but another in Flang or LFortran.
I have seen codes that compile and run with the more usual -O2 while I get segmentation faults with -Ofast, or vice versa (usually this happens when the code uses C interoperability). I also know developers of quality software who won't go with anything above -O1 and would rather parallelize the code manually instead.
Personally, I prefer to avoid the generic -Ofast and give the compiler specific flags instead. As @certik already pointed out, -ffast-math is the most common one, but I also use -ftree-vectorize etc. It all depends on the code. Whenever I use libraries, especially ones I didn't write and can't control myself, I would avoid those flags too.