Dear all,
I just wonder, has anyone used -Ofast in gfortran or the -fast flag in Intel Fortran?
I sometimes find that some results with -fast are a little different from -O3.
Is -Ofast or -fast safe to use?
Thank you very much in advance!
If you can’t find the information in the documentation of the compilers (or you can but can’t understand it), would you consider filing a bug report? That way, the next person with the same issue has a hope of finding good information instead of googling forums for the answer.
Good luck trusting the answer you will get here.
Yes and no !
You will have to compare the results of your program with and without…
With -Ofast, you will probably lose a few digits on the right. Most of the time, it runs fine. But it is less safe than -O3 or -O2. It must be a conscious choice.
See:
-Ofast
Disregard strict standards compliance. -Ofast enables all -O3 optimizations. It also enables optimizations that are not valid for all standard-compliant programs. It turns on -ffast-math, -fallow-store-data-races and the Fortran-specific -fstack-arrays, unless -fmax-stack-var-size is specified, and -fno-protect-parens. It turns off -fsemantic-interposition.
-ffast-math
Sets the options -fno-math-errno, -funsafe-math-optimizations, -ffinite-math-only, -fno-rounding-math, -fno-signaling-nans, -fcx-limited-range and -fexcess-precision=fast.
This option causes the preprocessor macro __FAST_MATH__ to be defined.
This option is not turned on by any -O option besides -Ofast since it can result in incorrect output for programs that depend on an exact implementation of IEEE or ISO rules/specifications for math functions. It may, however, yield faster code for programs that do not require the guarantees of these specifications.
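To make the "exact implementation of IEEE rules" point concrete, here is a minimal sketch of compensated (Kahan) summation, a classic case of code that depends on operations being evaluated exactly as written. The program name, the loop length and the value 0.1 are my own choices for illustration. The assumption is that a compiler allowed to reassociate (via -funsafe-math-optimizations and -fno-protect-parens, both implied by -Ofast in gfortran) may simplify the compensation term to zero, in which case the compensated sum degrades to the naive one. Whether that actually happens depends on the compiler version and target, so compare the output at -O3 and -Ofast yourself.

```fortran
program kahan_demo
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
  integer, parameter :: n = 10000000   ! illustrative loop length
  real(real64) :: naive, s, c, y, t, x
  integer :: i

  x = 0.1_real64        ! not exactly representable in binary floating point
  naive = 0.0_real64
  s = 0.0_real64
  c = 0.0_real64        ! running compensation for lost low-order bits

  do i = 1, n
     naive = naive + x  ! plain accumulation

     y = x - c          ! compensated (Kahan) accumulation
     t = s + y
     c = (t - s) - y    ! algebraically zero; in IEEE arithmetic it is the rounding error
     s = t
  end do

  print '(a, es24.16)', 'naive sum : ', naive
  print '(a, es24.16)', 'kahan sum : ', s
  print '(a, es24.16)', 'n * x     : ', real(n, real64) * x
end program kahan_demo
```

Building it twice, for example with gfortran -O3 and gfortran -Ofast, and diffing the printed values is a cheap way to see whether the extra flags change anything for this kind of code.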
For ifort, note that -fast is shorthand for a bunch of other options that can change the low-order bits in FP calculations. Another option it includes is -xHost, which selects the instruction set for the processor you are compiling on.
Ifort’s -fast currently is:
On macOS* systems: -ipo, -mdynamic-no-pic, -O3, -no-prec-div, -fp-model fast=2, and -xHost
On Windows* systems: /O3, /Qipo, /Qprec-div-, /fp:fast=2, and /QxHost
On Linux* systems: -ipo, -O3, -no-prec-div, -static, -fp-model fast=2, and -xHost
This set of options also varies over time: in short, it’s whatever makes the current top-of-the-line processor run benchmarks really fast. Note the very aggressive -fp-model fast=2, not to mention non-precise divides.
I don’t like -fast. I liken it to a sausage: it may look pretty and smell pretty, but you do not know what’s in it, and some of the things in it can be really bad for you. If you care about numerics, do not use -fast.
I think it would be a good idea to elaborate a bit about what the phrase “exact implementation of IEEE or ISO rules” means in practice and the implications that using -ffast-math can have on one’s code.
For example, functions from ieee_arithmetic like ieee_is_finite and ieee_is_nan will simply evaluate to False (see this SO post)
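A small test program makes this easy to check on your own compiler. This is only a sketch: the program name is made up, and the assumption is that with -ffinite-math-only (part of -ffast-math) the compiler is allowed to assume no NaN or infinity ever occurs, so the checks below may be folded to constants at compile time. I deliberately give no expected output, since it depends on the compiler and the flags.

```fortran
program nan_check_demo
  use, intrinsic :: ieee_arithmetic, only: ieee_is_nan, ieee_is_finite, &
                                           ieee_value, ieee_quiet_nan
  implicit none
  real :: x

  ! Produce a quiet NaN without relying on an invalid operation such as 0/0.
  x = ieee_value(x, ieee_quiet_nan)

  ! Under strict IEEE semantics these checks detect the NaN.
  ! With finite-math-only assumptions the compiler may optimise them away.
  print *, 'ieee_is_nan(x)    =', ieee_is_nan(x)
  print *, 'ieee_is_finite(x) =', ieee_is_finite(x)
  print *, 'x /= x            =', (x /= x)
end program nan_check_demo
```

Compile it once with -O3 and once with -Ofast and compare the three printed logicals. If they differ, any NaN guard in your real code is at the same risk.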
Thanks @greenrongreen and @gnikit for your answers and welcome to the Forum!
You can see this post: https://fortran-lang.discourse.group/t/can-one-design-coding-rules-to-follow-so-that-ffast-math-is-safe/ where I posted a link to a document which has examples of codes that break when -ffast-math is used, and also how to fix them.
I found that, at least for my program, which uses F77 and F90 ODE solvers:
With gfortran's -O3 -march=native, the result is almost the same as with Intel’s -O3 -xHost.
With gfortran's -Ofast -march=native, the result is slightly but noticeably different from its own -O3 -march=native.
With Intel, -O3 -xHost and -Ofast -xHost give identical results.
In terms of speed, on my laptop, at least on Ubuntu, gfortran and Intel Fortran are about the same; there does not seem to be much noticeable difference.
On Windows, I am not sure why, but gfortran is about 3 times slower than Intel’s.
Intel Fortran’s speed and behavior (such as call execute_command_line) are consistent on both Ubuntu and Windows.
On Mac, I do not know, but I guess gfortran and Intel Fortran should perform as they do on Ubuntu. But it seems Intel oneAPI on macOS does not have MPI.
A few digits is not the issue with using -Ofast or -O3.
The real problem with these options is the catastrophic error where the optimisation changes the logic of the code, sometimes only in obscure run cases.
It is best to reserve high optimisation for the “straightforward” 5% of code that produces 95% of the run time, but who’s to know what this is ! I am probably guilty of not practicing this.
Optimisation with OpenMP is also more of an issue, depending on whether the optimiser can recognise the additional !$… syntax.