Discussion on Hacker News: https://news.ycombinator.com/item?id=23843434
Great idea, made it an issue for LFortran: https://gitlab.com/lfortran/lfortran/-/issues/172.
My motivation to have these backends such as C++ is that people can translate Fortran code right away and don’t have to spend months doing it. My expectation is that they will find they do not gain much and may even end up with slower code (which might be the case above, where I think he didn’t use state-of-the-art Fortran compilers to compare performance). But if you do gain something, that would be the motivation for us to fix up Fortran to catch up. And if people spend months rewriting, maybe they don’t want to throw the work out and use the original Fortran. But if the translation can be done automatically and quickly, they might realize that they might as well stay in Fortran, because it’s better.
Plus, the translated code in the target language is likely to be less readable and maintainable because it’s auto-generated.
Regarding readability, I am hoping it could be readable, but it’s a bit too early to know for sure.
There are some interesting points in the ensuing discussion on Hacker News.
Given that the author uses the capitalized FORTRAN, it seems to be older, F77-style code. I wonder if some restructuring of the original Fortran code, or interfacing with specialized numerical libraries (MKL, FFTW), could have already provided the speedup needed.
Some profiling before the rewrite in Rust would have been helpful to see where the bottleneck is.
Interesting blog post, but I could find neither the original Fortran code nor the Fortran compiler and compiler options used in the benchmark. If it’s gfortran and you’re trying to get the fastest execution time (without regard to possible numerical accuracy), then one needs to use at a minimum -O3 -funroll-loops -ffast-math -march=native -mtune=native.
Steve, exactly. Many people don’t know that. This is something we would like to fix with fpm to make these options the default in Release mode.
Well, it depends on what you want to accomplish. For gfortran,
-ffast-math will give a performance boost, but it may also break your code. For example,
-ffast-math cannot be used with Kahan’s summation technique as it breaks the integrity of parentheses. The option pair
-march=native -mtune=native will exploit the instruction set of the CPU on which the executable is built. That executable may not run on another CPU. Think about the difference between Intel Core 2 and AMD Zen 2. Both are x86_64, but the AMD Zen 2 has instructions not available to Core 2 (e.g., AVX).
With regard to the Rust versus Fortran comparison, there are no details about the compilers and options used. Without them, the benchmark comparison is useless.
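For readers who have not met it, here is a minimal Python sketch of Kahan summation that shows why the algorithm depends on the parentheses being honored. The function names and the test data are made up for illustration. `kahan_sum_reassociated` only simulates by hand what an algebraic simplification of the kind -ffast-math permits could do to the loop; it is not the output of any real compiler.

```python
import math


def naive_sum(values):
    """Plain left-to-right floating-point summation."""
    total = 0.0
    for v in values:
        total += v
    return total


def kahan_sum(values):
    """Kahan (compensated) summation."""
    total = 0.0
    comp = 0.0  # running estimate of the rounding error lost so far
    for v in values:
        y = v - comp
        t = total + y
        # Algebraically (t - total) - y is zero. In floating point it
        # recovers the low-order bits that were lost when forming t.
        comp = (t - total) - y
        total = t
    return total


def kahan_sum_reassociated(values):
    """Hand-written simulation of an 'optimized' Kahan loop.

    Assumption for illustration: an optimizer that treats floating-point
    arithmetic as exact algebra may rewrite (t - total) - y as 0.
    """
    total = 0.0
    comp = 0.0
    for v in values:
        y = v - comp
        t = total + y
        comp = 0.0  # the compensation term has been simplified away
        total = t
    return total


# One large value followed by many values too small to change it one at a time.
values = [1.0] + [1e-16] * 100_000

if __name__ == "__main__":
    print("naive        :", repr(naive_sum(values)))
    print("kahan        :", repr(kahan_sum(values)))
    print("reassociated :", repr(kahan_sum_reassociated(values)))
    print("math.fsum    :", repr(math.fsum(values)))
```

With the compensation term removed, the loop degenerates into the naive sum, which is the kind of subtle breakage being described here.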
I have read in GCC doc that
-floop-unroll-and-jam : Apply unroll and jam transformations on feasible loops. In a loop nest this unrolls the outer loop by some factor and fuses the resulting multiple inner loops. This flag is enabled by default at -O3. It is also enabled by -fprofile-use and -fauto-profile.
Does it imply
-funroll-loops or is it something different ? And what is a “feasible loop” ?
There is also this option implied by
-fpeel-loops: Peels loops for which there is enough information that they do not roll much (from profile feedback or static analysis). It also turns on complete loop peeling (i.e. complete removal of loops with small constant number of iterations).
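As a rough picture of the terminology, here is a Python sketch of the textbook transformations on a small loop nest: unrolling the inner loop versus unrolling the outer loop and fusing ("jamming") the resulting inner loops. This is written by hand with an unroll factor of 2 and hypothetical function names. It says nothing about how GCC decides when to apply either transformation, or what it considers a feasible loop.

```python
def row_sums_original(a):
    """Reference loop nest: sum each row of a rectangular matrix."""
    n, m = len(a), len(a[0])
    sums = [0.0] * n
    for i in range(n):
        for j in range(m):
            sums[i] += a[i][j]
    return sums


def row_sums_inner_unrolled(a):
    """Plain unrolling: the inner loop body is duplicated (factor 2)."""
    n, m = len(a), len(a[0])
    sums = [0.0] * n
    for i in range(n):
        j = 0
        while j + 1 < m:
            sums[i] += a[i][j]
            sums[i] += a[i][j + 1]
            j += 2
        if j < m:  # remainder iteration when m is odd
            sums[i] += a[i][j]
    return sums


def row_sums_unroll_and_jam(a):
    """Unroll and jam: the OUTER loop is unrolled (factor 2) and the two
    copies of the inner loop are fused into one."""
    n, m = len(a), len(a[0])
    sums = [0.0] * n
    i = 0
    while i + 1 < n:
        for j in range(m):  # one fused inner loop serves rows i and i + 1
            sums[i] += a[i][j]
            sums[i + 1] += a[i + 1][j]
        i += 2
    if i < n:  # remainder row when n is odd
        for j in range(m):
            sums[i] += a[i][j]
    return sums


matrix = [[float(i * 10 + j) for j in range(3)] for i in range(5)]

if __name__ == "__main__":
    print(row_sums_original(matrix))
    print(row_sums_inner_unrolled(matrix))
    print(row_sums_unroll_and_jam(matrix))
```

All three variants add the elements of each row in the same order, so they return identical results; only the loop structure differs.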
@kargl Something like
-ffast-math should never be a default, as it can break things in very subtle ways (example: we used to build our toolchain with it, until someone realized that it broke (sca/)lapack). This also holds for some Intel FP optimization flags which seem to be active by default on certain optimization levels (look at the
We had this whole optimization-by-compiler-flag discussion in Gentoo Linux 10+ years ago and it basically boiled down to
-O2 -march=native -pipe being the only universally safe and fast thing to do (for GCC), see also https://wiki.gentoo.org/wiki/Safe_CFLAGS (unless of course a package was explicitly tested by upstream for more aggressive optimization flags).
Unfortunately, gcc, and by extension gfortran, seems to have thousands of options. I’ve never gone down the rabbit hole of finding what
-O3 turns on or off. I tend to use the following for debugging code:
FFLAGS = -g -pipe -O -fmax-errors=1 -Werror -Wall -fcheck=all
FFLAGS+= -ffpe-trap=invalid -fno-backtrace
for production code I use
FFLAGS = -O2 -pipe -march=native -mtune=native
FFLAGS+= -funroll-loops --param max-unroll-times=4
FFLAGS+= -ftree-vectorize -Wall -fno-backtrace
I also turn off a few warnings that are too noisy with false-positives:
FFLAGS+= -Wno-maybe-uninitialized -Wno-conversion -Wno-integer-division
Finally, I’ve tried the link-time optimization flag -flto, but this had a negative performance impact on my code.
@tiziano.mueller, I agree with you on
-ffast-math. I often refer to it as the
-fbroken-math option. With regards to
-march, I can never remember which one implies the other, so I just use both. These were set years ago in my global
@tiziano.mueller, What you wrote is true for code that you know nothing else about. But if you control the code you write, such as the authors of an fpm package, you know if your code works with say
-ffast-math or not, and so if it does not, you could disable it in
fpm.toml. The Intel Fortran compiler effectively has
-ffast-math enabled by default also, as you noted.
The issue of good defaults is that most people do not know what options to enable for gfortran to get good performance out of it. I feel if we set the defaults to just
-O2 -march=native -pipe, then that is what people will use and they will be missing out on good performance.
A better approach I feel is to enable the defaults as something like
-O3 -march=native -funroll-loops -ffast-math and then if your fpm package requires less aggressive optimizations, then you set it in
fpm.toml. That way you ensure that your package works, and users still get good speed by default for their code.
P.S. I should mention that the
-ffast-math is tricky — if a library uses it, it actually can mess up the main application even if it doesn’t use it, because the
-ffast-math enables some hardware things like no denormal numbers. So it might be that if the dependents have
-ffast-math off, we might need to turn it off for all dependencies. But fpm could do that.
@kargl in principle you can leave
-g on for production as well since it should not have any performance implications. If you care about binary size, you can strip, split and compress the debug symbols afterwards (which is something the build system could do automatically) to have the best of both worlds.
@certik exactly, and fpm doesn’t know anything about the code’s internals by default. There is a good reason why the GCC devs do not enable it by default, and encouraging speed over correctness doesn’t make any sense. The problem with
-ffast-math is that it is hard to verify whether your code is working correctly (and with which GCC version and system/arch, no less). We had thousands of tests on a vast number of architectures and hosts for years with just a number of spurious errors from users and devs until someone was able to introduce a test to reproduce it properly. So,
-ffast-math does not make a good default.
-O3 as well as
-funroll-loops are a different story. As the GCC manual still says:
may or may not make your code run faster.
@tiziano.mueller what would be a good design for
fpm? One way would be that each
fpm package could indicate if it works with
fpm.toml, otherwise it is assumed that it does not. Then when I am building my application that I know works with
-ffast-math (all my personal codes work with it) I can tell
fpm to use it (as well as for dependencies, assuming all “allow” it). But by default
fpm would not use it.
The issue with this approach is that people then forget to turn it on to speed up their codes (even if they worked with it).
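To make the proposal concrete, here is a small Python sketch of the decision logic: aggressive math flags are added only when the person building the application asks for them and every package in the dependency tree has opted in. Everything in it is hypothetical (the `Package` class, the `allows_fast_math` field, the `release_flags` function, and the baseline flag list); none of it is an existing fpm feature or manifest key.

```python
from dataclasses import dataclass, field

# Assumed conservative baseline, taken from the flags discussed in this thread.
BASELINE_FLAGS = ["-O2", "-march=native", "-pipe"]


@dataclass
class Package:
    name: str
    # Hypothetical opt-in a package author would set after testing.
    # It defaults to False, so silence means "not known to be safe".
    allows_fast_math: bool = False
    dependencies: list["Package"] = field(default_factory=list)


def all_packages(root):
    """Return the root package and all transitive dependencies, once each."""
    seen = {}
    stack = [root]
    while stack:
        pkg = stack.pop()
        if pkg.name not in seen:
            seen[pkg.name] = pkg
            stack.extend(pkg.dependencies)
    return list(seen.values())


def release_flags(root, request_fast_math=False):
    """Compute compiler flags for a release build of `root`.

    -ffast-math is added only if the builder requested it AND every
    package in the tree allows it. A single package that has not opted
    in disables it for the whole build.
    """
    flags = list(BASELINE_FLAGS)
    if request_fast_math and all(p.allows_fast_math for p in all_packages(root)):
        flags.append("-ffast-math")
    return flags


# Example tree: one dependency has not opted in.
tested_lib = Package("tested_lib", allows_fast_math=True)
untested_lib = Package("untested_lib")
app_ok = Package("app_ok", allows_fast_math=True, dependencies=[tested_lib])
app_mixed = Package(
    "app_mixed", allows_fast_math=True, dependencies=[tested_lib, untested_lib]
)

if __name__ == "__main__":
    print(release_flags(app_ok, request_fast_math=True))
    print(release_flags(app_mixed, request_fast_math=True))
    print(release_flags(app_ok))
```

Applying one decision to the whole tree, rather than per package, is a deliberate choice in this sketch: it follows the earlier remark that a library built with -ffast-math can affect the main application even if the application itself is built without it.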
Human intelligence; either someone knows about the up- and downsides of enabling
-ffast-math in numerical code and is capable of verifying that their code is working correctly under
-ffast-math and then they can flag it as such, or they don’t.
Yes, a human must decide whether a package can use
But what I am asking is if you think the design for
fpm that I suggested above would work, given everything we discussed.
Oh, I completely misread your previous message, I am sorry.
Yes, I think that this proposal should work. Given the high-level design of the FPM maybe call the option
enable_extra_optimizations if you think it would be overly cautious) to enable the respective optimization for the used compiler. And then document the option with the hint that a developer of the package should test whether their package works with said options.
No problem. Thanks for the reply. Yes, it’s tricky what to call it, or what it even means in general:
-ffast-math is enabling a bunch of unrelated options, for example arithmetic rearrangement, which is unsafe for certain codes such as the Kahan summation algorithm, but it is perfectly safe for other kinds of codes. One way to think about it could be via the Fortran standard, which allows slightly more optimizations than C, but not many more actually (for example, the Fortran standard does not allow an optimization that would break the Kahan summation). So by “safe” we could mean those optimizations permitted by the standard, and by “unsafe” all others.
The optimization, enabled by
-ffast-math that breaks the Kahan summation technique, is not allowed by the Fortran standard. See Fortran 2018, 7.1.8 Integrity of parentheses.