Why is the Intel-compiled executable that much faster than the GNU one?

I am not sure of the build history for your libraries, but for my multi-threaded solvers using Gfortran, I include “-O3 -march=native -fopenmp” which can reliably improve performance for my computation.
I also use “-ffast-math -fstack-arrays”, although where multi-threading is not effective, these may have only a marginal effect.

I have also disabled “hyper-threading” for cases of poor OMP efficiency. You could also experiment with fewer threads, as it appears that your computation is limited by memory bandwidth or cache size, both of which depend on the hardware rather than on compiler options.
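To make the "experiment with fewer threads" suggestion concrete, here is a minimal sketch of a thread-count sweep over a memory-bound loop. It is not taken from your code: the program name, the array size n and the loop body are all assumptions for illustration. If the time stops improving well before the maximum thread count, that points to memory bandwidth rather than the compiler.

```fortran
! Sketch only: build with, e.g., gfortran -O3 -march=native -fopenmp thread_sweep.f90
program thread_sweep
   use omp_lib
   implicit none
   integer, parameter :: dp = kind(1.0d0)
   ! Hypothetical size, chosen so the three arrays are much larger than cache
   integer, parameter :: n = 10000000
   real(dp), allocatable :: a(:), b(:), c(:)
   real(dp) :: t0, t1
   integer :: i, nt, max_threads

   allocate (a(n), b(n), c(n))
   a = 0.0_dp   ! touch the memory first so page faults are not timed
   b = 1.0_dp
   c = 2.0_dp
   max_threads = omp_get_max_threads()

   ! Try 1, 2, 4, ... threads up to the default maximum
   nt = 1
   do while (nt <= max_threads)
      call omp_set_num_threads(nt)
      t0 = omp_get_wtime()
      !$OMP PARALLEL DO DEFAULT(NONE) SHARED(a, b, c) PRIVATE(i)
      do i = 1, n
         a(i) = b(i) + 3.0_dp*c(i)   ! little arithmetic per byte moved
      end do
      !$OMP END PARALLEL DO
      t1 = omp_get_wtime()
      ! Print a(n) so the compiler cannot discard the loop
      print '(a,i4,a,es10.3,a,f6.1)', 'threads =', nt, '  time =', t1 - t0, ' s  a(n) =', a(n)
      nt = nt*2
   end do
end program thread_sweep
```

The same experiment can be run on an existing executable without recompiling by setting the OMP_NUM_THREADS environment variable before each run.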

Did you run the different tests on the same hardware, or was each compiler tested on different hardware? Memory bandwidth and cache size are significant where poor threading efficiency occurs.

Multi-threading does have significant startup overheads (~10,000 processor cycles), so lots of small !$OMP regions can be ineffective. Intel was better than Gfortran for this when I tested years ago, but that may well have changed since. (Possibly exclude loops with a small computation load from !$OMP?)
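One way to exclude small loops without duplicating code is the OpenMP IF clause, which runs the region serially when its condition is false. The sketch below is only an illustration: the program name, the loop body and the threshold min_parallel_n are assumptions, and the threshold would need to be tuned by timing on your own hardware.

```fortran
! Sketch only: build with, e.g., gfortran -O3 -march=native -fopenmp omp_if_demo.f90
program omp_if_demo
   use omp_lib
   implicit none
   integer, parameter :: dp = kind(1.0d0)
   ! Hypothetical threshold: below this trip count the loop stays serial
   integer, parameter :: min_parallel_n = 50000
   integer, parameter :: sizes(3) = [100, 10000, 5000000]
   real(dp), allocatable :: a(:), b(:)
   real(dp) :: t0, t1
   integer :: i, k, n

   do k = 1, size(sizes)
      n = sizes(k)
      allocate (a(n), b(n))
      a = 0.0_dp
      b = 1.0_dp
      t0 = omp_get_wtime()
      ! IF(...) false => executed by one thread, avoiding the region startup cost
      !$OMP PARALLEL DO IF(n >= min_parallel_n) DEFAULT(NONE) SHARED(a, b, n) PRIVATE(i)
      do i = 1, n
         a(i) = 2.0_dp*b(i) + 1.0_dp
      end do
      !$OMP END PARALLEL DO
      t1 = omp_get_wtime()
      print '(a,i9,a,es10.3,a)', 'n =', n, '  time =', t1 - t0, ' s'
      deallocate (a, b)
   end do
end program omp_if_demo
```

Comparing the timings with and without the IF clause for the small sizes gives a rough measure of the region startup overhead on a given compiler.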

Using multiple hardware options is also challenging where multi-threading is as ineffective as you are reporting.
