I am trying to compile a code using gfortran instead of ifort. The gfortran build runs much slower than the ifort build, and it does not even use all the threads on my computer. I have 40 cores and 80 threads, and ifort uses all of them. I think it has to do with the ifort flag /Qm64 but can’t be sure (using the /Qm32 flag on ifort is slower and does not use all the threads with OpenMP; I am not sure why).
This is the way that Visual Studio seems to be compiling my code:
The first iteration of the ifort loop runs in 19.9 seconds with 100% CPU usage; the first iteration of the gfortran loop runs in 38.8 seconds (almost double) with 50% CPU usage.
If I compile my ifort code with the /Qm32 flag, it gets similar performance to gfortran.
If you want to run the code only on the machine where it is compiled, use -march=native for gfortran. On my system, the performance gain was impressive.
Overall, in my experience, ifort and gfortran have similar performance on Linux in many cases. ifort is usually 10-20% faster, but the difference is not very big. However, it seems that on Windows, with the same flags, gfortran can be 7 times slower than it is on Linux. Intel’s performance is consistent across Windows and Linux.
You may begin with the following flags.
For ifort,
-O3 -xHost
For gfortran
-O3 -march=native
Both can be used with perhaps one single flag,
For ifort,
CRquantum, what puzzles me more is the fact that gfortran does not make use of all the logical processors available on my computer (see video).
If I had to take a guess I would say that it’s probably because of different default variables between the two compilers.
There are some very experienced gfortran devs here that might be able to shed some light into this.
As for the difference in per thread performance, I would expect it to be noticeable. Intel’s MKL is an absolutely amazing library that can significantly speed up calculations.
That said, I think you can still link to MKL whilst using gfortran.
I am not a gfortran expert. Other people’s opinions may be more useful.
But it looks like you are using OpenMP. The speed difference seems to be caused simply by the fact that, as you said, gfortran uses only half as many threads as ifort does.
If you explicitly specify the number of threads in OpenMP, does that help gfortran make full use of all the threads? But again, I am not an expert in gfortran; other people may give you a much better answer and solve the puzzle.
By the way, the -m64 flag in gfortran may not be necessary. -O3 -march=native may be enough to begin with in many cases.
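Since the two builds in this thread go through different toolchains, it may help to have each executable report how it was actually compiled. The following is a minimal sketch, assuming a compiler that supports the Fortran 2008 inquiry functions compiler_version and compiler_options from iso_fortran_env (recent gfortran and ifort both should); the program name show_build is made up for illustration.

```fortran
! Print the compiler and the options this executable was built with.
! Example builds (file name is hypothetical):
!   gfortran -O3 -march=native show_build.f90 -o show_build
!   ifort    -O3 -xHost        show_build.f90 -o show_build
program show_build
   use, intrinsic :: iso_fortran_env, only: compiler_version, compiler_options
   implicit none

   ! Both functions return character strings fixed at compile time.
   print '(2a)', 'Compiler: ', compiler_version()
   print '(2a)', 'Options:  ', compiler_options()
end program show_build
```

Comparing the two outputs side by side is a quick way to confirm that the optimisation flags you intended are the ones the build system really passed.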
May I ask: by a performance boost from Intel MKL, do you mean that using the functions provided by Intel MKL gives the boost?
The only reason I did not dig too much into MKL is that if I use a lot of MKL-exclusive functions or subroutines, my code will perhaps become Intel Fortran exclusive. It would then not compile with both gfortran and ifort, and so may not be generic enough.
Or do you mean that the same LAPACK subroutines, such as dgesv or dgeev, run faster when linked against MKL than without it? Just curious. Again, I am not an expert in either ifort or gfortran. Thanks!
I think MKL does both. They definitely provide optimised versions of libraries like BLAS and LAPACK, but they also seem to have proprietary modules too; see the docs.
As for Fortran intrinsics like sum or matmul, I am not certain whether or not they use MKL in the background.
I have only used MKL passively, by providing it at configuration time to external libraries like PETSc and Zoltan.
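To illustrate the portability point about standard LAPACK routines, here is a minimal sketch that calls dgesv through the standard LAPACK interface only. It assumes some LAPACK implementation is available at link time; the program name solve_demo and the 2x2 system are made up for illustration. Because nothing MKL-specific appears in the source, the same file should build with either compiler, and only the link line changes.

```fortran
! Solve A x = b with the standard LAPACK routine dgesv.
! Example link lines (assumptions; flag spelling depends on platform and version):
!   gfortran solve_demo.f90 -llapack -lblas
!   ifort    solve_demo.f90 -qmkl        (Linux; /Qmkl on Windows)
program solve_demo
   implicit none
   integer, parameter :: dp = kind(1.0d0)
   integer, parameter :: n = 2
   real(dp) :: a(n, n), b(n)
   integer  :: ipiv(n), info
   external :: dgesv

   ! A = [2 1; 1 3] (symmetric, so storage order does not matter here)
   a = reshape([2.0_dp, 1.0_dp, 1.0_dp, 3.0_dp], [n, n])
   b = [3.0_dp, 5.0_dp]

   ! On exit, b is overwritten with the solution x.
   call dgesv(n, 1, a, n, ipiv, b, n, info)

   if (info /= 0) then
      print '(a,i0)', 'dgesv failed, info = ', info
   else
      print '(a,2f8.4)', 'x = ', b   ! expected: 0.8000 1.4000
   end if
end program solve_demo
```

Whether the MKL-linked build is faster for a given problem size is something to measure rather than assume; for a system this small the difference would not be visible.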
I am just doing this: call omp_set_num_threads(80). But it does not really matter whether I include this line or not. call omp_set_dynamic(.false.) did not change the behaviour of the gfortran code either.
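One way to narrow this down is to ask the OpenMP runtime what it sees, separately from the real code. The following is a minimal diagnostic sketch using only standard omp_lib routines; the program name check_threads is made up, and the value 80 simply mirrors the call quoted above.

```fortran
! Report what the OpenMP runtime sees and how many threads it really starts.
! Example builds:
!   gfortran -fopenmp check_threads.f90
!   ifort    -qopenmp check_threads.f90   (Linux; /Qopenmp on Windows)
program check_threads
   use omp_lib
   implicit none
   integer :: nthreads

   print '(a,i0)', 'Processors seen by the runtime: ', omp_get_num_procs()
   print '(a,i0)', 'Default maximum threads:        ', omp_get_max_threads()

   ! Same requests as in the post above.
   call omp_set_dynamic(.false.)
   call omp_set_num_threads(80)

   nthreads = 0
   !$omp parallel
   !$omp single
   nthreads = omp_get_num_threads()   ! team size of this parallel region
   !$omp end single
   !$omp end parallel

   print '(a,i0)', 'Threads in parallel region:     ', nthreads
end program check_threads
```

If the gfortran build reports fewer processors than the ifort build on the same machine, the limit probably sits in the runtime or operating system rather than in the source, and setting the thread count alone is unlikely to raise CPU usage. If both report 80 processors and 80 threads, the difference is more likely in how the work is distributed inside the actual loops.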