Equivalent gfortran and ifort compilation

I am trying to compile a code using gfortran instead of ifort. The gfortran build runs much slower than the ifort build, and it does not even use all the threads on my computer. I have 40 cores and 80 threads, and ifort uses all of them. I think it has to do with the ifort flag /Qm64 but can’t be sure (using the /Qm32 flag on ifort is slower and does not use all the threads with OpenMP; not sure why).

This is how Visual Studio seems to be compiling my code:

Compiling with Intel® Fortran Compiler Classic 2021.4.0 [Intel(R) 64]…
ifort /nologo /O2 /Qopenmp /module:"x64\Release\" /object:"x64\Release\" /Fd"x64\Release\vc160.pdb" /libs:dll /threads /c /Qlocation,link,"C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\VC\Tools\MSVC\14.29.30133\bin\HostX64\x64" /Qm64 "D:\test\main1.f90"
Linking…
Link /OUT:"x64\Release\DebtDuration.exe" /INCREMENTAL:NO /NOLOGO /MANIFEST /MANIFESTFILE:"x64\Release\DebtDuration.exe.intermediate.manifest" /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /SUBSYSTEM:CONSOLE /STACK:999999999 /IMPLIB:"D:\test\x64\Release\DebtDuration.lib" -qm64 "x64\Release\splint.obj" "x64\Release\linspace.obj" "x64\Release\random_normal.obj" "x64\Release\spline.obj" "x64\Release\rouwenhorst.obj" "x64\Release\bspline_sub_module.obj" "x64\Release\main1.obj"
Embedding manifest…

Here’s how I was trying to compile with gfortran:

gfortran -c -O2 -m64 bspline_sub_module.f90
gfortran -w -ffree-form -ffree-line-length-0 -m64 -O2 -fopenmp main1.f90 random_normal.f90 linspace.f90 bspline_sub_module.o rouwenhorst.f90 spline.f90 splint.f90
gfortran -fopenmp main1.o random_normal.o linspace.o bspline_sub_module.o rouwenhorst.o spline.o splint.o

What am I doing wrong?

Here’s a video showing what I mean in terms of differences in thread/core usage (Dropbox link; the file has since been deleted).

The first iteration of the loop runs in 19.9 seconds with 100% CPU usage when built with ifort, and in 38.8 seconds (almost double) with 50% CPU usage when built with gfortran.
If I compile with ifort using the /Qm32 flag, I get performance similar to gfortran.

If you want to run the code only on the machine where it is compiled, use -march=native for gfortran. On my system, the performance gain was impressive.

Overall, in my experience, ifort and gfortran often have similar performance on Linux. ifort is usually 10-20% faster, but the difference is not very big. However, it seems that on Windows, with the same flags, gfortran can be as much as 7 times slower than it is on Linux, while Intel’s performance is consistent across Windows and Linux.
You may begin with the following flags.
For ifort,

-O3 -xHost

For gfortran

-O3 -march=native

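Each compiler also has a single aggressive-optimization flag that can perhaps be used instead (note that gfortran’s -Ofast relaxes strict floating-point conformance and does not imply -march=native).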
For ifort,

-fast

For gfortran

-Ofast

CRquantum, what puzzles me more is that gfortran does not make use of all the logical processors available on my computer (see video).

If I had to guess, I would say it’s probably because of different default settings between the two compilers.

There are some very experienced gfortran devs here that might be able to shed some light into this.

As for the difference in per-thread performance, I would expect it to be noticeable. Intel’s MKL is an absolutely amazing library that can significantly speed up calculations.

That said, I think you can still link to MKL whilst using gfortran.

I am not a gfortran expert, so other people’s opinions may be more useful.
But it looks like you are using OpenMP. The speed difference seems to be caused simply by the fact that, as you said, gfortran uses only half as many threads as ifort does.
If you explicitly specify the number of threads in OpenMP, does that help gfortran make full use of all of them? Again, I am not an expert in gfortran; other people may give you a much better answer and solve the puzzle.

By the way, the -m64 flag in gfortran may not be necessary.
In many cases, -O3 -march=native may be enough to begin with.

Welcome to the Discourse, by the way! :)

Thank you @gnikit for mentioning Intel MKL.

May I ask: by a performance boost from Intel MKL, do you mean that using the functions provided by MKL gives the boost?
The only reason I have not dug too deeply into MKL is that, if I use a lot of MKL-exclusive functions and subroutines, my code will perhaps become Intel Fortran exclusive. It would then not compile with both gfortran and ifort, and so may not be generic enough.

Or do you mean that, for the same LAPACK subroutines such as dgesv or dgeev, linking against MKL gives better performance than not linking against it? Just curious. Again, I am not an expert in either ifort or gfortran. Thanks!

I think MKL does both. They definitely provide optimised versions of libraries like BLAS and LAPACK, but they also seem to have proprietary modules too; see the docs.

As for Fortran intrinsics like sum or matmul, I am not certain whether or not they use MKL in the background.

I have only used MKL passively, by providing it at configuration time to external libraries like PETSc and Zoltan.
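To illustrate the portability point, here is a minimal sketch of a call to the standard LAPACK routine dgesv. The 3x3 system is made up for illustration. Because dgesv is part of the standard LAPACK interface, the same source should build with either compiler and link against reference LAPACK, MKL, or another implementation; the exact link line depends on the installation and is not shown here.

```fortran
program dgesv_sketch
   ! Solves a small, hypothetical 3x3 system A x = b with LAPACK's dgesv.
   ! Only the standard LAPACK interface is used, nothing MKL-specific.
   use, intrinsic :: iso_fortran_env, only: real64
   implicit none
   integer, parameter :: n = 3
   real(real64) :: a(n, n), b(n)
   integer :: ipiv(n), info
   external :: dgesv

   ! Illustrative symmetric, non-singular matrix (stored column-major).
   a = reshape([ 4.0_real64, 1.0_real64, 0.0_real64, &
                 1.0_real64, 3.0_real64, 1.0_real64, &
                 0.0_real64, 1.0_real64, 2.0_real64 ], [n, n])
   b = [ 1.0_real64, 2.0_real64, 3.0_real64 ]

   ! On exit, a holds the LU factors and b holds the solution x.
   call dgesv(n, 1, a, n, ipiv, b, n, info)

   if (info /= 0) then
      print '(a,i0)', 'dgesv failed, info = ', info
      stop 1
   end if
   print '(a,3f10.5)', 'x = ', b
end program dgesv_sketch
```

Any speed difference between LAPACK implementations would come from the library that is linked, not from changes to this source.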

We’d need to learn more about how OpenMP/multi-threading is being used, e.g. whether dynamic scheduling is being used.

A few other comments:

  • I don’t see much purpose in using the -m64 flag unless you’re cross-compiling or have some other odd reason.
  • As others have said, try using -march=native with gfortran.
  • How are you setting the number of OpenMP threads?
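As a rough sketch of what those questions refer to, the loop below leaves both the schedule and the thread count to the standard OpenMP environment variables OMP_SCHEDULE and OMP_NUM_THREADS, so static and dynamic scheduling can be compared without recompiling. The loop body and array are hypothetical and are not taken from the code in question.

```fortran
program schedule_sketch
   ! Hypothetical workload: fills an array and sums it in parallel.
   ! schedule(runtime) defers the scheduling choice to OMP_SCHEDULE,
   ! e.g. OMP_SCHEDULE="dynamic,10" or OMP_SCHEDULE="static".
   use, intrinsic :: iso_fortran_env, only: real64
   use omp_lib
   implicit none
   integer, parameter :: n = 100000
   integer :: i
   real(real64), allocatable :: work(:)
   real(real64) :: total

   allocate(work(n))
   total = 0.0_real64

   ! With no call to omp_set_num_threads, this reflects OMP_NUM_THREADS
   ! or the runtime default.
   print '(a,i0)', 'max threads available = ', omp_get_max_threads()

   !$omp parallel do schedule(runtime) reduction(+:total)
   do i = 1, n
      work(i) = sqrt(real(i, real64))
      total = total + work(i)
   end do
   !$omp end parallel do

   print '(a,es16.8)', 'total = ', total
end program schedule_sketch
```

Build it with -fopenmp for gfortran or /Qopenmp for ifort, as in the commands earlier in the thread.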

I think this is the most important bit. Also, @NC1, are you using dynamic teams? If so, try disabling them:

call omp_set_dynamic(.false.)

I am just doing this: call omp_set_num_threads(80). But it does not really matter whether I include this line or not. Adding call omp_set_dynamic(.false.) did not change the behaviour of the gfortran build either.
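One way to narrow this down might be a small diagnostic along the following lines, built once with each compiler. It is only a sketch: the program name is made up, and the value 80 is taken from the call quoted above. It reports how many processors the OpenMP runtime sees and how many threads a parallel region actually receives.

```fortran
program omp_thread_report
   ! Diagnostic sketch: compares what the OpenMP runtime reports with
   ! the team size actually granted to a parallel region.
   use omp_lib
   implicit none
   integer :: nthreads

   print '(a,i0)', 'omp_get_num_procs()   = ', omp_get_num_procs()
   print '(a,i0)', 'omp_get_max_threads() = ', omp_get_max_threads()

   call omp_set_dynamic(.false.)
   call omp_set_num_threads(80)   ! value used in the post above

   nthreads = 0
   !$omp parallel
   !$omp single
   ! One thread records the team size; nthreads is shared by default.
   nthreads = omp_get_num_threads()
   !$omp end single
   !$omp end parallel

   print '(a,i0)', 'threads in parallel region = ', nthreads
end program omp_thread_report
```

If the two builds print different values for omp_get_num_procs(), that would suggest the limit sits in the OpenMP runtime or the operating system rather than in the loop itself. A region that reports 80 threads does not by itself show that they are spread across all logical processors.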