Intel vs gfortran performance

First is the output from gfortran

gfortran_output

Intel output

Both were run on the same system.

Processor | 11th Gen Intel(R) Core™ i5-11500 @ 2.70GHz 2.71 GHz
Installed RAM | 8.00 GB (7.83 GB usable)
System type | 64-bit operating system, x64-based processor

How come the compute time difference is so large?

You should use equivalent optimization options for both compilers. Generally:
gfortran foo.f90 -O3 -march=native
and
ifort foo.f90 -O3 -xHost


I got this warning

ifort: command line warning #10006: ignoring unknown option '/xHost'

but the code works.

Intel is still about 4 times faster.

In my x86-64 Ubuntu system ifort -O3 -xHost was about 25% faster than gfortran -O3 -march=native with one program that spent most of its time calculating a double precision intrinsic but about 25% slower with another.
Both programs started with x=1; one then went 10**9 times round a loop containing x = atan(x), the other went 10**9 times around x = log(1d0+x). Of course, we were not told what Shahid’s program was doing. Integer, single or quad precision arithmetic, or non-numerical work, may give quite different run-time differences.
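For anyone who wants to repeat this kind of test, here is a minimal sketch of the atan version. The program name, the variable names and the timing via system_clock are my own choices, not taken from the original test, so treat it as one possible way to write it.

```fortran
program atan_loop
   use, intrinsic :: iso_fortran_env, only: dp => real64, int64
   implicit none
   integer(int64), parameter :: n = 10_int64**9   ! iteration count from the post
   integer(int64) :: i, t0, t1, rate
   real(dp) :: x

   x = 1.0_dp
   call system_clock(t0, rate)
   do i = 1, n
      x = atan(x)        ! use x = log(1.0_dp + x) for the second test
   end do
   call system_clock(t1)

   ! Printing x keeps the compiler from discarding the loop as dead code.
   print '(a, es12.5)', 'final x   = ', x
   print '(a, f8.3, a)', 'wall time = ', real(t1 - t0, dp) / real(rate, dp), ' s'
end program atan_loop
```

Build it once with each compiler using the flags given above and compare the reported wall times.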

Sorry, on Windows it is /QxHost.

Since we don’t have access to the source code we can’t make any specific comments as to the performance difference.

As a general comment, the two compilers are developed by two very different teams with vastly different funding sources: ifort/ifx by Intel’s engineers, and gfortran mostly by open-source volunteer contributors. The really impressive part is how gfortran, a completely free and open-source compiler, is able to compete so well with a compiler that up until a couple of years ago was a paid, high-end product (Parallel Studio XE).


The iteration has roughly this form:

...
 do istep = 1, nsteps   ! 1000
     do i = 1, Nx           ! 512
        do j = 1, Ny        ! 512

            dfde1(i,j) = -3.5*e1(i,j) + 1.5*e1(i,j)**3 + &
                &2*eta1(i,j)*( e2(i,j)**2 + e3(i,j)**2 + e4(i,j)**2 )
           .
           . ! same type of calculation for 50 times 
           .

           jp = j + 1
           jm = j - 1

           ip = i + 1
           im = i - 1

           if ( im == 0 ) im = Nx
           if ( ip == ( Nx + 1) ) ip = 1
           if ( jm == 0 ) jm = Ny
           if ( jp == ( Ny + 1) ) jp = 1

           lap_e1(i,j) = ( e1(ip,j) + e1(im,j) + e1(i,jm) + e1(i,jp) - &
                4.0*e1(i,j) ) / ( dx*dy )
           .
           . ! for 50 terms
           .

           e1(i,j) = e1(i,j) - 0.5*0.6*( dfde1(i,j) - 0.56*lap_e1(i,j) )
           .
           . ! for 50 terms
           .
        end do
    end do
end do

Note that in the sample code you are accessing your 2D arrays in a non-contiguous manner, which will affect performance. Fortran is column-major, not row-major, so the loop order should be reversed: the innermost loop should run over the first index, i.
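A minimal sketch of the reversed loop order, reduced to a single field and only the Laplacian term. The array sizes follow the comments in the posted code; dx = dy = 1.0, the random initial data and the use of merge for the periodic wrap are assumptions made just to keep the example self-contained.

```fortran
program loop_order
   implicit none
   integer, parameter :: nx = 512, ny = 512
   real, parameter :: dx = 1.0, dy = 1.0      ! assumed grid spacing
   real, allocatable :: e1(:,:), lap_e1(:,:)
   integer :: i, j, ip, im, jp, jm

   allocate (e1(nx, ny), lap_e1(nx, ny))
   call random_number(e1)                     ! placeholder initial data

   do j = 1, ny                               ! second index in the outer loop
      jp = merge(1, j + 1, j == ny)           ! periodic wrap in j
      jm = merge(ny, j - 1, j == 1)
      do i = 1, nx                            ! first index innermost: stride-1 access
         ip = merge(1, i + 1, i == nx)        ! periodic wrap in i
         im = merge(nx, i - 1, i == 1)
         lap_e1(i, j) = (e1(ip, j) + e1(im, j) + e1(i, jm) + e1(i, jp) &
                         - 4.0*e1(i, j)) / (dx*dy)
      end do
   end do

   print *, 'sum of lap_e1 =', sum(lap_e1)
end program loop_order
```

With this ordering, consecutive inner iterations touch neighbouring memory locations in e1(:,j) and lap_e1(:,j), which is what the cache and the vectorizer want.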

Yeah. But the code is the same for both compilers.

Sorry, but without a minimal working example this is a useless discussion 🤷‍♂️


It does not follow that the progress of the calculation will be the same or that the results will be the same or even comparable. Until you establish that the program is running correctly and the results are similar, it is not useful to compare run times.

You should note that Intel optimizes by default (-O2), whereas gfortran does not (its default is -O0). Before you compare run times and ask why those times are different, you have to compile using optimization levels that are comparable.


Yes, ifort will do loop interchange but gfortran will not.
gfortran says it can, but in practice it does not, even with this option:

-floop-interchange

By the way, do concurrent allows it, since the iteration order is left to the compiler.
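A minimal sketch of what that could look like, using only a simplified version of the dfde1 term from the posted code (the eta1 part is dropped, and the sizes and random data are placeholders). Note that do concurrent requires the iterations to be independent, so the in-place e1 update that reads neighbouring points would have to go in a separate pass.

```fortran
program dc_sketch
   implicit none
   integer, parameter :: nx = 512, ny = 512
   real, allocatable :: e1(:,:), dfde1(:,:)
   integer :: i, j

   allocate (e1(nx, ny), dfde1(nx, ny))
   call random_number(e1)                     ! placeholder initial data

   ! Each (i,j) writes only its own element and reads only e1(i,j),
   ! so the iterations are independent and may run in any order.
   do concurrent (j = 1:ny, i = 1:nx)
      dfde1(i, j) = -3.5*e1(i, j) + 1.5*e1(i, j)**3
   end do

   print *, 'max dfde1 =', maxval(dfde1)
end program dc_sketch
```

Whether a given compiler actually picks the better traversal order for this is worth checking with timings rather than assuming.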
