On my x86-64 Ubuntu system, ifort -O3 -xHost was about 25% faster than gfortran -O3 -march=native on one program that spent most of its time evaluating a double-precision intrinsic, but about 25% slower on another.
Both programs started with x=1; one then went 10**9 times round a loop containing x = atan(x), the other went 10**9 times round x = log(1d0+x). Of course, we were not told what Shahid’s program was doing. Integer, single- or quad-precision arithmetic, or non-numerical work may give quite different run-time ratios.
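A minimal sketch of such a micro-benchmark, reconstructed from the description above (it is not the original programs; the program name, the report helper and the output format are illustrative choices). Each loop iteration depends on the previous one, and the final x is printed so the compiler cannot discard the loop as dead code.

```fortran
! Hypothetical reconstruction of the two micro-benchmarks described above:
! 10**9 iterations of x = atan(x) and of x = log(1d0+x), each from x = 1.
program intrinsic_bench
  use, intrinsic :: iso_fortran_env, only: real64, int64
  implicit none
  integer(int64), parameter :: n = 10_int64**9   ! iteration count from the post
  integer(int64) :: i, t0, t1, rate
  real(real64) :: x

  ! Loop 1: repeated arctangent (each iteration needs the previous x)
  x = 1.0_real64
  call system_clock(t0, rate)
  do i = 1, n
    x = atan(x)
  end do
  call system_clock(t1)
  call report('atan', x, t0, t1, rate)

  ! Loop 2: repeated log(1 + x)
  x = 1.0_real64
  call system_clock(t0, rate)
  do i = 1, n
    x = log(1d0 + x)
  end do
  call system_clock(t1)
  call report('log ', x, t0, t1, rate)

contains

  ! Print the final x (keeps the loop "live") and the elapsed
  ! wall-clock time in seconds measured with system_clock.
  subroutine report(label, x, t0, t1, rate)
    character(*), intent(in) :: label
    real(real64), intent(in) :: x
    integer(int64), intent(in) :: t0, t1, rate
    print '(a, a, es14.6, a, f9.3, a)', label, ': x = ', x, ', time = ', &
         real(t1 - t0, real64) / real(rate, real64), ' s'
  end subroutine report

end program intrinsic_bench
```

Building the same file (here assumed to be called bench.f90) once with gfortran -O3 -march=native and once with ifort -O3 -xHost gives the kind of comparison described; absolute timings will of course depend on the machine and compiler versions.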
Since we don’t have access to the source code, we can’t comment specifically on the performance difference.
As a general comment, the two compilers are developed by two very different teams with vastly different funding sources: ifort/ifx by Intel’s engineers and gfortran mostly by open-source volunteer contributors. The really impressive part is how well gfortran, a completely free and open-source compiler, is able to compete with a compiler that until a couple of years ago was a paid, high-end product (Parallel Studio XE).
...
```fortran
do istep = 1, nsteps          ! 1000
  do i = 1, Nx                ! 512
    do j = 1, Ny              ! 512
      dfde1(i,j) = -3.5*e1(i,j) + 1.5*e1(i,j)**3 + &
                   2*eta1(i,j)*( e2(i,j)**2 + e3(i,j)**2 + e4(i,j)**2 )
      ! ... same type of calculation for 50 times
      jp = j + 1
      jm = j - 1
      ip = i + 1
      im = i - 1
      if ( im == 0 ) im = Nx
      if ( ip == ( Nx + 1 ) ) ip = 1
      if ( jm == 0 ) jm = Ny
      if ( jp == ( Ny + 1 ) ) jp = 1
      lap_e1(i,j) = ( e1(ip,j) + e1(im,j) + e1(i,jm) + e1(i,jp) - &
                      4.0*e1(i,j) ) / ( dx*dy )
      ! ... for 50 terms
      e1(i,j) = e1(i,j) - 0.5*0.6*( dfde1(i,j) - 0.56*lap_e1(i,j) )
      ! ... for 50 terms
    end do
  end do
end do
```
Note that in the sample code you are accessing your 2D arrays in a non-contiguous manner, which will hurt performance. Fortran is column-major, not row-major: the first index varies fastest in memory, so the loop order should be reversed, with j in the outer loop and i in the inner loop.
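A reduced sketch of the reordered loop, keeping only the e1 update from the posted code. The eta1 coupling term and the other ~50 fields are omitted, and the initial field, nsteps = 10 and dx = dy = 1 are arbitrary values chosen for illustration. The outer loop now runs over j and the inner loop over i, so consecutive iterations touch adjacent memory; as a small extra, the periodic j neighbours are computed once per column instead of once per point.

```fortran
! Hypothetical single-field version of the posted update, with the loop
! order swapped so the inner loop walks down contiguous columns.
program loop_order
  implicit none
  integer, parameter :: Nx = 512, Ny = 512, nsteps = 10   ! nsteps: illustrative
  real, parameter :: dx = 1.0, dy = 1.0                   ! illustrative spacing
  real :: e1(Nx,Ny), dfde1(Nx,Ny), lap_e1(Nx,Ny)
  integer :: istep, i, j, ip, im, jp, jm

  ! Deterministic, arbitrary initial field (same start for every build)
  do j = 1, Ny
    do i = 1, Nx
      e1(i,j) = 0.05*sin(0.1*real(i))*cos(0.07*real(j))
    end do
  end do

  do istep = 1, nsteps
    do j = 1, Ny                          ! outer loop: second (slow) index
      jp = merge(1, j + 1, j == Ny)       ! periodic neighbours in j
      jm = merge(Ny, j - 1, j == 1)
      do i = 1, Nx                        ! inner loop: first (fast) index
        ip = merge(1, i + 1, i == Nx)     ! periodic neighbours in i
        im = merge(Nx, i - 1, i == 1)
        dfde1(i,j) = -3.5*e1(i,j) + 1.5*e1(i,j)**3
        lap_e1(i,j) = ( e1(ip,j) + e1(im,j) + e1(i,jm) + e1(i,jp) &
                        - 4.0*e1(i,j) ) / ( dx*dy )
        ! As in the post, e1 is updated in place during the sweep
        e1(i,j) = e1(i,j) - 0.5*0.6*( dfde1(i,j) - 0.56*lap_e1(i,j) )
      end do
    end do
  end do

  print *, 'sum(e1) =', sum(e1)
end program loop_order
```

Timing this against a copy with i and j swapped in the loop headers is a simple way to see the effect of the access order on a given machine and compiler.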
It does not follow that the calculation will progress in the same way, or that the results will be the same or even comparable. Until you have established that the program runs correctly with both compilers and gives similar results, it is not useful to compare run times.
You should note that ifort optimizes by default (its default level is -O2), whereas gfortran does not (its default is -O0). Before you compare run times and ask why they differ, compile with comparable optimization levels, for example -O2 with both compilers.
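One way to check what each executable was actually built with is to have it report its own compiler and options. The sketch below uses the Fortran 2008 intrinsics compiler_version() and compiler_options() from iso_fortran_env; the program name is illustrative, and the exact text returned is compiler-specific (an unoptimized gfortran build, for instance, will simply show no -O flag).

```fortran
! Print the compiler and the command-line options used to build this
! executable, so two benchmark builds can be checked for comparable flags.
program show_build
  use, intrinsic :: iso_fortran_env, only: compiler_version, compiler_options
  implicit none
  print '(a)', 'Compiler: ' // compiler_version()
  print '(a)', 'Options:  ' // compiler_options()
end program show_build
```

The same two print statements can be dropped into the real program, so the timing output and the build options always appear together.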