Using coarrays with two different compilers: speed measured by the code and by the operating system

Let’s compare two different vendors

This is the code: compute pi (It is not my code)

Machine

  • Intel Core i5 10th gen, 16GB RAM.
  • GNU Fortran (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0.
  • ifort version 2021.1

Let’s compile

$ caf -ffree-form -fcoarray=lib -O3 coarrays.f -o coarray_gfort
$ ifort -free -coarray -O3 coarrays.f -o coarray_ifort

Run the program

GFortran results

Let’s run it several times to find a pattern (this is just one of many results)

$ time cafrun -np 8 ./coarray_gfort


number of Fortran coarray images:  8
 approximating pi in      2000000  steps.
 pi:   3.1415926535897931        iterated pi:    3.1395227456008259     
pi error 0.207E-02
Elapsed wall clock time  0.475E-02 seconds, using  8 images.

real	0m0,047s
user	0m0,106s
sys	0m0,096s

Intel Fortran results

$ time ./coarray_ifort



number of Fortran coarray images:  8
 approximating pi in      2000000  steps.
 pi:   3.14159265358979        iterated pi:    3.13952274560078     
pi error 0.207E-02
Elapsed wall clock time  0.224E-02 seconds, using  8 images.

real	0m0,081s
user	0m0,166s
sys	0m0,192s

My questions

Although the elapsed time calculated by the code is better with Intel Fortran, the operating system says the GFortran program ran faster. Does this look right to you? Am I doing something wrong? I am not able to interpret the results.


I would recommend increasing the workload a bit; with timings in the hundredths-of-a-second regime it can be hard to get meaningful values. You also want to avoid benchmarking the overhead of launching MPI (caf will most likely use Open MPI and ifort will probably use Intel MPI here), which might be what you are observing in the timings from the OS.
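To put numbers on that, here is a small sketch that subtracts the program's own elapsed time from the `real` time reported by `time` for the two runs above. It assumes the program's timer covers only the pi loop and not the start-up of the coarray runtime; `report_overhead` is just a helper name made up for this example.

```shell
# Compare the OS wall time with the time the program reports for itself.
# Assumption: the in-code timer excludes launching the coarray/MPI runtime.
report_overhead() (
    # $1 = label, $2 = "real" seconds from time(1), $3 = seconds printed by the program
    LC_ALL=C awk -v label="$1" -v real="$2" -v code="$3" 'BEGIN {
        printf "%s: %.5f s outside the timed region, %.1f%% of wall time inside it\n",
               label, real - code, 100 * code / real
    }'
)

# Figures copied from the GFortran and Intel Fortran runs above.
report_overhead gfortran 0.047 0.00475
report_overhead ifort    0.081 0.00224
```

In both cases most of the wall time falls outside the timed region (roughly 0.042 s for GFortran and 0.079 s for Intel Fortran), so the two measurements are not really comparing the same thing.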

Some blind tests to establish a baseline can help here: launching an MPI-enabled binary with multiple processes has a certain overhead compared to a sequential binary, and that overhead can easily be measured with a simple empty Fortran program.
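A rough sketch of such a baseline measurement: a shell helper that runs a command several times and prints the mean wall-clock time in seconds. It assumes GNU `date` (for `%N` nanoseconds); `mean_wall_time`, `empty.f90`, and `empty_gfort` are names made up for this example, and each sample includes about a millisecond of overhead from calling `date` itself.

```shell
# mean_wall_time RUNS CMD [ARGS...]
# Runs CMD RUNS times, discards its output, prints the mean wall time in seconds.
# Assumption: GNU date, which supports %N (nanoseconds).
mean_wall_time() (
    runs=$1
    shift
    total_ns=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        start=$(date +%s%N)
        "$@" > /dev/null 2>&1
        end=$(date +%s%N)
        total_ns=$((total_ns + end - start))
        i=$((i + 1))
    done
    LC_ALL=C awk -v t="$total_ns" -v n="$runs" 'BEGIN { printf "%.6f\n", t / n / 1e9 }'
)

# Hypothetical usage, with empty.f90 containing only
#   program empty
#   end program empty
# and built with the same flags as the real program:
#   caf -fcoarray=lib -O3 empty.f90 -o empty_gfort
#   mean_wall_time 20 cafrun -np 8 ./empty_gfort     # launch overhead only
#   mean_wall_time 20 cafrun -np 8 ./coarray_gfort   # launch overhead + work
```

The difference between the two means is a fairer estimate of the time spent on the actual computation than a single `time` run.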


Another data point, in case it is of interest. Cray Fortran, Intel Broadwell target, program launched with Slurm (srun):

ftn test_pi.f90
srun -n8 -CBW28 ./a.out
number of Fortran coarray images: 8
approximating pi in 2000000 steps.
pi: 3.1415926535897931 iterated pi: 3.1395227456007966
pi error 0.207E-02
Elapsed wall clock time 0.172E-02 seconds, using 8 images.
