Learning coarrays, collective subroutines and other parallel features of Modern Fortran

I have updated https://github.com/vmagnin/exploring_coarrays with the xoroshiro128+ RNG (the previous version of the project is now in the “random_number” branch).

Overall, this has greatly improved performance and fixed the ifort problem with the OpenMP version (though much less so for ifx).

I have added a benchmark.sh script that automatically runs every version 10 times and computes the mean times.
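For readers who want to try something similar, here is a minimal sketch of the "run N times and average" idea. It is not the project's benchmark.sh: the function name `mean_time` and the executable name in the usage comment are hypothetical, and it measures wall-clock time with GNU `date +%s%N` rather than the CPU time reported below.

```shell
#!/bin/sh
# Hypothetical helper, not the project's benchmark.sh.
# Usage: mean_time RUNS COMMAND [ARGS...]
# Runs COMMAND RUNS times and prints the mean wall-clock time in seconds.
# Assumes GNU date (for the %N nanoseconds field) and awk.
mean_time() {
    runs=$1; shift
    total_ns=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        start=$(date +%s%N)      # nanoseconds since the epoch
        "$@" > /dev/null         # discard the program's own output
        end=$(date +%s%N)
        total_ns=$((total_ns + end - start))
        i=$((i + 1))
    done
    # Convert the accumulated nanoseconds to a mean in seconds.
    awk -v t="$total_ns" -v n="$runs" 'BEGIN { printf "%.2f\n", t / n / 1e9 }'
}

# Example usage (hypothetical executable name):
# mean_time 10 ./pi_serial
```

Averaging over several runs smooths out noise from other processes. If the program prints its own timing, parsing that value instead would probably be closer to what the tables below report.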

Results

Intel(R) Core™ i7-5500U CPU @ 2.40GHz, under Ubuntu 20.10
Optimization flag: -O3

CPU time in seconds with 2 images/threads (except, of course, for Serial):

Version         gfortran  ifort    ifx
Serial             10.77  18.77  14.66
OpenMP              5.75   9.32  60.30
Coarrays           13.21   9.79      -
Coarrays steady    21.80  27.83      -
Co_sum              5.58   9.98      -
Co_sum steady       9.18  12.71      -

With 4 images/threads (except, of course, for Serial):

Version         gfortran  ifort    ifx
Serial             10.77  18.77  14.66
OpenMP              4.36   8.42  43.21
Coarrays            9.47   9.12      -
Coarrays steady    19.41  24.78      -
Co_sum              4.16   9.29      -
Co_sum steady       8.18  10.94      -

Further optimization

With gfortran, the -flto compilation option (which enables the standard link-time optimizer) has a strong effect on this algorithm: for example, with the co_sum version the CPU time with 4 images falls from 4.16 s to 2.38 s, about 1.75 times faster!
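As a rough illustration of how the flag can be passed, the sketch below wraps a compile-and-link step in a small helper. The function name `build_with_lto` and all file names are hypothetical and do not come from the project. `caf` and `cafrun` are the OpenCoarrays wrappers, which is an assumption about how the coarray versions are built with gfortran.

```shell
#!/bin/sh
# Hypothetical helper: build one Fortran source with -O3 and -flto.
# $1: compiler driver (e.g. gfortran, or caf from OpenCoarrays)
# $2: source file
# $3: output executable
# Any remaining arguments are passed through as extra flags.
build_with_lto() {
    compiler=$1; src=$2; exe=$3
    shift 3
    # Compiling and linking in one command applies -flto to both steps.
    "$compiler" -O3 -flto "$@" "$src" -o "$exe"
}

# Example usage (hypothetical file names):
# build_with_lto gfortran pi_openmp.f90 pi_openmp -fopenmp
# build_with_lto caf pi_co_sum.f90 pi_co_sum
# cafrun -n 4 ./pi_co_sum
```

If compilation and linking are done in separate commands, GCC expects -flto (and the optimization level) at both steps.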
