I have updated https://github.com/vmagnin/exploring_coarrays with the xoroshiro128+ RNG (the previous version of the project is now in the “random_number” branch).
Overall, this has greatly improved performance and has fixed the ifort problem with the OpenMP version (though much less so for ifx).
I have added a `benchmark.sh` script that automatically runs every version 10 times and computes the mean times.
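To illustrate the "run 10 times and average" idea, here is a minimal Python sketch. It is not the project's `benchmark.sh`: the function name `mean_run_time` is hypothetical, and it measures elapsed wall-clock time from the outside, whereas the figures below are CPU times and may have been obtained differently.

```python
import statistics
import subprocess
import sys
import time


def mean_run_time(command, runs=10):
    """Run `command` `runs` times and return the mean elapsed time in seconds.

    Assumption: wall-clock time measured around the whole process,
    which is not necessarily how the original script times the programs.
    """
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True stops the benchmark if one run fails
        subprocess.run(command, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings)


if __name__ == "__main__":
    # Placeholder command so the sketch runs anywhere;
    # replace it with the path of a compiled executable.
    demo = [sys.executable, "-c", "pass"]
    print(f"mean over 3 runs: {mean_run_time(demo, runs=3):.3f} s")
```

Averaging over several runs smooths out the noise from other processes on the machine, which matters on a laptop CPU like the one used here.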
Results
Intel(R) Core™ i7-5500U CPU @ 2.40GHz, under Ubuntu 20.10
Optimization flag: -O3
CPU time in seconds with 2 images/threads (except of course Serial):
Version | gfortran | ifort | ifx |
---|---|---|---|
Serial | 10.77 | 18.77 | 14.66 |
OpenMP | 5.75 | 9.32 | 60.30 |
Coarrays | 13.21 | 9.79 | |
Coarrays steady | 21.80 | 27.83 | |
Co_sum | 5.58 | 9.98 | |
Co_sum steady | 9.18 | 12.71 | |
With 4 images/threads (except of course Serial):
Version | gfortran | ifort | ifx |
---|---|---|---|
Serial | 10.77 | 18.77 | 14.66 |
OpenMP | 4.36 | 8.42 | 43.21 |
Coarrays | 9.47 | 9.12 | |
Coarrays steady | 19.41 | 24.78 | |
Co_sum | 4.16 | 9.29 | |
Co_sum steady | 8.18 | 10.94 | |
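To make the tables easier to compare, the timings can be turned into speedups relative to the serial version. The sketch below does this for the gfortran column only; the helper names `speedup` and `efficiency` are mine, and it assumes the reported times can be compared directly as run times.

```python
# gfortran timings in seconds, copied from the two tables above
SERIAL_TIME = 10.77
PARALLEL_TIMES = {
    ("OpenMP", 2): 5.75,
    ("Coarrays", 2): 13.21,
    ("Co_sum", 2): 5.58,
    ("OpenMP", 4): 4.36,
    ("Coarrays", 4): 9.47,
    ("Co_sum", 4): 4.16,
}


def speedup(serial_time, parallel_time):
    """Ratio of serial time to parallel time (above 1 means faster)."""
    return serial_time / parallel_time


def efficiency(serial_time, parallel_time, workers):
    """Speedup divided by the number of images/threads."""
    return speedup(serial_time, parallel_time) / workers


if __name__ == "__main__":
    for (version, workers), t in PARALLEL_TIMES.items():
        s = speedup(SERIAL_TIME, t)
        e = efficiency(SERIAL_TIME, t, workers)
        print(f"{version:9s} x{workers}: speedup {s:.2f}, efficiency {e:.2f}")
```

A speedup below 1, as for the plain Coarrays version with gfortran, means that version is slower than the serial one.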
Further optimization
With gfortran, the `-flto` (standard link-time optimizer) compilation option has a strong effect on this algorithm: for example, with the `co_sum` version the CPU time with 4 images falls from 4.16 s to 2.38 s!