@milancurcic does your book also cover GPU parallelization, OpenACC, etc.? Thanks!
No, my book only covers parallel concepts from standard Fortran, so coarrays, collectives, teams, and events.
The parallel computation of pi with do concurrent and OpenMP may also be relevant here. Previously, do concurrent had no equivalent of OpenMP's reduction clause, but the latest Intel compiler now fully supports the reduce specifier.
To compare, I compiled the codes below on Windows 10 with the following options and ran them:

    ifort main.f90 /Qopenmp
    ifx main.f90 /Qopenmp
The timings I got (in seconds, as returned by omp_get_wtime) are:

| | dc | openmp |
|---|---|---|
| ifort | 0.083 | 0.084 |
| ifx | 0.089 | 0.081 |
Here are the two programs:
- the do concurrent code
program dc_pi_test
  use, intrinsic :: iso_fortran_env
  use omp_lib   ! used here only for the omp_get_wtime() timer
  implicit none
  integer, parameter :: dp = real64
  integer, parameter :: sp = int32   ! an integer kind, despite the name
  integer (sp) :: i, n
  real (dp) :: width, partial_pi, dc_pi, x
  real (dp) :: time_start, time_end

  time_start = omp_get_wtime()
  n = 1000000000
  partial_pi = 0.0_dp
  width = 1.0_dp/n
  ! Midpoint rule for the integral of 4/(1 + x**2) over [0, 1], which equals pi.
  ! The reduce locality specifier is a Fortran 2023 feature.
  do concurrent (i = 1:n) local(x) shared(width) reduce(+: partial_pi)
    x = width*(real(i, dp) - 0.5_dp)
    partial_pi = partial_pi + f(x)
  end do
  dc_pi = width*partial_pi
  time_end = omp_get_wtime()

  print *, ' time = ', time_end - time_start
  print *, ' intervals = ', n
  print *, dc_pi

contains

  ! Integrand; it must be pure to be called inside do concurrent.
  pure real (dp) function f(x)
    implicit none
    real (dp), intent (in) :: x
    f = 4.0_dp/(1.0_dp + x*x)
  end function f

end program dc_pi_test
- the OpenMP code
program openmp_pi_test
  use, intrinsic :: iso_fortran_env
  use omp_lib
  implicit none
  integer, parameter :: dp = real64
  integer, parameter :: sp = int32   ! an integer kind, despite the name
  integer (sp) :: i, n
  integer :: nthreads                ! omp_get_max_threads() returns an integer
  real (dp) :: width, partial_pi, openmp_pi, x
  real (dp) :: time_start, time_end

  nthreads = omp_get_max_threads()
  print *, ' Maximum number of threads is ', nthreads

  time_start = omp_get_wtime()
  n = 1000000000
  partial_pi = 0.0_dp
  width = 1.0_dp/n
  ! Same midpoint rule as the do concurrent version; each thread keeps a
  ! private partial sum that OpenMP combines at the end of the loop.
  !$omp parallel do private(x) &
  !$omp shared(width) reduction(+:partial_pi)
  do i = 1, n
    x = width*(real(i, dp) - 0.5_dp)
    partial_pi = partial_pi + f(x)
  end do
  !$omp end parallel do
  openmp_pi = width*partial_pi
  time_end = omp_get_wtime()

  print *, ' time = ', time_end - time_start
  print *, ' intervals = ', n
  print *, openmp_pi

contains

  pure real (dp) function f(x)
    implicit none
    real (dp), intent (in) :: x
    f = 4.0_dp/(1.0_dp + x*x)
  end function f

end program openmp_pi_test
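Both programs above take their timings from omp_get_wtime, so even the do concurrent version has to be built with OpenMP enabled. If you would rather time it with standard Fortran only, a minimal sketch based on the system_clock intrinsic could look like the following. The module name `wall_clock` and the function name `wall_time` are my own, purely for illustration, and the resolution of the clock depends on the compiler.

```fortran
module wall_clock
  use, intrinsic :: iso_fortran_env, only: dp => real64, int64
  implicit none
  private
  public :: wall_time   ! hypothetical helper name, not from the posts above

contains

  ! Wall-clock time in seconds, measured from a processor-dependent origin.
  ! Only differences between two calls are meaningful.
  function wall_time() result(t)
    real (dp) :: t
    integer (int64) :: ticks, ticks_per_second

    call system_clock(count=ticks, count_rate=ticks_per_second)
    if (ticks_per_second > 0) then
      t = real(ticks, dp)/real(ticks_per_second, dp)
    else
      t = -1.0_dp   ! system_clock reports that no clock is available
    end if
  end function wall_time

end module wall_clock
```

With this in place, `time_start = wall_time()` and `time_end = wall_time()` could stand in for the two omp_get_wtime calls, and `use omp_lib` could be dropped from the do concurrent program.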
@Shahid interesting example, thanks for sharing this! The message, if I interpret it correctly, is that there is no significant performance difference between OpenMP and do concurrent (at least for this particular example).
Yes. At least for that example, I did not find any performance difference. There is also an interesting article that takes the same approach and reports no performance loss.
There are many interesting discussions and announcements on the Intel forum.
As I understand it, do concurrent is part of the ISO Fortran standard, whereas OpenMP is a directive-based API. Intel is pushing hard to bring standard Fortran on par with alternatives like OpenMP.
I have never used coarrays before, but a coarray sounds like an array that can be shared across different processes in a parallel program. Your comments suggest that coarrays are not yet widely supported by Fortran compilers, correct? In which situations do you expect them not to work?
I have never used coarrays myself so far, so I am probably not the right person to answer your question.
Does the book make a clear distinction about which features are from F2023? If possible, for the moment I would like to stick to the previous Fortran standard (F2018?), because F2023 is too new for the compilers available on my university cluster.
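On the pi example specifically: the reduce specifier is the F2023 part of that do concurrent loop. For compilers that do not have it yet, one possible workaround is to let each do concurrent iteration sum its own chunk of the interval into a separate array element, and then combine the chunks with the sum intrinsic. The sketch below is only an illustration of that idea; the names `pi_no_reduce`, `chunked_pi`, and `nchunks` are mine, and I have not measured how it performs against the reduce version.

```fortran
module pi_no_reduce
  use, intrinsic :: iso_fortran_env, only: dp => real64, int64
  implicit none
  private
  public :: chunked_pi   ! hypothetical name, for illustration only

contains

  ! Midpoint-rule estimate of pi using n intervals split into nchunks
  ! independent chunks, without the F2023 reduce specifier.
  function chunked_pi(n, nchunks) result(pi_est)
    integer, intent (in) :: n, nchunks
    real (dp) :: pi_est
    real (dp) :: width
    real (dp) :: partial(nchunks)
    integer :: c

    width = 1.0_dp/real(n, dp)

    do concurrent (c = 1:nchunks)
      block
        ! Variables declared in the block are local to each iteration.
        integer :: i, lo, hi
        real (dp) :: x, s

        ! 64-bit arithmetic avoids overflow in c*n for large n.
        lo = int((int(c - 1, int64)*n)/nchunks) + 1
        hi = int((int(c, int64)*n)/nchunks)

        s = 0.0_dp
        do i = lo, hi
          x = width*(real(i, dp) - 0.5_dp)
          s = s + 4.0_dp/(1.0_dp + x*x)
        end do
        partial(c) = s   ! each iteration writes only its own element
      end block
    end do

    pi_est = width*sum(partial)
  end function chunked_pi

end module pi_no_reduce
```

A call such as `chunked_pi(1000000000, 64)` would correspond to the problem size used earlier in the thread. Whether the chunks actually run in parallel is up to the compiler and its options.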
Sorry, I unintentionally quoted your comment.
Coarrays have been around since the mid to late 1990s in Cray's compilers. As far as I know, only Intel, gfortran (with the OpenCoarrays library), and NAG (shared memory) support them on current 64-bit Intel and AMD chips, and maybe on non-Apple ARM.

On large-scale Cray systems that have the hardware to support PGAS across tens of thousands of cores, they will run as fast as (and in some cases faster than) MPI 2. However, MPI 3 with one-sided communication evened the playing field, so the only real advantage of coarrays is one of syntax and ease of use.

My attempts to use them on workstations have been highly disappointing. I think only Intel supports most of the 2018 standard extensions, and even then there are issues with stability and things just not working as they should. Note that Intel and gfortran with OpenCoarrays use MPI as the transport layer, so they are for the most part just an easier-to-use front end to MPI. I believe NAG uses a native shared-memory implementation that should work better on workstations without the specialized hardware found on Crays. As with any new feature in Fortran, how well they perform will depend on your application and how you implement them.
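To make the syntax point concrete, here is a rough sketch of the same pi calculation written with a coarray. Each image (roughly, each process) has its own copy of `partial`, and the square brackets read another image's copy. The module and function names are made up for this post, and given the stability issues mentioned above I would treat it as an illustration of the syntax rather than something tuned. As far as I know, coarray support has to be switched on explicitly: -fcoarray=single or -fcoarray=lib (with OpenCoarrays) for gfortran, and -coarray (/Qcoarray on Windows) for the Intel compilers.

```fortran
module coarray_pi
  use, intrinsic :: iso_fortran_env, only: dp => real64
  implicit none
  private
  public :: coarray_midpoint_pi   ! hypothetical name, for illustration only

contains

  ! Midpoint-rule estimate of pi; must be called by all images together.
  function coarray_midpoint_pi(n) result(pi_est)
    integer, intent (in) :: n
    real (dp) :: pi_est
    real (dp), save :: partial[*]   ! coarray: one copy per image
    real (dp) :: width, x, total
    integer :: i, k

    width = 1.0_dp/real(n, dp)

    ! Cyclic distribution: image k handles i = k, k + num_images(), ...
    partial = 0.0_dp
    do i = this_image(), n, num_images()
      x = width*(real(i, dp) - 0.5_dp)
      partial = partial + 4.0_dp/(1.0_dp + x*x)
    end do

    sync all   ! every image has finished writing its partial sum

    ! Each image reads the partial sums of all images, itself included.
    total = 0.0_dp
    do k = 1, num_images()
      total = total + partial[k]
    end do

    sync all   ! nobody overwrites partial while others still read it

    pi_est = width*total
  end function coarray_midpoint_pi

end module coarray_pi
```

The explicit loop over `partial[k]` is there to show what a remote read looks like. With a compiler that supports the F2018 collectives, `call co_sum(total)` applied to a local copy of the partial sum should do the same job in one line.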