Asking for suggestions about books on modern Fortran

@milancurcic does your book also cover GPU parallelization, OpenACC, etc.? Thanks!

1 Like

No, my book only covers the parallel features of standard Fortran: coarrays, collectives, teams, and events.
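
In case it helps to see what those look like, here is a minimal sketch (not from the book) of a coarray and a collective; teams and events build on the same image model:

program coarray_sketch
  ! Each image (parallel process) gets its own copy of the coarray my_value;
  ! co_sum is a Fortran 2018 collective that reduces a value across all images.
  implicit none
  integer :: my_value[*]   ! scalar coarray: one instance per image
  integer :: total

  my_value = this_image()  ! every image stores its own index (1, 2, ...)
  sync all                 ! barrier before any cross-image access

  ! Coindexed access: image 1 reads the copy that lives on the last image.
  if (this_image() == 1) print *, 'value on last image:', my_value[num_images()]

  total = my_value
  call co_sum(total)       ! collective sum over all images
  if (this_image() == 1) print *, 'sum over', num_images(), 'images:', total
end program coarray_sketch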

2 Likes

Maybe a comparison of computing pi in parallel with do concurrent and with OpenMP is also relevant here. Previously there was no support for a reduction clause on do concurrent, but the latest Intel compiler fully supports it.

To compare, I used the two codes below and ran them on Windows 10 with the options:

ifort main.f90 /Qopenmp
ifx main.f90 /Qopenmp

The timings I got (in seconds) are:

           do concurrent   OpenMP
ifort      0.083           0.084
ifx        0.089           0.081

Here is the do concurrent code:

program dc_pi_test
  USE, INTRINSIC :: ISO_FORTRAN_ENV
  use omp_lib
  implicit none

  INTEGER, PARAMETER :: dp = real64
  INTEGER, PARAMETER :: sp = int32


  integer (sp):: i, n
  real (dp)   :: width, partial_pi, dc_pi, x
  real (dp)   :: time_start, time_end


  time_start = omp_get_wtime()


  n = 1000000000
  partial_pi = 0.0_dp
  width = 1.0_dp/n


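  ! Midpoint-rule estimate of pi = integral over [0,1] of 4/(1+x**2).
  ! LOCAL and SHARED are Fortran 2018 locality specifiers; REDUCE was added in Fortran 2023.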
  do concurrent (i = 1:n) local(x) shared(width) reduce (+: partial_pi)
     x = width*(real(i,dp) - 0.5_dp)
     partial_pi = partial_pi + f(x)
  end do


  dc_pi = width*partial_pi


  time_end = omp_get_wtime()

  print*, ' time = ', time_end - time_start
  print*, ' intervals   = ', n
  print*, dc_pi


contains

  pure real (dp) function f(x)
    implicit none
    real (dp), intent (in) :: x

    f = 4.0_dp/(1.0_dp + x*x)
  end function f


end program dc_pi_test

And here is the OpenMP code:

program openmp_pi_test
  USE, INTRINSIC :: ISO_FORTRAN_ENV
  use omp_lib
  implicit none

  INTEGER, PARAMETER :: dp = real64
  INTEGER, PARAMETER :: sp = int32


  integer (sp):: i, n, nthreads
  real (dp)   :: width, partial_pi, openmp_pi, x
  real (dp)   :: time_start, time_end


  nthreads =  omp_get_max_threads()
  print *, ' Maximum number of threads is ', nthreads


  time_start = omp_get_wtime()

  n = 1000000000
  partial_pi = 0.0_dp
  width = 1.0_dp/n


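  ! Same midpoint-rule loop, parallelized with an OpenMP worksharing construct:
  ! x is private to each thread and the partial sums are combined by the reduction clause.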
  !$omp parallel do private(x) shared(width) reduction(+:partial_pi)
  do i = 1, n
     x = width*(real(i,dp) - 0.5_dp)
     partial_pi = partial_pi + f(x)
  end do

  !$omp end parallel do

  openmp_pi = width*partial_pi


  time_end = omp_get_wtime()

  print*, ' time = ', time_end - time_start
  print*, ' intervals   = ', n
  print*, openmp_pi



contains

  pure real (dp) function f(x)
    implicit none
    real (dp), intent (in) :: x

    f = 4.0_dp/(1.0_dp + x*x)
  end function f



end program openmp_pi_test
2 Likes

@Shahid interesting example, thanks for sharing this! The message, if I interpret it correctly, is that there is no significant performance difference between OpenMP and do concurrent (at least for this particular example).

Yeah, at least for that example I did not find any performance difference. There is an interesting article about the same approach that likewise reports no performance penalty.

There are a lot of interesting discussions and announcements on the Intel forum.

To my understanding, do concurrent is part of the ISO Fortran standard, while OpenMP is a directive-based API. Intel is pushing hard to bring standard Fortran on par with alternatives like OpenMP.

1 Like

I have never used coarrays before, but they sound like arrays that can be shared across different processes in a parallel program. Your comments suggest that they are not yet widely supported by Fortran compilers, correct? In which situations do you expect them not to work?

I have never used coarrays myself so far, so I am probably not the right person to answer your question.

Does the book make a clear distinction about which features are from F2023? If possible, for the moment I would like to stick to the previous standard (F2018?), because F2023 is too new for the compilers available on my university cluster.

Sorry, I unintentionally quoted your comment.

1 Like

Coarrays have been around since the mid-to-late 1990s in Cray’s compilers. As far as I know, only Intel, gfortran (with the OpenCoarrays libraries), and NAG (shared memory) support them on current 64-bit Intel and AMD chips, and maybe on non-Apple ARM. On large-scale Cray systems that have the hardware to support PGAS across tens of thousands of cores, they will run as fast as (and in some cases faster than) MPI 2. However, MPI 3 with one-sided communication evened the playing field, so the only real advantage of coarrays is one of syntax and ease of use.

My attempts to use them on workstations have been highly disappointing. I think only Intel supports most of the 2018 standard extensions, and even then there are issues with stability and things just not working as they should. Note that Intel and gfortran with OpenCoarrays use MPI as the transport layer, so they are for the most part just an easier-to-use front end to MPI. I believe NAG uses a native shared-memory implementation that should work better on workstations without the specialized hardware found on Crays. As with any new feature in Fortran, how well they perform will depend on your application and how you implement them.
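
For anyone curious about the syntax-and-ease-of-use point, a coarray-style version of the pi example upthread might look roughly like the untested sketch below; it uses the image model and the co_sum collective rather than an explicit coarray variable:

program coarray_pi_test
  use, intrinsic :: iso_fortran_env, only: dp => real64, sp => int32
  implicit none

  integer(sp) :: i, n
  real(dp)    :: width, partial_pi, x

  n = 1000000000
  width = 1.0_dp/n
  partial_pi = 0.0_dp

  ! Each image sums every num_images()-th interval of the midpoint rule.
  do i = this_image(), n, num_images()
     x = width*(real(i, dp) - 0.5_dp)
     partial_pi = partial_pi + 4.0_dp/(1.0_dp + x*x)
  end do

  call co_sum(partial_pi)   ! combine the partial sums across all images

  if (this_image() == 1) then
     print *, ' intervals = ', n
     print *, width*partial_pi
  end if

end program coarray_pi_test

With gfortran this would typically be built through OpenCoarrays (the caf compiler wrapper and the cafrun launcher); with Intel it needs a coarray option, something like /Qcoarray:shared on Windows, if I remember it correctly.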

4 Likes