Gfortran GPU offload with OpenMP

imorozov · September 24, 2020, 4:23am

Hi,

Do I understand correctly that OpenMP allows to offload computations to GPU and gfortran supports this?
I’m looking for a minimal example of offloading a simple !$OMP PARALLEL DO construct to GPU.

Or, what can be used to generate GPU FORTRAN programs?
I have some experience with OpenACC and pgfortran (no luck on gfortran).

vmagnin · September 24, 2020, 12:05pm

I didn’t know OpenMP could also be used on GPU.
I have found that presentation:

JerryD · September 25, 2020, 3:17am

Use compiler flag -fopenmp to turn on this feature. Here is an example from the gcc/gfortran testsuite:

subroutine bar (i, j, k, s, a)
integer :: i, j, k, s, a(100), l
!$omp parallel do schedule (guided, s * 2)
do 100, l = j, k
100 a(l) = i
!$omp parallel do schedule (guided, s * 2)
do 101, l = j, k, 3
101 a(l) = i + 1
end subroutine bar

This is not the best example. Show us the simple loop(s) you want to parrallelize and we might be able to help you further.

imorozov · September 26, 2020, 6:54am

A simple OpenMP loop is not a problem, but I’m failing to find a good reference/example of OpenMP and GPU interaction. I’ve seen the presentation linked by @vmagnin, but didn’t manage to build working simple example.

Looks like pgfortran is now morphed into nvfortran and the code below can be executed on GPU with OpenACC, but it requires adding !$acc routine info to all procedures.

! OpenMP (cpu)
! gfortran -o test -fopenmp -O0 test.f90
! OpenACC (gpu)
! nvfortran -o test -acc -O0 -ta=nvidia test.f90
module data
  use, intrinsic :: iso_fortran_env, only : rk => real64
  implicit none
  integer, parameter :: loop = 10**6
  contains
  real(rk) function task(var)
    !$acc routine seq
    real(rk), intent(in) :: var
    integer :: i
    do i = 1, loop, 1
      task = exp(real(i,rk))
    end do
    task = var
  end function task
end module
program test
  use, intrinsic :: iso_fortran_env, only : rk => real64
  use data
  implicit none
  integer, parameter :: length = 2**10
  real(rk), dimension(length) :: input
  real(rk), dimension(length) :: output
  integer :: i
  input = real([(i, i=1,length,1)], rk)
  !$omp parallel do
  !$acc parallel loop copyin(input) copyout(output)
  do i = 1, length
    output(i) = task(input(i))
  end do
  !$acc end parallel loop
  !$omp end parallel do
  do i = 1, length
    write(*,*) output(i)
  end do
end program test

fluidnumerics_joe · October 1, 2020, 12:57pm

Check out AMDs AOMP repository. The AOMP package provides clang and flang compilers that will handle GPU offloading. The repository also provides examples for both C and Fortran. In short, you’ll need to wrap your do loop with Target directives. Further, to distribute many GPU threads in parallel, you’ll want to use

!$OMP teams distribute parallel for

For example,

!$OMP target
!$OMP teams distribute parallel for num_threads(256)
DO i = 1, N
a(i) = b(i) + c(i)
ENDDO

!$OMP end target

If you prefer to use gfortran, you’ll need to build the compiler from source. For Nvidia platforms, make sure you have CUDA toolkit installed before building. For AMD, have rocm-dev installed. I’ve haven’t built gfortran for OpenMP offloading since v8, but it looks like GNU has some guidance on this :

fortran4r · November 26, 2022, 2:29pm

Any updates on this? I got a new version (12.2.1) of GFortran today.

fluidnumerics_joe · August 23, 2023, 8:42pm

Better late than never, I suppose… With GNU compilers, you will need to build them with offloading support : Offloading - GCC Wiki

I haven’t tried it yet, but you may be able to install GNU compilers with OpenMP (and OpenACC offloading) to Nvidia GPUs using the +nvptx variant, e.g.

spack install gcc@12.3.0 +nvptx

Topic		Replies	Views
Fortran Programmers : How do you want to offload to GPU accelerators in the next 5 years? Poll	21	5177	February 10, 2021
GPU offloading in Fortran Advocacy	37	1097	February 1, 2025
"Portable" GPU support, opinion wanted Help	7	286	August 15, 2024
Fortran projects running on GPUs in production Help	16	2550	June 7, 2023
Portability of Fortran's `do concurrent' on GPUs Compilers	1	309	August 18, 2024

Gfortran GPU offload with OpenMP

Related topics