I am trying to run a basic example of parallel computing on the GPU with OpenMP. I am on Windows and compile with ifx/ifort.
The code is the following:
program matrix_multiply
   use omp_lib
   implicit none
   integer :: i, j, k, myid, m, n
   real, allocatable, dimension(:,:) :: a, b, c, c_serial
   !
   ! Different Intel GPUs have varying amounts of memory. If the program
   ! fails at runtime, try decreasing the value of "n".
   !
   n = 2600
   ! Called outside a parallel region, so the thread id is always 0
   ! and OMP_GET_NUM_THREADS() below returns 1.
   myid = OMP_GET_THREAD_NUM()
   if (myid .eq. 0) then
      print *, 'matrix size ', n
      print *, 'Number of CPU procs is ', OMP_GET_NUM_THREADS()
      print *, 'Number of OpenMP Device Available:', omp_get_num_devices()
      !$omp target
      if (OMP_IS_INITIAL_DEVICE()) then
         print *, ' Running on CPU'
      else
         print *, ' Running on GPU'
      endif
      !$omp end target
   endif
   allocate( a(n,n), b(n,n), c(n,n), c_serial(n,n))
   ! Initialize matrices
   do j=1,n
      do i=1,n
         a(i,j) = i + j - 1
         b(i,j) = i - j + 1
      enddo
   enddo
   c = 0.0
   c_serial = 0.0
   !$omp target teams map(to: a, b) map(tofrom: c)
   !$omp distribute parallel do SIMD private(j, i, k)
   ! parallel compute matrix multiplication.
   do j=1,n
      do i=1,n
         do k=1,n
            c(i,j) = c(i,j) + a(i,k) * b(k,j)
         enddo
      enddo
   enddo
   !$omp end target teams
   ! serial compute matrix multiplication
   do j=1,n
      do i=1,n
         do k=1,n
            c_serial(i,j) = c_serial(i,j) + a(i,k) * b(k,j)
         enddo
      enddo
   enddo
   ! verify result
   do j=1,n
      do i=1,n
         if (c_serial(i,j) .ne. c(i,j)) then
            print *,'FAILED, i, j, c_serial(i,j), c(i,j) ', i, j, c_serial(i,j), c(i,j)
            ! "exit" would only leave the inner loop and PASSED would still
            ! be printed, so stop at the first mismatch instead.
            stop 1
         endif
      enddo
   enddo
   print *,'PASSED'
end program matrix_multiply
I compile using the following makefile:
# Select Compiler
COMPILER = ifx
SWITCH = -QxHost /Qopenmp -fopenmp-targets=spir64
#SWITCH = /Qmkl /Qopenmp /warn:all /check:all /traceback /heap-arrays0
#GARBAGE = /fast /Qparallel /Qipo /Qprec-div- /QxHost /heap-arrays0
SRCS = src\03_mm_GPU.f90
EXEC = exe\run_win.exe
ifort:
	$(COMPILER) -fpp $(SWITCH) $(SRCS) -o $(EXEC)
# Cleaning everything
clean:
	del *.mod
	del *.obj
	del *.pdb
	del *.ilk
	del $(EXEC)
#To compile in Mac, type:
# $ make -f makefile_mac
#To compile in Windows, type:
# $ nmake /f makefile_win
# option flag /heap-arrays0
# to store all arrays on the heap
# see https://community.intel.com/t5/Intel-Fortran-Compiler/allocatable-automatic-stack-heap/m-p/1229091#M152713
The source for this example is “guided_matrix_mul_OpenMP”, which was freely available on the web some time ago (I wasn’t able to find it again online). I adapted it with minor modifications.
When I compile the code (with ifx or with ifort, it doesn’t matter), I get the following warning:
ifx -fpp -QxHost /Qopenmp -fopenmp-targets=spir64 src\03_mm_GPU.f90 -o exe\run_win.exe
Intel(R) Fortran Compiler for applications running on Intel(R) 64, Version 2022.2.0 Build 20220730
Copyright (C) 1985-2022 Intel Corporation. All rights reserved.
ifx: command line warning #10006: ignoring unknown option '/fopenmp-targets=spir64'
Then the code runs with the following output on the screen:
matrix size 2600
Number of CPU procs is 1
Number of OpenMP Device Available: 0
Running on CPU
PASSED
but it does not do any parallelization and it ignores the !$omp target directives. Since I’m new to this type of parallelization (I am familiar with OpenMP on the CPU only), any help would be greatly appreciated!
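The compile-time warning suggests that the Windows driver does not accept the Linux-style spelling -fopenmp-targets=spir64, so the offload target is never enabled and the target regions fall back to the host. A minimal sketch of the same makefile with the Windows-style spelling /Qopenmp-targets:spir64 is given here. It assumes ifx rather than ifort (as far as I understand, OpenMP GPU offload is only available in ifx), the target name "gpu" is made up for illustration, and the exact switches should be checked against the documentation for the installed compiler version.

```makefile
# Sketch only, not a verified build: Windows-style offload switches for ifx.
# /Qopenmp-targets:spir64 is the Windows counterpart of the Linux option
# -fopenmp-targets=spir64 that the driver reported as unknown.
COMPILER = ifx
SWITCH   = -QxHost /Qopenmp /Qopenmp-targets:spir64
SRCS     = src\03_mm_GPU.f90
EXEC     = exe\run_win.exe

# "gpu" is a hypothetical target name; build with: nmake /f makefile_win gpu
gpu:
	$(COMPILER) -fpp $(SWITCH) $(SRCS) -o $(EXEC)
```

Even with the right switches, omp_get_num_devices() can still report 0 if no supported Intel GPU or driver is present. Setting the standard OpenMP environment variable OMP_TARGET_OFFLOAD to MANDATORY before running (set OMP_TARGET_OFFLOAD=MANDATORY in cmd) should make the program stop with an error instead of silently running the target regions on the CPU, which helps tell a build problem apart from a missing device.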