For gfortran (Windows OS)
which compiler flag or options should be used to enforce do concurrent? I find it is /Qparallel
for intel and should be used with O2 or O3. It is not active by default there.
Is it -O3
for gfortran ( Windows 10 )?
For gfortran (Windows OS)
which compiler flag or options should be used to enforce do concurrent? I find it is /Qparallel
for intel and should be used with O2 or O3. It is not active by default there.
Is it -O3
for gfortran ( Windows 10 )?
This previous thread has some info, but not sure about gfortran + windows… (I myself have no experience with do concurrent yet )
I got the best performance using -O3
together with -march=native
. Have not gone deeper yet but it looks like a good area.
Sure, but these flags are not related to loop parallelization.
According to the other discussion thread, I would give a try to -ftree-parallelize-loops=N
I do not know how DO CONCURRENT can be multi-threaded with Gfortran on Windows !
I am using Gfortran 11.1.0 on Windows 10, but do not get parallel operation with DO CONCURRENT.
The following program example does a simple use of MATMUL in an !$OMP PARALLEL DO and a DO CONCURRENT loop.
I compiled with and without -fopenmp
By reporting both elapsed time and CPU time, it clearly shows if multiple threads are being used from the ratio of CPU time to Elapsed (system_clock), expressed as %
I am not an experienced user of DO CONCURRENT, to know if it can be mult-threaded or just the next generation of FORALL.
I am sceptical on the use of auto-parallelizer, as with !$OMP I recommend explicit use of SHARED and PRIVATE with DEFAULT(NONE)
! The following started out as a short example, but expanced with more documentation.
! test of do concurrent to check operation
real*8, allocatable :: A(:,:,:), C(:,:,:)
integer :: n = 1000, P=50, i
real*8 :: times(2)
call report_compiler_settings (6)
call delta_time ('START', times)
allocate ( A(n,n,P), C(n,n,P) )
call random_number ( a )
call delta_time ('INITIALISE', times)
do concurrent ( i = 1:P )
c(:,:,i) = MATMUL ( A(:,:,i), A(:,:,i) )
end do
call delta_time ('Do concurrent MATMUL', times)
!$OMP PARALLEL DO SHARED ( A,C,P )
do i = 1,P
c(:,:,i) = MATMUL ( A(:,:,i), A(:,:,i) )
end do
!$OMP END PARALLEL DO
call delta_time ('OMP DO MATMUL', times)
End
subroutine delta_time ( desc, times )
character desc*(*)
real*8 :: times(2)
real*8 :: last_cpu = 0, cpu, per
integer*8 :: last_tick = -1, tick, rate
call system_clock ( tick, rate )
call cpu_time (cpu)
if ( last_tick < 0 ) then
last_tick = tick
last_cpu = cpu
end if
times(1) = dble(tick-last_tick) / dble (rate)
times(2) = cpu-last_cpu
per = times(2)/max (times(1),1.d-6)*100.
last_tick = tick
last_cpu = cpu
write (*,11) desc, times, per
11 format (a,2f10.4,f10.4,'%')
end subroutine delta_time
subroutine report_compiler_settings (lu)
use iso_fortran_env
!
character*128 run_date_string
character*128 command_string
integer*4 lu
!
! build time variables ( when compiled )
!
write (lu,*) ' '
write (lu,*) '==== Build Information ========== '
write (lu,*) 'Compiler Options : ', compiler_options ()
write (lu,*) 'Compiler Version : ', compiler_version ()
!
! run time variables ( when run )
!
call date_and_time_string (run_date_string)
call get_command (command_string)
!
write (lu,*) ' '
write (lu,*) '==== Run Time Information ========== '
write (lu,11) ' Run Date', trim(run_date_string)
write (lu,11) ' Run Command', trim(command_string)
!
11 format (a,t22,': ',a)
end subroutine report_compiler_settings
subroutine date_and_time_string (string)
character string*(*)
character :: months(12)*3 = (/ 'Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec' /)
integer value(8)
call DATE_AND_TIME (values=value)
write (string,11) value(3),months(value(2)),value(1), value(5:8)
11 format ('Run on ',i0,'-',a,'-',i4,' at ',i0,':',i2.2,':',i2.2,'.',i3)
end subroutine date_and_time_string
My .bat file to test 2 compile options is:
del %1.exe
del %1.o
time /T >> %1.log
set vec=-march=native
set options=-O3 %vec% -ffast-math
gfortran %1.f90 %options% -o %1.exe
dir %1.* /od >> %1.log
echo ================================================================ >> %1.log
time /T >> %1.log
set options >> %1.log
%1 >> %1.log
set options=-O3 %vec% -ffast-math -fopenmp
gfortran %1.f90 %options% -o %1.exe
dir %1.* /od >> %1.log
echo ================================================================ >> %1.log
time /T >> %1.log
set options >> %1.log
%1 >> %1.log
notepad %1.log
I also tried options=-O3 -ffast-math -ftree-parallelize-loops=8, with no apparent change ?
I am trying it but do not understand what is N
here and how it should be adjusted.
N
is supposed to be the number of threads used for the parallelization.
Here is my code:
( I am trying to understand normal nested loop vs do concurrent here )
main.f90 (1.8 KB)
For do concurrent
gfortran main.f90
gives output time
41.953 seconds.
gfortran main.f90 -ftree-parallelize-loops=8
gives output time
41.594 seconds.
gfortran main.f90 -O3 -march=native
gives output time
7.188 seconds.
Note: gcc version is 11.3.0
For normal do loop
gfortran main.f90
gives output time
57.859 seconds.
gfortran main.f90 -O3 -march=native
gives output time
20.391 seconds.
So the question remains: How is do concurrent functioning here? Why the performance is better when using do concurrent (without enforcing parallel instructions )than a normal loop?