Gfortran with do concurrent for windows 10

For gfortran (Windows OS) which compiler flag or options should be used to enforce do concurrent? I find it is /Qparallel for intel and should be used with O2 or O3. It is not active by default there.

Is it -O3 for gfortran ( Windows 10 )?

1 Like

This previous thread has some info, but not sure about gfortran + windows… (I myself have no experience with do concurrent yet :biking_man: )

1 Like

I got the best performance using -O3 together with -march=native. Have not gone deeper yet but it looks like a good area.

Sure, but these flags are not related to loop parallelization.

According to the other discussion thread, I would give a try to -ftree-parallelize-loops=N

1 Like

I do not know how DO CONCURRENT can be multi-threaded with Gfortran on Windows !

I am using Gfortran 11.1.0 on Windows 10, but do not get parallel operation with DO CONCURRENT.

The following program example does a simple use of MATMUL in an !$OMP PARALLEL DO and a DO CONCURRENT loop.
I compiled with and without -fopenmp

By reporting both elapsed time and CPU time, it clearly shows if multiple threads are being used from the ratio of CPU time to Elapsed (system_clock), expressed as %

I am not an experienced user of DO CONCURRENT, to know if it can be mult-threaded or just the next generation of FORALL.

I am sceptical on the use of auto-parallelizer, as with !$OMP I recommend explicit use of SHARED and PRIVATE with DEFAULT(NONE)

!  The following started out as a short example, but expanced with more documentation. 
! test of do concurrent to check operation
   real*8, allocatable :: A(:,:,:), C(:,:,:)
   integer :: n = 1000, P=50, i
   real*8  :: times(2)

   call report_compiler_settings (6) 

   call delta_time ('START', times)
   allocate ( A(n,n,P), C(n,n,P) )

   call random_number ( a )
   call delta_time ('INITIALISE', times)

   do concurrent ( i = 1:P )
     c(:,:,i) = MATMUL ( A(:,:,i), A(:,:,i) )
   end do
   call delta_time ('Do concurrent MATMUL', times)

   !$OMP PARALLEL DO SHARED ( A,C,P )
   do i = 1,P
     c(:,:,i) = MATMUL ( A(:,:,i), A(:,:,i) )
   end do
   !$OMP END PARALLEL DO
   call delta_time ('OMP DO MATMUL', times)

   End

   subroutine delta_time ( desc, times )
     character desc*(*)
     real*8  :: times(2)

     real*8    :: last_cpu  = 0, cpu, per
     integer*8 :: last_tick = -1, tick, rate
     
     call system_clock ( tick, rate )
     call cpu_time (cpu)
     if ( last_tick < 0 ) then
       last_tick = tick
       last_cpu  = cpu
     end if

     times(1)  = dble(tick-last_tick) / dble (rate)
     times(2)  = cpu-last_cpu
     per       = times(2)/max (times(1),1.d-6)*100.
     last_tick = tick
     last_cpu  = cpu
     write (*,11) desc, times, per
 11  format (a,2f10.4,f10.4,'%')
   end subroutine delta_time
   
   subroutine report_compiler_settings (lu) 
 use iso_fortran_env 
! 
     character*128 run_date_string 
     character*128 command_string 
     integer*4 lu 
! 
!  build time variables ( when compiled ) 
! 
     write (lu,*) ' ' 
     write (lu,*) '==== Build Information ========== ' 
     write (lu,*) 'Compiler Options    : ', compiler_options () 
     write (lu,*) 'Compiler Version    : ', compiler_version () 
! 
!  run time variables ( when run ) 
! 
     call date_and_time_string     (run_date_string) 
     call get_command              (command_string) 
! 
     write (lu,*)  ' ' 
     write (lu,*)  '==== Run Time Information ========== ' 
     write (lu,11) ' Run Date',           trim(run_date_string) 
     write (lu,11) ' Run Command',        trim(command_string) 
! 
  11 format (a,t22,': ',a)
   end subroutine report_compiler_settings
  
   subroutine date_and_time_string (string) 
     character string*(*) 
     character :: months(12)*3 = (/ 'Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec' /) 
     integer value(8) 
  
     call DATE_AND_TIME (values=value) 
     write (string,11) value(3),months(value(2)),value(1), value(5:8) 
  11 format ('Run on ',i0,'-',a,'-',i4,' at ',i0,':',i2.2,':',i2.2,'.',i3) 
   end subroutine date_and_time_string

My .bat file to test 2 compile options is:

del %1.exe
del %1.o

time /T >> %1.log

set vec=-march=native
set options=-O3 %vec% -ffast-math
 
gfortran %1.f90  %options% -o %1.exe
 
dir %1.* /od >> %1.log
echo ================================================================ >> %1.log
time /T     >> %1.log
set options >> %1.log

%1 >> %1.log

set options=-O3 %vec% -ffast-math  -fopenmp
 
gfortran %1.f90  %options% -o %1.exe
 
dir %1.* /od >> %1.log
echo ================================================================ >> %1.log
time /T     >> %1.log
set options >> %1.log

%1 >> %1.log

notepad %1.log

I also tried options=-O3 -ffast-math -ftree-parallelize-loops=8, with no apparent change ?

2 Likes

I am trying it but do not understand what is N here and how it should be adjusted.

N is supposed to be the number of threads used for the parallelization.

1 Like

Here is my code:

( I am trying to understand normal nested loop vs do concurrent here )

main.f90 (1.8 KB)

For do concurrent

  1. The first command I use:

gfortran main.f90

gives output time

41.953 seconds.

  1. The next command:

gfortran main.f90 -ftree-parallelize-loops=8

gives output time

41.594 seconds.

  1. The next command:

gfortran main.f90 -O3 -march=native

gives output time

7.188 seconds.

Note: gcc version is 11.3.0

For normal do loop

  1. The first command I use:

gfortran main.f90

gives output time

57.859 seconds.

  1. The next command:

gfortran main.f90 -O3 -march=native

gives output time

20.391 seconds.

So the question remains: How is do concurrent functioning here? Why the performance is better when using do concurrent (without enforcing parallel instructions )than a normal loop?

Thanks for sharing the files. I did some testing

1) with OpenMP

2) Without OpenMP

3) Enforcing parallel

It seems like that.

1 Like