Do concurrent: can loop variable be specified in `local`?

Can the loop variable j be specified in the local specifier of do concurrent in the code below?

program do_concurrent_13
implicit none
integer, parameter :: Nx = 600, Ny = 450, n_max = 255, dp=kind(0.d0)
real(dp), parameter :: xcenter = -0.5_dp, ycenter = 0.0_dp, &
    width = 4, height = 3, dx_di = width/Nx, dy_dj = -height/Ny, &
    x_offset = xcenter - (Nx+1)*dx_di/2, y_offset = ycenter - (Ny+1)*dy_dj/2
real(dp) :: x, y, x_0, y_0, x_sqr, y_sqr, wtime
integer :: i, j, n, image(Nx, Ny)
do concurrent (j = 1:Ny) shared(image) local(i, j, x, y, x_0, y_0, x_sqr, y_sqr, n)
    y_0 = y_offset + dy_dj * j
    do i = 1, Nx
        x_0 = x_offset + dx_di * i
        x = 0; y = 0; n = 0
        do
            x_sqr = x ** 2; y_sqr = y ** 2
            if (x_sqr + y_sqr > 4 .or. n == n_max) then
                image(i,j) = 255-n
                exit
            end if
            y = y_0 + 2 * x * y
            x = x_0 + x_sqr - y_sqr
            n = n + 1
        end do
    end do
end do
print *, sum(image)
if ( sum(image) /= 59157126 ) error stop
end program

Here is a link to compiler explorer: Compiler Explorer

Intel ifx allows it and LFortran allows it, Flang does not allow j to be specified in local(), and GFortran does not compile the do concurrent at all.


Page 197 of J3/24-007:

C1127 A variable-name in a locality-spec shall not be the same as an index-name in the concurrent-header of the same DO CONCURRENT statement.

where locality-spec includes local(...), local_init(...), reduce(...), shared(...).
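To make the constraint concrete, here is a reduced sketch. The program and variable names are hypothetical, and it assumes a compiler that accepts Fortran 2018 LOCAL/SHARED locality specs (GFortran 14.1 does not, as reported later in this thread). The sketch shows a conforming statement next to the C1127 violation from the original code:

```fortran
program c1127_demo
  implicit none
  integer, parameter :: n = 4
  integer :: i, j, k, total(n)

  ! Conforming: the index-name j is not listed in any locality-spec.
  ! The inner serial DO variable i and the scratch variable k are ordinary
  ! variables of the program, so listing them in LOCAL is allowed.
  do concurrent (j = 1:n) shared(total) local(i, k)
     k = 0
     do i = 1, j
        k = k + i
     end do
     total(j) = k          ! triangular numbers 1, 3, 6, 10
  end do

  ! Violates C1127 (index-name j in a locality-spec), so a conforming
  ! compiler must be able to diagnose it:
  !   do concurrent (j = 1:n) shared(total) local(i, j, k)

  print *, total
  if (any(total /= [1, 3, 6, 10])) error stop "unexpected result"
end program c1127_demo
```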


Awesome, thanks @ivanpribec! I guess index-name is what I call the loop variable. So are DO CONCURRENT loop variables automatically “local”?


Yes, index-name is the loop iteration variable, i.e. the name on the left of index-name = lb : ub : stride.

In contrast, OpenMP 5.2 (section 5.1.1) allows the loop iteration variable to appear in a private clause, but this is not necessary (it is redundant):

  • The loop iteration variable in any associated loop of a loop-associated construct may be listed in a private or lastprivate clause.

By default the iteration variables are private:

  • Loop iteration variables inside parallel, teams, or task generating constructs are private in the innermost such construct that encloses the loop.
  • Implied-do, FORALL and DO CONCURRENT indices are private.
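A minimal sketch of that contrast, with hypothetical names. The `!$omp` lines are ordinary comments unless you compile with OpenMP enabled (e.g. `-fopenmp`), so the program also runs serially. In OpenMP, listing the loop variable in `private` is legal but redundant. In DO CONCURRENT, listing the index-name in LOCAL is forbidden by C1127:

```fortran
program omp_private_demo
  implicit none
  integer, parameter :: n = 1000
  integer :: i
  real :: a(n), s

  a = 1.0
  s = 0.0

  ! OpenMP: i is already private by default, so private(i) is allowed but redundant.
  !$omp parallel do private(i) reduction(+:s)
  do i = 1, n
     s = s + a(i)
  end do
  !$omp end parallel do

  ! DO CONCURRENT analogue that a conforming compiler must be able to diagnose:
  !   do concurrent (i = 1:n) local(i)

  print *, s
  if (s /= real(n)) error stop "unexpected sum"
end program omp_private_demo
```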

I believe Intel Fortran maps the parallel execution of do concurrent to a corresponding OpenMP construct internally.


Excellent, thanks @ivanpribec, I think this clarifies it. So the index-name of the kind lb : ub : stride means lb, ub, and stride are implicitly local, and should not be marked explicitly local by the user.

Maybe I phrased that incorrectly, I don’t think the bounds are necessarily local. The whole set of rules can be found in document 007 provided at J3 Fortran - Standing Documents, pages 196 - 203.

The lb and ub are “concurrent limits”. They also shall not appear in a LOCAL locality spec. It’s left a bit unclear whether they can appear in other locality specs.

  • A variable that is referenced by the scalar-mask-expr of a concurrent-header or by any concurrent-limit or concurrent-step in that concurrent-header shall not appear in a LOCAL locality-spec in the same DO CONCURRENT statement.
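The following sketch illustrates the quoted rule. Names are hypothetical, and it assumes a compiler with Fortran 2018 locality specs. `lb` and `ub` are referenced by the concurrent-limits and `skip` by the mask, so none of them may be LOCAL. Leaving them with unspecified locality is fine because they are only read:

```fortran
program limits_demo
  implicit none
  integer :: lb, ub, skip, i, t, sq(10)

  lb = 1; ub = 10; skip = 5
  sq = 0

  ! Not allowed: lb, ub (concurrent-limits) and skip (mask) in LOCAL:
  !   do concurrent (i = lb:ub, i /= skip) local(lb, ub, skip, t)
  do concurrent (i = lb:ub, i /= skip) shared(sq) local(t)
     t = i*i
     sq(i) = t
  end do

  print *, sum(sq)                       ! 385 - 25 = 360
  if (sum(sq) /= 360) error stop "unexpected result"
end program limits_demo
```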

So I’m not sure about something odd like this:

lb = 0
ub = 100
do concurrent (i=lb:ub) reduce(+:lb)
  lb = lb + i
end do

I think that if the DEFAULT (NONE) clause is not used, j is private by default. In your code:

Case 1, no default clause:
do concurrent (j = 1:Ny) shared(image) local(i, j, x, y, x_0, y_0, x_sqr, y_sqr, n)

j is private by default. But in the second case it should be explicitly declared in local:

Case 2, default (none):
do concurrent (j = 1:Ny) default (none) shared(image) local(i, j, x, y, x_0, y_0, x_sqr, y_sqr, n)

This is a numbered constraint meaning that a standard-conforming Fortran processor (i.e. compiler) should be able to diagnose it (J3/24-007, page 32, paragraph 2):

A Fortran processor shall:
[…]
(3) contain the capability to detect and report the use within a submitted program unit of a form or relationship that is not permitted by the numbered syntax rules or constraints, […]

The line you’ve shown violates the constraint:

do concurrent (j = 1:Ny) default (none) shared(image) local(i, j, x, y, x_0, y_0, x_sqr, y_sqr, n)
!              ^                                               ^
!              index-name                                      variable name in locality spec
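For comparison, here is a reduced sketch of a conforming DEFAULT (NONE) loop. The array sizes are small and hypothetical, and a simple formula replaces the Mandelbrot iteration. It assumes Fortran 2018 locality-spec support. The index-name is simply left out, and every other variable used in the block gets an explicit locality:

```fortran
program default_none_demo
  implicit none
  integer, parameter :: nx = 3, ny = 2
  integer :: i, j, s, image(nx, ny)

  ! With DEFAULT(NONE), every variable of the enclosing scope that is used
  ! inside the construct needs an explicit locality-spec. The index-name j
  ! is a construct entity, so it is left out (listing it violates C1127).
  ! Named constants such as nx are not variables and need no spec either.
  do concurrent (j = 1:ny) default(none) shared(image) local(i, s)
     do i = 1, nx
        s = 10*i + j
        image(i, j) = s
     end do
  end do

  print *, image
  if (image(3, 2) /= 32) error stop "unexpected result"
end program default_none_demo
```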

I do not see any error or warning from Intel in these cases.

Must compilers always conform to the standard?

It’s important if you want to write portable programs. As you may know, the Intel Fortran compilers can’t be used on all hardware: Intel Fortran on Snapdragon chips

Imagine being a handyman and buying an expensive set of wrenches. But when you come to the job, the bolts don’t fit. Either you’ll need to buy/borrow a different set of wrenches, or use a different set of bolts. In both cases it will cause extra costs or delays.


I made some modifications, just to try:

program do_concurrent_13
implicit none
integer, parameter :: Nx = 600, Ny = 450, n_max = 255, dp=kind(0.d0)
real(dp), parameter :: xcenter = -0.5_dp, ycenter = 0.0_dp, &
    width = 4, height = 3, dx_di = width/Nx, dy_dj = -height/Ny, &
    x_offset = xcenter - (Nx+1)*dx_di/2, y_offset = ycenter - (Ny+1)*dy_dj/2
real(dp) :: x, y, x_0, y_0, x_sqr, y_sqr, wtime
integer :: i, j, n, image(Nx, Ny)

! first test default none

!do concurrent (j = 1:Ny) default (none) shared(image) local(i, j, x, y, x_0, y_0, x_sqr, y_sqr, n)
! second test 
do concurrent (j = 1:Ny) shared(image) local(i, x, y, x_0, y_0, x_sqr, y_sqr, n)

    y_0 = y_offset + dy_dj * j
    do i = 1, Nx
        x_0 = x_offset + dx_di * i
        x = 0; y = 0; n = 0
        do
            x_sqr = x ** 2; y_sqr = y ** 2
            if (x_sqr + y_sqr > 4 .or. n == n_max) then
                image(i,j) = 255-n
                exit
            end if
            y = y_0 + 2 * x * y
            x = x_0 + x_sqr - y_sqr
            n = n + 1
        end do
    end do
end do

print *, sum(image)
if ( sum(image) /= 59157126 ) error stop
end program
C:\Users\owner\Desktop>ifx test.f90 /Qopenmp /F2000000
Intel(R) Fortran Compiler for applications running on Intel(R) 64, Version 2024.2.0 Build 20240602
Copyright (C) 1985-2024 Intel Corporation. All rights reserved.

Microsoft (R) Incremental Linker Version 14.32.31332.0
Copyright (C) Microsoft Corporation.  All rights reserved.

-out:test.exe
-subsystem:console
-stack:2000000
-defaultlib:libiomp5md.lib
-nodefaultlib:vcomp.lib
-nodefaultlib:vcompd.lib
test.obj

C:\Users\owner\Desktop>test
    59157126

C:\Users\owner\Desktop>ifx test.f90 /Qopenmp /F2000000
Intel(R) Fortran Compiler for applications running on Intel(R) 64, Version 2024.2.0 Build 20240602
Copyright (C) 1985-2024 Intel Corporation. All rights reserved.

Microsoft (R) Incremental Linker Version 14.32.31332.0
Copyright (C) Microsoft Corporation.  All rights reserved.

-out:test.exe
-subsystem:console
-stack:2000000
-defaultlib:libiomp5md.lib
-nodefaultlib:vcomp.lib
-nodefaultlib:vcompd.lib
test.obj

C:\Users\owner\Desktop>test
    59157126

C:\Users\owner\Desktop>

As a user, I think of the loop variable j as being local to each thread, so I don’t understand why I can’t specify it in the local specifier. Does it not make sense to specify j as local?

Also: why is i allowed in local(), but not j, even though both are loop variables? Yes, j is a parallel loop variable while i is serial, but in my mind they seem almost the same. So it’s very confusing to me.

(It seems the standard says you shouldn’t, but we can propose to change the standard if it is not well designed, so I am not worried about it, rather I want to figure out what makes sense to do.)

C1127 A variable-name in a locality-spec shall not be the same as an index-name in the concurrent-header of the same DO CONCURRENT statement.

The way it was explained to me is that the index-name is a construct entity, not a variable for which it makes sense to specify a locality. I.e.

19.4 Statement and construct entities
…
A variable that appears as an index-name in a … DO CONCURRENT construct … is a construct entity.

3.35
construct entity
entity whose identifier has the scope of a construct

Note the implication here is that in the following example the i inside the loop is not the same as the i outside the loop.

integer :: i
i = 42
do concurrent (i = 1:2)
  print *, i
end do
print *, i ! this should still print 42
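The optional integer-type-spec in the concurrent-header makes the construct scope even more visible. The index is declared in the header itself, so under `implicit none` it needs no outer declaration, and there is no outer variable whose locality could be specified. The following is a small sketch with hypothetical names:

```fortran
program typespec_index_demo
  implicit none
  integer :: squares(5)

  ! The integer-type-spec declares k in the concurrent-header itself.
  ! k has the scope of the construct only: it needs no declaration in the
  ! program and does not exist after END DO.
  do concurrent (integer :: k = 1:5)
     squares(k) = k*k
  end do

  print *, squares
  if (sum(squares) /= 55) error stop "unexpected result"
end program typespec_index_demo
```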

I entered a bug report against ifx for this.


I did too for LFortran: do concurrent: do not allow parallel loop variables in local() · Issue #4568 · lfortran/lfortran · GitHub. Let’s see who fixes it first! :slight_smile:

Update (18h later): it’s fixed for LFortran.


(Not sure if this is the best place for this comment, but it could be?)
I have been looking at DO CONCURRENT and read 22-169.pdf from j3-fortran.org.

Unfortunately:

  1. I want to understand what multi-threading support is available for DO CONCURRENT, but,
  2. I don’t know what support there is for the new features of DO CONCURRENT in available compilers and,
  3. The only Fortran compilers I am using are GFortran Ver 12 and Silverfrost FTN95

However I have enhanced a test example of DO CONCURRENT use from 22-169.pdf.
This now provides a comparison of 4 types of approaches:

  1. using pure functions (as in 22-169.pdf)
  2. using a conventional DO loop
  3. using do concurrent ( but not with recent thread based changes )
  4. using !$OMP parallel DO

I have created a repeat-test summary routine (add_ticks) for 7 different array sizes with 5 repeats of each test, which reports the average ticks per test using SYSTEM_CLOCK (which is best for GFortran on Windows).
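The module below declares its tick counters as `integer*8`, which is a common extension rather than standard Fortran. This is a minimal sketch of the same SYSTEM_CLOCK timing pattern using the standard `integer(int64)` kind from `iso_fortran_env`. The program name and array size are hypothetical. In GFortran, 8-byte arguments typically give a finer count rate than default-kind ones, but the actual rate is processor dependent:

```fortran
program clock_sketch
  use iso_fortran_env, only: int64
  implicit none
  integer(int64) :: t0, t1, rate
  real, allocatable :: a(:)
  real :: s
  integer :: i

  allocate(a(1000000))
  a = 1.0

  ! Standard-conforming replacement for integer*8 tick counters.
  call system_clock(t0, rate)
  s = 0.0
  do i = 1, size(a)
     s = s + a(i)**2
  end do
  call system_clock(t1)

  ! rate is 0 if the processor has no clock, so guard the division.
  if (rate > 0) print *, 'seconds =', real(t1 - t0) / real(rate)
  print *, 'sum =', s
  if (s /= 1.0e6) error stop "unexpected sum"
end program clock_sketch
```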

I am hoping others might be able to:
a) enhance these tests with an additional DO CONCURRENT test that includes the recent thread based changes in the Fortran standard,
b) provide some documentation of which Fortran compilers support multi-threaded do concurrent,
c) provide some results to assess the effectiveness of standard conforming multi-threaded DO CONCURRENT usage vs OpenMP usage.

It would be good to understand what contribution DO CO… may be providing for standard conforming calculation.

I would be interested in what others think of the effectiveness of DO CONCURRENT, how it could provide some standard conformance for multi-threading, or if other directions are required.
My testing has been limited to multi-threading, rather than distributed processing / COARRAYS, which is another significant direction for Fortran.

One of the problems emerging with multi-threading is the move to offloading to GPUs, which requires more non-standard and diverse approaches for each type of GPU hardware, such as NVIDIA or Intel hardware with hardware-specific compilers.

The following is my revised code which I built using Gfortran

!  Gfortran build : gfortran test1.f90 -O3 -fopenmp -o test1.exe

 module numerot
  integer, parameter :: num = 1000
  integer :: n
  real, allocatable :: A(:), B(:), C(:)

  real      :: dsec
  integer*8 :: last_tick=0, dtick

  integer, parameter :: mtype = 5
  integer, parameter :: mtest = 100
  integer :: ntype
  integer :: test_count(mtype), test_n(mtest,mtype), test_t(mtest,mtype), sum_t(mtype)

  contains

   pure real function yksi(X)
    implicit none
     real, intent(in) :: X(:)
      !real, intent(out) :: R
       yksi = norm2(X)
   end function yksi
   
   pure real function kaksi(X)
    implicit none
     real, intent(in) :: X(:)
      kaksi = 2*norm2(X)
   end function kaksi
   
   pure real function kolme(X)
    implicit none
     real, intent(in) :: X(:)
      kolme = 3*norm2(X)
   end function kolme
   
   real function delta_sec ()
    integer*8 :: tick, rate
    
     call SYSTEM_CLOCK ( tick, rate )
     dtick     = tick-last_tick
     last_tick = tick
     dsec      = real(dtick) / real(rate)
     delta_sec = dsec
   end function delta_sec
   
   subroutine add_ticks ( test, n )
!  routine to accumulate and report multiple tests

     integer :: test, n,  k,i, m, nt
     real :: x

     x = delta_sec ()
     if ( test == 0 ) then           ! initialise
       ntype      = n
       test_count = 0
       test_n     = 0

     else if ( test == -1 ) then     ! report averages
       write (*,10) 'pure', 'DO loop', 'DO Con', 'OpenMP'
       k = test_count(1)+1
       m = 0
       nt = 0
       sum_t = 0
       do i = 1,k
         if ( m /= test_n(i,1) ) then
           if ( nt > 0 ) write ( *,12 ) sum_t(1:ntype) / nt
           write (*,*) ' '
           nt = 0
           sum_t = 0
         end if
         if ( test_n(i,1) > 0 ) then
           nt = nt+1
           write ( *,11) nt, test_n(i,1), test_t(i,1:ntype)
           sum_t = sum_t + test_t(i,:)
           m = test_n(i,1)
         end if
       end do
   10  format ( 13x, 5A8 )
   11  format ( i3, 2i9, 5i8 )
   12  format ( 13x, 5i8 )
       
     else                            ! accumulate
       if ( test <= mtype ) then
         k = test_count(test) + 1
         if ( k <= mtest ) then
           test_count(test) = k
           test_n(k,test)   = n
           test_t(k,test)   = dtick
         end if
       end if
     end if

   end subroutine add_ticks
   
 end module numerot
  
 program main
  use numerot
  use iso_fortran_env
   implicit none

   integer i,j

    write (*,*) 'Vern : ',compiler_version ()
    write (*,*) 'Opts : ',compiler_options ()
  
   call add_ticks (0,4)
   
   do i = 1,7
     n = num * 4**i
     allocate ( a(n), b(n), c(n) )
     A = 1
     B = 1
     C = 1
     write ( *,*) 'Test n=',n

     do j = 1,5
       dsec = delta_sec ()
       call main_test
       call do_con_test
       call openmp_test
       call do_test
     end do
     deallocate ( a, b, c )
   end do
   call add_ticks (-1,0)
   
 end program main

  subroutine main_test
   use numerot
   implicit none
    real :: RA, RB, RC

    RA = yksi(A)
    RB = kaksi(B)
    RC = kolme(C)
    
    call add_ticks (1,n)
    print*,RA+RB+RC, dsec, dtick,' pure'

  end subroutine main_test
 
  subroutine do_con_test
   use numerot
   implicit none
    real :: RA, RB, RC
    integer i

    ra = 0
    rb = 0
    rc = 0

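    ! Note: RA, RB and RC have unspecified locality here, and each iteration
    ! references them although other iterations define them. DO CONCURRENT
    ! does not permit that, so this only happens to work when the compiler
    ! runs the loop serially; the Fortran 2023 REDUCE spec is the conforming form.
    do concurrent ( i = 1:size(A) )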
      RA = RA + A(i)**2
      RB = RB + B(i)**2
      RC = RC + C(i)**2
    end do

    RA = sqrt (RA)
    RB = sqrt (RB) * 2
    RC = sqrt (RC) * 3
    
    call add_ticks (3,n)
    print*,RA+RB+RC, dsec, dtick,' do concurrent'

  end subroutine do_con_test

  subroutine openmp_test
   use numerot
   implicit none
    real :: RA, RB, RC
    integer i

    ra = 0
    rb = 0
    rc = 0

  !$OMP PARALLEL DO private (i) shared (A,B,C), REDUCTION (+: RA,RB,RC)
    do i = 1, size(A)
      RA = RA + A(i)**2
      RB = RB + B(i)**2
      RC = RC + C(i)**2
    end do
  !$OMP END PARALLEL DO

    RA = sqrt (RA)
    RB = sqrt (RB) * 2
    RC = sqrt (RC) * 3
    
    call add_ticks (4,n)
    print*,RA+RB+RC, dsec, dtick,' OpenMP'

  end subroutine openmp_test

  subroutine do_test
   use numerot
   implicit none
    real :: RA, RB, RC
    integer i

    ra = 0
    rb = 0
    rc = 0

    do i = 1, size(A)
      RA = RA + A(i)**2
      RB = RB + B(i)**2
      RC = RC + C(i)**2
    end do

    RA = sqrt (RA)
    RB = sqrt (RB) * 2
    RC = sqrt (RC) * 3
    
    call add_ticks (2,n)
    print*,RA+RB+RC, dsec, dtick,' DO test'

  end subroutine do_test

Further to my last post,
I tried Gfortran 14.1.0 with a locality-spec, but it was not accepted.
My locality code is:

  subroutine do_local_test
   use numerot
   implicit none
    real :: RA, RB, RC
    integer i

    ra = 0
    rb = 0
    rc = 0

    do concurrent ( i = 1:size(A) ) REDUCE (+ : RA,RB,RC )
      RA = RA + A(i)**2
      RB = RB + B(i)**2
      RC = RC + C(i)**2
    end do

    RA = sqrt (RA)
    RB = sqrt (RB) * 2
    RC = sqrt (RC) * 3
    
    call add_ticks (5,n)
    print*,RA+RB+RC, dsec, dtick,' DO REDUCE'

  end subroutine do_local_test

Perhaps my compiler options were not compatible (with my standard conforming code ?)
Does anyone have experience with using locality-spec ?
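Until REDUCE is available in the compiler you use, there is a conforming alternative. Each iteration defines only its own array element, and the reduction is done afterwards with an intrinsic. The following is a sketch under those assumptions, with hypothetical names. The REDUCE form is kept as a comment because GFortran 14.1 rejects it, as reported above:

```fortran
program reduce_alternatives
  implicit none
  integer, parameter :: n = 1000
  real, allocatable :: a(:), sq(:)
  real :: ra, rb
  integer :: i

  allocate(a(n), sq(n))
  a = 1.0

  ! Fortran 2023 form, for compilers that support REDUCE:
  !   ra = 0
  !   do concurrent (i = 1:n) reduce(+ : ra)
  !      ra = ra + a(i)**2
  !   end do

  ! Pre-2023 alternative: each iteration defines only its own element of sq,
  ! so no variable is defined in one iteration and referenced in another ...
  do concurrent (i = 1:n)
     sq(i) = a(i)**2
  end do
  ! ... and the reduction is done afterwards with SUM.
  ra = sqrt(sum(sq))

  ! Or skip the loop entirely, as the pure-function test does with NORM2.
  rb = norm2(a)

  print *, ra, rb
  if (abs(ra - rb) > 1.0e-5 * rb) error stop "results differ"
end program reduce_alternatives
```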


Locality specifiers have been added to the trunk of gfortran, just a couple weeks ago:

https://gcc.gnu.org/pipermail/fortran/2024-September/061082.html


Check this post:


I just downloaded gfortran 14.1.0 for Windows 10 (Fortran, C, C++ for Windows (equation.com)):

C:\Users\owner\Desktop> gfortran --version
GNU Fortran (GCC) 14.1.0
Copyright (C) 2024 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

and still see the locality spec error.

   10 | do concurrent ( i = 1:size(A) ) REDUCE (+ : RA,RB,RC )
      | 1
Error: Syntax error in DO statement at (1)