Zero iteration DO loop and OMP SIMD behavior

Consider this example of an interesting interaction between OMP SIMD and a zero-iteration DO loop:

program foo
   implicit none
   integer(kind=4) :: i, j, me1, me2

   me1 = 1
   me2 = 1

   i = me1
   j = 42 !...dummy flag to test IV after loop exit

   !$omp simd
   do j=i+1, me2
     me1 = me1+1
   enddo

   print *, 'me1 ', me1
   print *, 'IV j should be 2, is: ', j

end program foo

Without OMP SIMD the behavior is predictable. Given the values before the DO of i=me1=1 and me2=1, the loop is
do j=2,1
so the expected behavior is that the loop has no iterations, but the iteration variable (IV) is expected to have the value 2 after the loop, which it does for gfortran, ifort, ifx, and NAG. ifx has an oddity in that OMP SIMD is interpreted by default even without any OpenMP options (this was a behavior change that I strongly opposed, but I lost the argument). So for ifx you have to compile with an option to explicitly turn off SIMD. Here is that example:

ifx -qno-openmp-simd ifx_simd_deadcode.f90 ; ./a.out
 me1            1
 IV j should be 2, is:            2

So far so good, all 4 compilers give the same result.
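
For reference, here is a minimal sketch of the serial rule, with no OpenMP involved. The names m1, m2, m3 and trips are mine and do not appear in the example above. The base language first sets the DO variable to the initial value and then computes the iteration count as max((m2 - m1 + m3)/m3, 0), so a zero-trip loop still leaves the variable equal to the initial value.

```fortran
program zero_trip_serial
   implicit none
   integer :: j, m1, m2, m3, trips

   ! Same bounds as the example above: do j = 2, 1 (implicit stride of 1)
   m1 = 2
   m2 = 1
   m3 = 1

   ! Iteration count as defined by the base language for integer bounds
   trips = max((m2 - m1 + m3) / m3, 0)

   j = 42   ! dummy flag, overwritten with m1 when the DO statement executes
   do j = m1, m2, m3
      ! body never executes because trips == 0
   end do

   print *, 'trip count:   ', trips
   print *, 'j after loop: ', j
end program zero_trip_serial
```

With these bounds the trip count is 0 and a conforming serial compile leaves j equal to 2, matching the non-SIMD results shown above.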
Now if we enable SIMD (the ifx default):
NAG doesn’t like the OMP SIMD syntax and won’t compile this example.
gfortran 13 compiles, but the value of the IV after the loop is undefined:

 rm a.out ; gfortran -fopenmp -O0 ifx_simd_deadcode.f90 ; ./a.out
 me1            1
 IV j should be 2, is:   1867453140

Similarly, ifx with SIMD enabled does not show J = 2. Rather, it keeps the value of 42 assigned prior to the loop:

rm a.out ; ifx  -qopenmp -O0 ifx_simd_deadcode.f90 ; ./a.out
 me1            1
 IV j should be 2, is:           42

I even tried adding J to a LASTPRIVATE clause

program foo
   implicit none
   integer(kind=4) :: i, j, me1, me2

   me1 = 1
   me2 = 1

   i = me1
   j = 42 !...dummy flag to test IV after loop exit

   !$omp simd lastprivate(j)
   do j=i+1, me2
     me1 = me1+1
   enddo

   print *, 'me1 ', me1
   print *, 'IV j should be 2, is: ', j

end program foo

but still both gfortran and ifx give odd results

rm a.out ; ifx  -qopenmp -O0 ifx_simd_deadcode_lastprivate.f90 ; ./a.out
 me1            1
 IV j should be 2, is:           42

rm a.out ; gfortran -fopenmp -O0 ifx_simd_deadcode_lastprivate.f90 ; ./a.out
 me1            1
 IV j should be 2, is:   1815540460

and I hate to say it, but ifort seems to give the result I expected

rm a.out ; ifort -qopenmp -O0 ifx_simd_deadcode.f90 -diag-disable=10448 ; ./a.out 
me1            1
IV j should be 2, is:            2

I have read through SIMD in the OMP specs for 5.0 and 5.2 but am unable to find any reference to what the expected behavior is for this example.
I did note that -qopt-report 3 for ifx showed that, in the SIMD-enabled case, the optimizer removed the “dead code” for the loop, hence I know why J is 42 instead of 2. It decided that my loop didn’t have any iterations, hence off with its head. Not very Fortran-like behavior, IMHO.

The questions then are:
What is the expected value of J after the SIMD loop, with or without LASTPRIVATE?
Where is this documented in the OMP Spec?


Just for testing, here is a link for CompilerExplorer:

Though I have no idea what’s correct, I played around with the three compilers (using the above site), which gave the results described above. An interesting thing is that ifx2021.3 gives

 me1            1
 IV j should be 2, is:            2

(same as ifort), but ifx >= 2021.4 gives j = 42. (Intuitively, I expect j = 2 because it is the same as the serial result…)

If lastprivate(j) is used, ifx2021.3 also gives this error message:

error #8592: Within a SIMD region, a DO-loop control-variable must not be specified in a PRIVATE/REDUCTION/FIRSTPRIVATE/LASTPRIVATE SIMD clause.

I always thought that the philosophy of OpenMP was to preserve the semantics of the underlying base language, and only provide “hints” for user-directed parallelization on symmetric multiprocessors and accelerators. So the ifort behavior seems like the right thing to do.

I found the following in the 5.2 standard (page 27, first paragraph):

Each reference to a shared variable in the structured block becomes a reference to the original variable. For each private variable referenced in the structured block, a new version of the original variable (of the same type and size) is created in memory for each task or SIMD lane that contains code associated with the directive. Creation of the new version does not alter the value of the original variable. However, the impact of attempts to access the original variable from within the region corresponding to the directive is unspecified; see Section 5.4.3 for additional details. References to a private variable in the structured block refer to the private version of the original variable for the current task or SIMD lane. The relationship between the value of the original variable and the initial or final value of the private version depends on the exact clause that specifies it. (emphasis added) Details of this issue, as well as other issues with privatization, are provided in Chapter 5.

Loop iteration variables belong to the category of variables with predetermined data-sharing attributes. For C and C++, it is stated explicitly (page 97, lines 16-20):

  • The loop iteration variables in the associated loops of a simd construct with multiple associated loops are lastprivate.
  • The loop iteration variable in any associated loop of a loop construct is lastprivate.
  • The loop iteration variable in any associated loop of a loop-associated construct is otherwise private.

For Fortran on the other hand (page 97, lines 22-24):

  • Loop iteration variables inside parallel, teams, or task generating constructs are private in the innermost such construct that encloses the loop.
  • Implied-do, FORALL and DO CONCURRENT indices are private.

So nothing is said explicitly about SIMD and Fortran, and the behavior is at odds with C and C++. Should this be amended? (cc @mklemm)


The lastprivate clause is allowed here (page 98, lines 19-20):

  • The loop iteration variable in any associated loop of a loop-associated construct may be listed in a private or lastprivate clause.

The behavior of lastprivate is specified in section 5.4.5:

The lastprivate clause provides a superset of the functionality provided by the private clause. A list item that appears in a lastprivate clause is subject to the private clause semantics described in Section 5.4.3. […] or the list item is an iteration variable of one of the associated loops (emphasis added), if sequential execution of the loop nest would assign a value to the list item then the original list item is assigned the value that the list item would have after sequential execution of the loop nest.

So ifx and gfortran appear to be non-conforming here.


@ivanpribec thank you. I had pushback from our OMP team on this issue. I will escalate this and get a fix in ifx.

@greenrongreen, I just noticed that I missed the direction of the small language-specific arrows (page 97, lines 13-21):

Bullets 2-5 in the snippet above are for all three base programming languages, and not just C/C++ as I mistakenly claimed earlier. Bullet 3 appears to mandate that the loop iteration variable in an !$omp simd loop is lastprivate by default. (Under the assumption that a single loop is a sub-case of multiple associated loops.)

Also important, the beginning of the page 97 states:

The first matching rule from the following list of predetermined data-sharing attribute rules applies for variables and objects that are referenced in a construct.

It appears to be deliberate, that the bullet points covering simd and loop come before other loop-associated constructs. So I think that !$omp simd should indeed behave like the sequential version does (i.e. lastprivate).

! For index j, the first rule encountered is
! - the loop iteration variable in any associated loop of a loop-associated construct
!   is otherwise private

!$omp parallel do
do j = 1, nx      ! j is private (but not lastprivate)

! For index i, the first rule encountered is,
! - The loop iteration variables in the associated loops of a simd construct with multiple associated
!   loops are lastprivate

   !$omp simd
   do i = 1, ny   ! i is lastprivate
        avg(i,j) = 0.25*(f(i,j) + f(i+1,j) + f(i,j+1) + f(i+1,j+1))
   end do
   ! i == ny+1
end do
! j == ???

But if we didn’t have !$omp simd on the inner loop, then the rule which would apply would be a different one:

!$omp parallel do
do j = 1, nx      ! j is private, but not lastprivate

! For index i, the rule that applies:
! - Loop iteration variables inside `parallel`, `teams`, or task generating constructs are private
!   in the innermost such construct that encloses the loop

   do i = 1, ny   ! i is private w.r.t outer loop
        avg(i,j) = 0.25*(f(i,j) + f(i+1,j) + f(i,j+1) + f(i+1,j+1))
   end do
   ! i == ny + 1, (loop assumes base language semantics)
end do
! j == ???

Remember that in the case of a combined construct, the first matching rule will apply:

! For index i, the first rule encountered is
! - The loop iteration variables in the associated loops of a simd construct with multiple associated
!   loops are lastprivate
!$omp parallel do simd
do i = 1, ny               ! i is lastprivate
   diff(i) = f(i+1) - 2*f(i) + f(i-1)
end do
! i == ny + 1

! For index i, the first rule encountered is
! - the loop iteration variable in any associated loop of a loop-associated construct 
!   is otherwise private
!$omp parallel do
do i = 1, ny               ! i is private
   diff(i) = f(i+1) - 2*f(i) + f(i-1)
end do
! i == ??? (private but not lastprivate, so ny + 1 is not guaranteed)

In the future I will be adding lastprivate(...) for clarity in cases when I need to use the loop index in calculations after the loop, irrespective of which loop-associated construct that is. It makes sense to default to private in the standard multi-threaded case with parallel do, as lastprivate would cause an extra synchronization which may not be needed at all.
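
Since the compiler results above show that lastprivate does not currently rescue the zero-trip case on every compiler, another option is to not read the loop index after the construct at all. The following is only a sketch under my own assumptions: a unit-stride loop, and the hypothetical names lo, hi, j_after and total, none of which come from the original example. The value a sequential DO would leave in the index is computed from the bounds instead.

```fortran
program index_after_simd
   implicit none
   integer :: j, lo, hi, j_after, total

   lo = 2
   hi = 1        ! zero-trip case from the original example
   total = 0

   ! The reduction clause avoids a loop-carried update of a shared scalar
   !$omp simd reduction(+:total)
   do j = lo, hi
      total = total + 1
   end do

   ! Unit stride only: the value a sequential DO would leave in j,
   ! obtained from the bounds rather than from j itself
   j_after = lo + max(hi - lo + 1, 0)

   print *, 'iterations run:      ', total
   print *, 'index after loop is: ', j_after
end program index_after_simd
```

Because j_after depends only on lo and hi, it is 2 here whether or not the directive is honored, and total stays 0 because the loop body never runs. For a non-unit stride the trip-count expression would need the stride as well.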

@ivanpribec thank you for the OMP Spec references. I am escalating this with our OMP team.