Clarification on DO CONCURRENT

I’ve encountered a behavior of DO CONCURRENT with the Intel compiler that I find unexpected. There is already a long thread on the Intel board (Re:do concurrent broken with openmp - Intel Community) that has not reached a conclusion so far. Here is a summary of my current understanding.

The issue can be demonstrated with the help of the following example.

program test_do_concurrent

  implicit none

  print*, b([1,2])


  contains

  function b(a)

    integer, dimension(2,2) :: b
    integer, intent(in), dimension(2) :: a
    integer :: i,j

    do concurrent(i=1:2, j=1:2)
      b(i,j) = a(2) * i * j
    enddo

  end function b

end program test_do_concurrent

I would expect that it prints

2 4 4 8

and that is what I get from gfortran and ifort, unless I compile with ifort and -qopenmp, in which case I get

0 0 0 0

According to Intel support, this is OK because my code has unspecified behavior if a (more specifically a(2)) is not defined in the loop.

The relevant aspects of the standard are

11.1.7.5 Additional semantics for DO CONCURRENT constructs

  1. The locality of a variable that appears in a DO CONCURRENT construct is LOCAL, LOCAL_INIT, SHARED, or unspecified. A construct or statement entity of a construct or statement within the DO CONCURRENT construct has SHARED locality if it has the SAVE attribute. If it does not have the SAVE attribute, it is a different entity in each iteration, similar to LOCAL locality.
  2. A variable that has LOCAL or LOCAL_INIT locality is a construct entity with the same type, type parameters, and rank as the variable with the same name in the innermost executable construct or scoping unit that includes the DO CONCURRENT construct, and the outside variable is inaccessible by that name within the construct. The construct entity has the ASYNCHRONOUS, CONTIGUOUS, POINTER, TARGET, or VOLATILE attribute if and only if the outside variable has that attribute; it does not have the BIND, INTENT, PROTECTED, SAVE, or VALUE attribute, even if the outside variable has that attribute. If it is not a pointer, it has the same bounds as the outside variable. At the beginning of execution of each iteration,
  3. If a variable has unspecified locality,
    • if it is referenced in an iteration it shall either be previously defined during that iteration, or shall not be defined or become undefined during any other iteration; if it is defined or becomes undefined by more than one iteration it becomes undefined when the loop terminates;

C1128: A variable-name that appears in a LOCAL or LOCAL_INIT locality-spec shall not have the ALLOCATABLE, INTENT (IN), or OPTIONAL attribute, shall not be of finalizable type, shall not be a nonpointer polymorphic dummy argument, and shall not be a coarray or an assumed-size array. A variable-name that is not permitted to appear in a variable definition context shall not appear in a LOCAL or LOCAL_INIT locality-spec.

The reasoning seems to be that a has unspecified locality and should therefore behave like a variable with LOCAL locality, but this ignores that a variable with INTENT(IN) cannot have LOCAL locality (C1128). I agree with Intel support that the standard is not 100% clear here, but in my opinion the behavior violates the principle of least astonishment. Also, 11.1.7.5 p3 does not say that a variable with unspecified locality must become defined in the iteration; it only says that it should become defined at most once.

I tested gfortran -fopenmp a.f90 on this example and it still produces 2 4 4 8, even with optimization options such as -O3 -march=native -ffast-math.

As a user, I can’t see anything wrong with your code, so I would personally consider this a bug in the Intel compiler. At the very least, I would like the compiler to give me an error (or a warning) if it is going to return 0 because the code is somehow non-conforming. Ideally, though, this code should just work, unless there is some technical reason why that cannot be done.

However, from the standard’s point of view, they might technically be right, in which case the standard should be improved.

I would be interested in what @sblionel thinks on this one.

Just because the description of unspecified locality bears some similarity to LOCAL locality, that doesn’t make them the same. When DO CONCURRENT was first introduced in F2008, there was no concept of locality in the standard; it was added in F2018. The current wording about “unspecified locality” applied, in F2008, to all variables in an iteration. At the time, the thought was that compilers could figure out on their own whether a variable was local or shared, and compilers tried to do that but didn’t always get it right. Mirroring OpenMP, F2018 added locality clauses to allow the programmer to specify what was meant.

That INTENT(IN) dummy arguments can’t be LOCAL is irrelevant.

However… I find myself disagreeing with Intel regarding this case. The standard says “if it is referenced in an iteration it shall either be previously defined during that iteration, or shall not be defined or become undefined during any other iteration;”. That “or” is significant. a is not “previously defined” in any iteration, but neither is it “defined or become undefined during any other iteration.” Therefore, the reference to a meets the requirements for unspecified locality, the example is conforming, and it should produce the result you want.
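
The two branches of that “or” can be illustrated with a minimal sketch. The program and the names t, a, and b below are made up for illustration and are not taken from the thread. Here t is defined before it is referenced in every iteration (first branch), while a is referenced but never defined in any iteration (second branch), so both references should be conforming under the quoted rule:

```fortran
program unspecified_locality_sketch
  ! Illustrative only: program and variable names are assumptions.
  implicit none
  integer :: i, t
  integer :: a(2), b(4)

  a = [1, 2]

  do concurrent (i = 1:4)
    ! t: defined in this iteration before it is referenced (first branch).
    t = a(2) * i
    ! a: referenced, but not defined or undefined by any iteration (second branch).
    ! b(i): each element is defined by exactly one iteration.
    b(i) = t
  end do

  ! t was defined by more than one iteration, so it is undefined here
  ! and must not be referenced; b is fully defined.
  print *, b
end program unspecified_locality_sketch
```

On a conforming compiler this should print the values 2, 4, 6 and 8.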

I find it interesting that adding -qopenmp changes the behavior, because this source does not use any OpenMP features. Yes, ifort uses OpenMP to parallelize a DO CONCURRENT, but if I build with -qparallel, I get the “correct” answer. (With this example, -qparallel doesn’t parallelize because the compiler thinks there is insufficient work - reasonable.)

Note that the following modification does produce the desired result:

  function b(a)

    integer, dimension(2,2) :: b
    integer, intent(in), dimension(2) :: a
    integer, dimension(2) :: a2
    integer :: i,j

    a2 = a
    do concurrent(i=1:2, j=1:2)
      b(i,j) = a2(2) * i * j
    enddo

  end function b

I don’t see any functional difference between these two cases, and I see no reason why simply adding -qopenmp should change the result. (In the past, one could sometimes blame that on -qopenmp implicitly making all procedures recursive, but in the 2021 compiler RECURSIVE is the default, since that’s F2018.)
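
Another option that avoids relying on unspecified locality altogether is to state the locality explicitly with the F2018 locality specifiers mentioned earlier. The following is only a sketch: whether it changes the result with ifort -qopenmp has not been verified here, and it needs a compiler that accepts F2018 locality specifiers (older gfortran releases may reject them). The program name is hypothetical.

```fortran
program test_do_concurrent_locality
  ! Sketch only: same computation as the original example, but with
  ! explicit F2018 locality instead of unspecified locality.
  implicit none

  print *, b([1, 2])

contains

  function b(a)
    integer, dimension(2,2) :: b
    integer, intent(in), dimension(2) :: a
    integer :: i, j

    ! DEFAULT(NONE) requires every variable used in the construct (other
    ! than the index names i and j) to have its locality listed.
    ! a is only read; each b(i,j) is written by exactly one iteration.
    do concurrent (i = 1:2, j = 1:2) default(none) shared(a, b)
      b(i,j) = a(2) * i * j
    end do

  end function b

end program test_do_concurrent_locality
```

With SHARED stated explicitly there is no room left for the compiler to treat a as a per-iteration copy, so a conforming compiler should again produce 2 4 4 8.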

Only ifort shows this behavior (I’ve clarified the description).