Associate private behavior in OpenMP do loops?

I recently came across an interesting code that I thought might be illuminating to improve Fortran standard specification about associate statement in OpenMP/do concurrent.

It seems that the Fortran standard is silent on whether the associate structure is private or shared in OpenMP/do concurrent do loops. I think it would make more sense for it to be private, same as the block statement.

The following code has different results in gfortran and ifort/ifx:

> cat main.f90
program main

    use omp_lib
    implicit none
    integer :: ints(4) = [1, 2, 3, 4], i

    !$omp parallel do schedule(dynamic)
    do i = 1, 4
        associate (x => ints(i))
            print *, "loc = ", loc(x), ", x = ", x, ", omp_get_thread_num() = ", omp_get_thread_num()
        end associate
    end do

end program main

> gfortran -fopenmp main.f90 && . /a
 loc = 140697567571968 , x = 1 , omp_get_thread_num() = 0
 loc = 140697567571972 , x = 2 , omp_get_thread_num() = 1
 loc = 140697567571980 , x = 4 , omp_get_thread_num() = 3
 loc = 140697567571976 , x = 3 , omp_get_thread_num() = 6

> ifort /Qopenmp /Z7 main.f90 && . /a
 loc = 140700606185484 , x = 4 , omp_get_thread_num() = 2
 loc = 140700606185484 , x = 4 , omp_get_thread_num() = 3
 loc = 140700606185484 , x = 4 , omp_get_thread_num() = 5
 loc = 140700606185484 , x = 4 , omp_get_thread_num() = 1

As you can see from the storage address of x, gfortran considers x to be private (which makes more sense to me) and ifort/ifx considers x to be publicly shared (see the intel fortran forum for the same report).

What do you all think about this, and if you find the corresponding description in the Fortran standard, please let me know, thanks in advance.

ChatGPT let me know:

In OpenMP, the scope of the entity (in this case x) in the associate block is private. This is because the associate block creates a new scope in Fortran that contains variables that can only be accessed within that associate block, not externally. Therefore, each thread will have its own associate block and its own x entity.
This design helps to ensure thread safety because each thread is manipulating its own x entity without affecting the x’s of other threads, which avoids data contention problems common in parallel computing.
As a result, you can safely use associate statements in OpenMP loops.

In OpenMP, the data-sharing attributes (for example, whether they are private or shared) of the associated name (in this case, x) in the associate statement block are determined based on the data-sharing attributes of the original entity or expression. In your example, ints(i) is an array element whose data sharing properties depend on the data sharing properties of the array ints. In your example, you don’t explicitly specify the data-sharing property of array ints, so by default, it is shared.

As an aside, I have two other examples to provoke thought:

> cat demo1.f90
program main

    use omp_lib
    implicit none
    integer :: ints(4) = [1, 2, 3, 4], i

    !$omp parallel do schedule(dynamic) private(ints)
    do i = 1, 4
        associate (x => ints(i))
            print *, "loc = ", loc(x), ", x = ", x, ", omp_get_thread_num() = ", omp_get_thread_num()
        end associate
    end do

end program main

> gfortran -fopenmp demo2.f90 && . /a
 loc = 32374240 , x = 14234872 , omp_get_thread_num() = 7
 loc = 28179940 , x = 0 , omp_get_thread_num() = 5
 loc = 6290216 , x = 14226000 , omp_get_thread_num() = 0
 loc = 26082796 , x = 32763 , omp_get_thread_num() = 4

> ifort /Qopenmp /Z7 demo2.f90 && . /a
loc = 20707964 , x = 0 , omp_get_thread_num() = 1
 loc = 24967796 , x = 0 , omp_get_thread_num() = 2
 loc = 29227632 , x = 0 , omp_get_thread_num() = 3
 loc = 1373944 , x = 1 , omp_get_thread_num() = 0

> cat demo2.f90
program main

    use omp_lib
    implicit none
    integer :: ints(4) = [1, 2, 3, 4], i

    !$omp parallel do schedule(dynamic)
    do i = 1, 4
        associate (x => get_x(ints(i)))
            print *, "loc = ", loc(x), ", x = ", x, ", omp_get_thread_num() = ", omp_get_thread_num()
        end associate
    end do

contains

    integer function get_x(y) result(x)
        integer, intent(in) :: y

       x = y + 1

    end function get_x

end program main

> gfortran -fopenmp demo2.f90 && . /a
 loc = 29490688 , x = 2 , omp_get_thread_num() = 5
 loc = 6290208 , x = 3 , omp_get_thread_num() = 0
 loc = 23199232 , x = 4 , omp_get_thread_num() = 2
 loc = 27393536 , x = 5 , omp_get_thread_num() = 4

> ifort /Qopenmp /Z7 demo2.f90 && . /a
 loc = 1375832 , x = 5 , omp_get_thread_num() = 4
 loc = 1375832 , x = 5 , omp_get_thread_num() = 3
 loc = 1375832 , x = 5 , omp_get_thread_num() = 2
 loc = 1375832 , x = 5 , omp_get_thread_num() = 0

I run your first program, on Windows using ifort Version 2021.7.1 Build 20221019_000000 and got:

 loc =  140697311514632,  x =   3,  omp_get_thread_num() =  3
 loc =  140697311514628,  x =   2,  omp_get_thread_num() =  2
 loc =  140697311514636,  x =   4,  omp_get_thread_num() =  6
 loc =  140697311514624,  x =   1,  omp_get_thread_num() =  4

Also, in your programs, the omp directive line looks like i $omp [...]. Not sure if that is a typo or something else.

1 Like

Thanks for you test, @mEm , i $omp is caused by the translator, sorry it should be !$omp.
I also made another mistake, I’ll re-comb it:
(ifort Intel(R) 64, Version 2021.8.0 Build 20221119_000000, ifx Intel(R) 64, Version 2023.0.0 Build 20221201 on Windows)

> cat main.f90
program main

    use omp_lib
    implicit none
    integer :: ints(4) = [1, 2, 3, 4], i

    !$omp parallel do schedule(dynamic)
    do i = 1, 4
        associate (x => ints(i))
            print "(3(a,i0))", "loc = ", loc(x), ", x = ", x, ", omp_get_thread_num() = ", omp_get_thread_num()
        end associate
    end do

end program main

> ifort /Qopenmp main.f90 ; ./main
loc = 140697223348224, x = 1, omp_get_thread_num() = 0
loc = 140697223348228, x = 2, omp_get_thread_num() = 3
loc = 140697223348232, x = 3, omp_get_thread_num() = 1
loc = 140697223348236, x = 4, omp_get_thread_num() = 2

> ifort /Z7 /Qopenmp main.f90 ; ./main
loc = 140695043059724, x = 4, omp_get_thread_num() = 0
loc = 140695043059724, x = 4, omp_get_thread_num() = 2
loc = 140695043059724, x = 4, omp_get_thread_num() = 4
loc = 140695043059724, x = 4, omp_get_thread_num() = 3

> ifx /Qopenmp main.f90 ; ./main
loc = 140702410522632, x = 3, omp_get_thread_num() = 1
loc = 140702410522636, x = 4, omp_get_thread_num() = 0
loc = 140702410522636, x = 4, omp_get_thread_num() = 4
loc = 140702410522636, x = 4, omp_get_thread_num() = 2

> cat demo2.f90
program main

    use omp_lib
    implicit none
    integer :: ints(4) = [1, 2, 3, 4], i

    !$omp parallel do schedule(dynamic)
    do i = 1, 4
        associate (x => get_x(ints(i)))
            print "(3(a,i0))", "loc = ", loc(x), ", x = ", x, ", omp_get_thread_num() = ", omp_get_thread_num()
        end associate
    end do

contains

    integer function get_x(y) result(x)
        integer, intent(in) :: y

        x = y + 1

    end function get_x

end program main

> ifort /Qopenmp app/main.f90 ; ./main
loc = 1375696, x = 2, omp_get_thread_num() = 1
loc = 1375696, x = 4, omp_get_thread_num() = 2
loc = 1375696, x = 5, omp_get_thread_num() = 4
loc = 1375696, x = 3, omp_get_thread_num() = 0

> ifort /Z7 /Qopenmp app/main.f90 ; ./main
loc = 1375608, x = 5, omp_get_thread_num() = 0
loc = 1375608, x = 5, omp_get_thread_num() = 3
loc = 1375608, x = 5, omp_get_thread_num() = 4
loc = 1375608, x = 5, omp_get_thread_num() = 1

> ifx /Qopenmp app/main.f90 ; ./main
loc = 1375848, x = 5, omp_get_thread_num() = 4
loc = 1375848, x = 5, omp_get_thread_num() = 0
loc = 1375848, x = 5, omp_get_thread_num() = 1
loc = 1375848, x = 5, omp_get_thread_num() = 5

It seems that the /Z7 option causes ifort to “fail”, and ifort meets expectations without /Z7;
However, ifx does not meet expectations in any case.

It looks like a bug in ifort and ifx?

I also had similar questions before, and these slides were very helpful to me.

Kelvin Li, OpenMP Users Monthly Telecon (2021)

Kelvin Li, OpenMPcon (2016)


From the above slides, my (current) understanding is that the associate name x references an array element of the shared variable ints, so x also behaves as a shared variable (which seems to be the case for Gfortran-12 + OpenMP + Ubuntu22).

EDIT: To be more explicit, I have added private/shared clauses to the OMP construct.

program main

    use omp_lib
    implicit none
    integer :: ints(4) = [1, 2, 3, 4], i

    print *, "address of original ints(4):"
    do i = 1, 4
        print *, "loc = ", loc(ints(i)), "val = ", ints(i)
    enddo
    print *
    print *, "address of x (associate-name):"

    !$omp parallel do default(none) private(i) shared(ints)
    do i = 1, 4
        associate (x => ints(i))
            print *, "loc = ", loc(x), ", x = ", x, ", thread = ", omp_get_thread_num()
        end associate
    end do

end program main
$ gfortran-12 -fopenmp test.f90
$ OMP_NUM_THREADS=4 ./a.out
 
address of original ints(4):
 loc =        93854003081232 val =        1
 loc =        93854003081236 val =        2
 loc =        93854003081240 val =        3
 loc =        93854003081244 val =        4

 address of x (associate name):
 loc =        93854003081232 , x =        1 , thread =        0
 loc =        93854003081236 , x =        2 , thread =        1
 loc =        93854003081240 , x =        3 , thread =        2
 loc =        93854003081244 , x =        4 , thread =        3

(For comparison, it might be interesting to print the address of the original ints on your machines also. )


Some related explanations from the above slides:

[1] When an ASSOCIATE construct appears inside an OMP construct:
  • The associate name (the left-hand side of =>) has the same data-sharing attribute as the selector (the right-hand side of =>) with which it is associated.

  • A privatized list item (= variables declared in the private clause) may be a selector of an ASSOCIATE or SELECT TYPE construct.

Ex1:
A construct association is established between z and private x. Any reference of z in the associate construct is to the private x.

  !$omp parallel private(x)
  associate (z => x)
    z = z + 1
  end associate
  !$omp end parallel

Ex2:
Any reference of z inside the associate construct is to (x + private y).

  !$omp parallel private(y)
  associate (z => x + y)
    t = z + 1
  end associate
  !$omp end parallel
[2] When an ASSOCIATE construct appears outside an OMP construct
  • An associate name that may appear in a variable definition context
    is shared if its association occurs outside of the OMP construct. (OMP5.2)

  • If the construct association is established prior to a parallel region,
    the association between the associate name and the original list item
    will be retained in the parallel region.

Ex1:

  x = ...   !! (L1)
  associate (foo => x)

  !$omp parallel private(x)
    x = omp_get_thread_num()   !! this x is private
    ... = foo   !! original list item (x in line L1)
  !$omp end parallel

  end associate
1 Like

Thanks for the link, @septc .

Overall, the behavior of gfortran meets my expectations for Fortran OpenMP, but the behavior of ifort/ifx does not.

program main

    use omp_lib
    implicit none
    integer :: i

    print *, "OpenMP:"
    !$omp parallel do schedule(dynamic)
    do i = 1, 4
        associate (x => iter())
            print "(3(a,i10))", "loc = ", loc(x), ", x = ", x, ", omp_get_thread_num() = ", omp_get_thread_num()
        end associate
    end do

    print *, "Serial:"
    do i = 1, 4
        associate (x => iter())
            print "(3(a,i10))", "loc = ", loc(x), ", x = ", x, ", omp_get_thread_num() = ", omp_get_thread_num()
        end associate
    end do

contains

    integer function iter()
        integer, save :: ik = 1

        iter = ik

    end function iter

end program main
> gfortran -fopenmp app/main.f90 ; ./a
 OpenMP:
loc =   30866944, x =          1, omp_get_thread_num() =          6
loc =    6289664, x =          1, omp_get_thread_num() =          0
loc =   26672640, x =          1, omp_get_thread_num() =          4
loc =   24575488, x =          1, omp_get_thread_num() =          3
 Serial:
loc =    6290468, x =          1, omp_get_thread_num() =          0
loc =    6290468, x =          1, omp_get_thread_num() =          0
loc =    6290468, x =          1, omp_get_thread_num() =          0
loc =    6290468, x =          1, omp_get_thread_num() =          0

> ifort /Qopenmp app/main.f90 ; ./main
 OpenMP:
loc =    1375632, x =          1, omp_get_thread_num() =          4
loc =    1375632, x =          1, omp_get_thread_num() =          0
loc =    1375632, x =          1, omp_get_thread_num() =          2
loc =    1375632, x =          1, omp_get_thread_num() =          3
 Serial:
loc =    1375888, x =          1, omp_get_thread_num() =          0
loc =    1375888, x =          1, omp_get_thread_num() =          0
loc =    1375888, x =          1, omp_get_thread_num() =          0
loc =    1375888, x =          1, omp_get_thread_num() =          0

Maybe I should go to the Intel Fortran forum to report this issue.

Indeed, although “ifort /Qopenmp” seems to give the same behavior with gfortran, adding “/Z7” or using ifx seems to give different results (from the above output). I’ve also tried Compiler Explorer, but I was not able to compile the code for ifort and ifx (which might be for some other reason specific to the website, though…).

Your test (as well as the original test from @zoziha ) actually shows that gfortran is considering x as a private entity (that “points” to a shared entity in this example).

I would say “as expected”. Conceptually an associate name is a local entity to the associate construct, similar to a C variable that is declared within a construct. Such a C variable is private when declared within a parallel region.

1 Like

Actually, I am still not sure how to call the data-sharing attribute of x. In my understanding, each thread does not make a copy of the original array element, so in my understanding it is “shared” (in the sense that its modification affects immediately the behavior of other threads, if they access ints, resulting in data race). On the other hand, if the compiler makes a pointer to int(i) internally, it is a “private” pointer variable (created for each thread), so the pointer itself is “private”. But… I feel it could be confusing to call x as “private” because a new copy of of int(i) is not created.

More specifically, my understanding is based on this sentence of the above slides:

The associate name (= ‘x’) has the same data-sharing attribute as the selector (= ‘int(i)’) with which it is associated

Also, my use of the word “private” is based on “privatization” (= creation of thread-local copy), e.g., as written here:

One technique to remove these conflicts is privatization. The basic principle involves making a private copy of a variable for each thread, rather than share one instance.

Variables can have either shared or private context in a parallel environment. Variables in shared context are visible to all threads running in associated parallel regions. Variables in private context are hidden from other threads. Each thread has its own private copy of the variable, and modifications made by a thread to its copy are not visible to other threads.


program main

    use omp_lib
    implicit none
    integer :: ints(4) = [1, 2, 3, 4], i

    print *, "ints(:) (before) = ", ints(:)

    !$omp parallel do default(none) private(i) shared(ints)
    do i = 1, 4
        associate (x => ints(i))
            x = i * 10
            print *, "ints(:) (inside omp) = ", ints(:)
        end associate
    end do

    print *, "ints(:) (after) = ", ints(:)

end program main
$ gfortran-12 -fopenmp test2.f90

$ OMP_NUM_THREADS=1 ./a.out
 ints(:) (before) =            1           2           3           4
 ints(:) (inside omp) =           10           2           3           4
 ints(:) (inside omp) =           10          20           3           4
 ints(:) (inside omp) =           10          20          30           4
 ints(:) (inside omp) =           10          20          30          40
 ints(:) (after) =           10          20          30          40

$ OMP_NUM_THREADS=2 ./a.out
 ints(:) (before) =            1           2           3           4
 ints(:) (inside omp) =           10           2           3           4
 ints(:) (inside omp) =           10          20          30           4
 ints(:) (inside omp) =           10          20          30          40
 ints(:) (inside omp) =           10          20          30          40
 ints(:) (after) =           10          20          30          40

I guess the talk is a bit “mixed” with the iter() version, in which x is associated with the function return value. In this case, I also think x is more like a “private” variable, because each thread needs to have an independent variable for the result. So, I guess the nature of x will depend on the selector (ints or iter() etc).

program main
    use omp_lib
    implicit none
    integer :: i

    !$omp parallel do default(none) private(i)
    do i = 1, 4
        associate (x => iter( i ))
            print *, "loc = ", loc(x), ", x = ", x, ", thread = ", omp_get_thread_num()
            !! x = 100   !! Error: cannot be used in a variable definition context
        end associate
    end do

contains

    function iter( x ) result( ret )
        integer, intent(in) :: x
        integer :: ret
        ret = x * 10
    end function

end program main
OMP_NUM_THREADS=1 ./a.out
 loc =       140731589060808 , x =       10 , thread =        0
 loc =       140731589060808 , x =       20 , thread =        0
 loc =       140731589060808 , x =       30 , thread =        0
 loc =       140731589060808 , x =       40 , thread =        0

$ OMP_NUM_THREADS=4 ./a.out
 loc =       140720363917912 , x =       10 , thread =        0
 loc =       139973565083000 , x =       20 , thread =        1
 loc =       139973556690296 , x =       30 , thread =        2
 loc =       139973548297592 , x =       40 , thread =        3

This sentence is quite confusing and can lead to wrong interpretations IMO. The conceptual model that the best fits the associate construct is to see x as a dummy argument of a virtual contained subroutine.

This code

    do i = 1, 4
        associate (x => ints(i))
            print *, loc(x), x
        end associate
    end do

is supposed to be (more of less) equivalent to this code:

    do i = 1, 4
        call subassociate(ints(i))
    end do
contains
    subroutine subassociate(x)
    integer :: x
    print *,loc(x),x
    end subroutine

I guess there may be no definite answer whether it is “private” or “shared” (in the terminology of OpenMP), because the associate name x is not allowed to be written in data-sharing clauses like private() or shared() (it becomes an error as far as I experienced). But if possible, I would also like to know more clear-cut definitions (e.g., written in the OpenMP manual)…

If we want to eliminate associate constructs from the code, I guess we probably need to declare x as a pointer, such that it directly changes the value of ints (to match the behavior of the original code with associate), something like…

program main

    use omp_lib
    implicit none
    integer, target :: ints(4) = [1, 2, 3, 4], i
    integer, pointer :: x

    print *, "address of original ints(4):"
    do i = 1, 4
        print *, "loc = ", loc(ints(i)), "val = ", ints(i)
    enddo
    print *
    print *, "loc(x):"
    !! Here we note that `loc(x)` refers to the address of the target data,
    !! rather than the address of the pointer variable.

    !$omp parallel do default(none) private(i,x) shared(ints)
    do i = 1, 4
        x => ints(i)
        print *, "loc = ", loc(x), ", x = ", x, ", thread = ", omp_get_thread_num()
        x = i * 10
    end do

    print *, "ints (final) = ", ints
end program main
 address of original ints(4):
 loc =        93936608002064 val =            1
 loc =        93936608002068 val =            2
 loc =        93936608002072 val =            3
 loc =        93936608002076 val =            4

loc(x):
 loc =        93936608002064 , x =            1 , thread =            0
 loc =        93936608002076 , x =            4 , thread =            3
 loc =        93936608002072 , x =            3 , thread =            2
 loc =        93936608002068 , x =            2 , thread =            1
 ints (final) =           10          20          30          40

Because x is written in the private() clause and given a different memory for the value of the pointer (not for the value of the target of x), I think the pointer x is regarded as “private”. However, I feel its behavior (semantic) is essentially a shared one, because it directly changes the value of ints. The same result can also be obtained by declaring a local pointer x inside the parallel region (using block):

 program main

    use omp_lib
    implicit none
    integer, target :: ints(4) = [1, 2, 3, 4], i

    print *, "address of original ints(4):"
    do i = 1, 4
        print *, "loc = ", loc(ints(i)), "val = ", ints(i)
    enddo
    print *
    print *, "loc(x):"

    !$omp parallel do default(none) private(i) shared(ints)
    do i = 1, 4
        block
          integer, pointer :: x
          x => ints(i)
          print *, "loc = ", loc(x), ", x = ", x, ", thread = ", omp_get_thread_num()
          x = i * 10
        end block
    end do

    print *, "ints (final) = ", ints
end program main

(…same result as above…)

On the other hand, if x is declared as an integer (not a pointer) inside the parallel region, it has an independent memory for the value of ints(i), which does not affect the original ints nor other threads (so a “private” variable).

program main

    use omp_lib
    implicit none
    integer, target :: ints(4) = [1, 2, 3, 4], i

    print *, "address of original ints(4):"
    do i = 1, 4
        print *, "loc = ", loc(ints(i)), "val = ", ints(i)
    enddo
    print *
    print *, "loc(x):"

    !$omp parallel do default(none) private(i) shared(ints)
    do i = 1, 4
        block
          integer :: x
          x = ints(i)
          print *, "loc = ", loc(x), ", x = ", x, ", thread = ", omp_get_thread_num()
          x = i * 10
        end block
    end do

    print *, "ints (final) = ", ints

end program main
 address of original ints(4):
 loc =        94018403278864 val =            1
 loc =        94018403278868 val =            2
 loc =        94018403278872 val =            3
 loc =        94018403278876 val =            4

loc(x):
 loc =       140730114945240 , x =            1 , thread =            0
 loc =       139729739705720 , x =            2 , thread =            1
 loc =       139729731313016 , x =            3 , thread =            2
 loc =       139729722920312 , x =            4 , thread =            3
 ints (final) =            1           2           3           4

However, this last code does not correspond to the original code with associate, because the values of ints are not modified after existing the do loop… I guess one complication is that the same x can be used for both pointers and non-pointers in Fortran, and also that associate constructs use => both for the case of pointer-like and non-pointer-like (or value-like) association.

A C variable that is declared inside an OpenMP loop can neither be specified private or shared. Yet, each thread definitely has a private copy. It is, or should be, the same for an associate name.

And a private entity can point (whatever “point” exactly means) to a shared entity, there is no contradiction here.

Today, additional discovery of associate paired with array pointers, ifort compiled programs will result in memory leaks, which may also be a bug.
Both issues, reported in the ifort Intel Fortran forum, participate in this link.

@mEm , @PierU , @septc , thank you all for your participation and discussion!

3 Likes

Behaviour of associate and block within omp blocks are handled in specification for openmp 5.1. I guess it possibly was just an oversight not to define data-sharing attributes for such entities (declared in associate or block statements) in earlier specifications. The relevant definitions can be found in “2.21.1.1 Variables Referenced in a Construct” in specification document for openmp 5.1.

See intel-fortran-forum:openmp+associate/block for more details.

Recent versions of ifort/ifx are conforming to spec 5.1 and indeed work as expected. See the two testcases in the linked thread where a omp barrier is used to make the data-sharing attribute repeatably visible.

3 Likes