Pure functions not deterministic in serial and parallel loops

certik · November 6, 2022, 4:11am

Here is a simple example that applies a pure (even elemental) function to each element of an array three different ways (elemental, do concurrent, openmp loop) and you get three different results:

program test_pure
implicit none
integer, parameter :: n = 16
integer :: i, A(n)

A = 1
A = h(A)
print *, A

A = 1
do concurrent (i = 1:n)
    A(i) = h(A(i))
end do
print *, A

A = 1
!$OMP PARALLEL DO
do i = 1,n
    A(i) = h(A(i))
end do
!$OMP END PARALLEL DO
print *, A


contains

    elemental integer function h(x)
    integer, intent(in) :: x
    h = f()*x
    end function

    pure integer function f()
    f = sum(A)
    end function

end

This prints on my computer:

$ gfortran -fopenmp test_pure.f90
$ ./a.out 
          16          16          16          16          16          16          16          16          16          16          16          16          16          16          16          16
          16          31          61         121         241         481         961        1921        3841        7681       15361       30721       61441      122881      245761      491521
         241         481          61         121       92161      184321          16          31       23041       46081         961        1921        7681        7681        3841      368641
$ ./a.out
          16          16          16          16          16          16          16          16          16          16          16          16          16          16          16          16
          16          31          61         121         241         481         961        1921        3841        7681       15361       30721       61441      122881      245761      491521
        7681       15361          16          31          61         121         481         961        1921        3841        7681       15361       53761         241      107521      215041
$ ./a.out
          16          16          16          16          16          16          16          16          16          16          16          16          16          16          16          16
          16          31          61         121         241         481         961        1921        3841        7681       15361       30721       61441      122881      245761      491521
      245761      491521          61         121         961        1921        3841        7681       15361       30721         241         481          16          31       61441      122881

As you can see, the parallel loop even returns different results on every run!

See the thread What is a pure function? for a discussion of pure functions. A quick summary: functions can have two independent properties:

deterministic: arguments fully determine the result (no reads from global variables unless they are compile time constants)
side-effect-free: no side effects (no writes to global variables)

Fortran’s pure is side-effect-free. Fortran’s 2023 simple is both side-effect-free as well as deterministic.

It seems that currently “do concurrent” only requires side-effect-free, but not deterministic. As a consequence the above example returns non-deterministic results if run in parallel (I don’t have a compiler that can run do concurrent in parallel, so I tested with openmp above). And even in serial you get different results depending on how you apply the elemental function. In particular the Fortran standard says about elemental functions:

In the array case, the values of the elements, if any, of the result are the same as would have been obtained if the scalar function had been applied separately, in array element order, to corresponding elements of each array actual argument.

This paragraph makes perfect intuitive sense to me. My understanding of it is that the elemental application above and the first loop should produce the same answer (on my computer a regular loop and do concurrent return the same answer). But that is not the case as you can see. If you look at the program, it seems to me it even cannot be the case. If the function is not deterministic, it seems in general you can’t guarantee the property that the result are the same as would have been obtained if the scalar function had been applied /.../ to corresponding elements of each array actual argument and above is one such example, where the loop is not order independent, despite only using Fortran pure functions. It seems one requires both side-effect-free as well as deterministic in order to ensure the loop is order independent.

What is the rationale to allow non-deterministic functions inside parallel loops and in elemental procedures?

Wouldn’t it make sense to only allow deterministic (and side-effect-free) functions?

plevold · November 6, 2022, 7:46am

Well, it’s currently possible to legally execute impure code within a pure procedure so neither determinism nor freedom from side effects can currently be assumed.

Given the following qoute from Note 4 in Section 15.7 in the Fortran 2018 standard I’d say both are “bugs” in the standard:

The above constraints are designed to guarantee that a pure procedure is free from side effects (modifications
of data visible outside the procedure), which means that it is safe to reference it in constructs such as DO
CONCURRENT and FORALL, where there is no explicit order of evaluation

plevold · November 6, 2022, 8:15am

I just thought of one thing in the standards “favour” regarding your example: Since the program statement isn’t pure there’s no claim that the do concurrent loop actually will be pure. Immediately calling a pure subroutine main() and putting the do loop inside it would protect against the behaviour observed here (I think?).

lkedward · November 6, 2022, 9:32am

It seems to me that even if you wrote f and h as simple procedures (by taking A as an argument) you would still get the same behaviour because you have a race condition on reading and writing to elements of A (hence the non-deterministic parallel execution) .

Therefore even though the function may be deterministic, the loop body is not because it writes to A.

I’m not sure what the Standard says about this, but it might be that the user is expected to ensure that the loop body is appropriate for concurrent or parallel execution?

There is no race condition when applying h elementally because the assignment occurs as if the RHS is evaluated entirely beforehand.

band-a-prend · November 6, 2022, 10:06am

Actually the pure function f() does that it should even for parallel run: on every iteration it takes “current” state of global variable array A and evaluate the sum(A). I.e. the pure function doesn’t know anything about external parallel loop. Should it be able to use global variables? I assume that does otherwise it will not be possible to use e.g. physical constants (CODATA as example those constants are specified from time to time) that user could specify in external data file.

Does the Fortran 200x standard will specify simple procedure i.e. exact pure procedure that are not able to reference any outside variables?

rwmsu · November 6, 2022, 2:03pm

ifort 2021.7 and ifx 2022.2 give the following. Like gfortran the elemental case is consistent but has different results. do concurrent and openmp are both non-deterministic. Question. who is right for the elemental case? gfortran or ifort

Edit. Looking more closely at the output, it appears do concurrent is computing the same values as elemental but they get stored in a different order in A.


  Elemental
          16          31          61         121         241         481
         961        1921        3841        7681       15361       30721
       61441      122881      245761      491521
 
  Do Concurrent
          16          31        7681       15361         961        1921
         121         241       61441      491521      245761      122881
         481       30721          61        3841
 
  openMP
          16          31         121         241          61         121
         121         241        2881        6961        1441         121
         121         241       13921        1201

 ./tpure.x
 
  Elemental
          16          31          61         121         241         481
         961        1921        3841        7681       15361       30721
       61441      122881      245761      491521
 
  Do Concurrent
          16          31        7681       15361        1921        3841
         481         961       30721      491521      245761      122881
          61       61441         121         241
 
  openMP
          16          31          61         181          61         181
        1381        2761         241        1381         241         241
          61          61        1381        1381

Edit 2.

Even stranger, NVIDIA Fortran (nvfortran) give identical results for elemental and do concurrent. This highlites one of my main gripes about the current state of Fortran compilers. You really don’t know which ones you can trust. nvfortran results

Edit 3. Looks like nvfortran defaults to sequential execution of do concurrent unless you specify the -stdpar=multicore or -stdpar=gpu compiler option. with -stdpar=multicore the do concurrent results become non-deterministic and dont match the elemental results.

 
  Elemental
           16           31           61          121          241          481 
          961         1921         3841         7681        15361        30721 
        61441       122881       245761       491521
 
  Do Concurrent
           16           31           61          121          241          481 
          961         1921         3841         7681        15361        30721 
        61441       122881       245761       491521
 
  openMP
           16           31          496          991           61          166 
           46          166        15841        31681        31681        95041 
         1981       190081         3961         7921

./tpuren.x
 
  Elemental
           16           31           61          121          241          481 
          961         1921         3841         7681        15361        30721 
        61441       122881       245761       491521
 
  Do Concurrent
           16           31           61          121          241          481 
          961         1921         3841         7681        15361        30721 
        61441       122881       245761       491521
 
  openMP
          121          361        51841       103681           16           31 
         1441         4321        25921         8641         8641          721 
         1441       207361           61          121

certik · November 6, 2022, 2:19pm

@lkedward it’s subtle: there is no race condition in the loop if you only evaluate the loop, and assume the function h is “pure” (which it seems to me you have to assume any function you call is both deterministic and side-effect-free, otherwise in general the loop is not order independent). You are right that the function h accessed (read only!) a global variable, and that causes the race condition, thus proving that you need more than side-effect-free.

@band-a-prend yes, in Fortran, “pure” is only side-effect-free, but does not have to be deterministic, thus you are allowed to access global variables (constant or not), but you can only read from them, not write. This then causes the above trouble. Regarding reading constant global variables, I think that is fine in all cases (including if the function is deterministic), since they can’t cause any of the above trouble.

plevold · November 6, 2022, 4:48pm

There seems to be a problem with contain-ed procedures and purity. The program containing the do loops are impure, or specifically the assignments A(i) = ... are impure and has side effects. In other words, the impurity happens between each invocation of the elemental function h.

As program statements cannot be declared pure, this is easier seen if we refactor the do loops to subroutine. Here’s two alternative implementations:

Alternative 1:

module test_pure
implicit none

    integer, parameter :: n = 16
    integer :: A(n)

contains

    subroutine main()
        integer :: i
        
        A = 1
        A = h(A) 
        !$OMP PARALLEL DO
        do i = 1,size(A)
            A(i) = h(A(i))
        end do
        !$OMP END PARALLEL DO

    end subroutine
    
    elemental integer function h(x)
    integer, intent(in) :: x
    h = f()*x
    end function

    pure integer function f()
    f = sum(A)
    end function
    
end module

program test_pure_prog
    use test_pure, only: main, A
    
    call main()
    print *, A
end program

If we try to declare main as pure the compiler rightfully complains:

app/main.f90:12:8:

   12 |         A = 1
      |        1
Error: Variable 'a' cannot appear in a variable definition context (assignment) at (1) in PURE procedure

However by using contains and passing A as an argument it is indeed possible to trigger this issue:

module test_pure
implicit none

contains

    pure subroutine main(A)
        integer, intent(inout) :: A(:)
        integer :: i
        
        A = 1
        A = h(A) 
        !$OMP PARALLEL DO
        do i = 1,size(A)
            A(i) = h(A(i))
        end do
        !$OMP END PARALLEL DO

    contains
    
        elemental integer function h(x)
        integer, intent(in) :: x
        h = f()*x
        end function
    
        pure integer function f()
        f = sum(A)
        end function
    end subroutine
end module

program test_pure_prog
    use test_pure, only: main
    
    integer, parameter :: n = 16
    integer :: A(n)
    
            
    print *, A
    call main(A)
    print *, A
end program

I suppose main is still free of side effects seen from the caller, but it is not deterministic when executed in parallell. I haven’t checked if the Fortran standard specifically mentions determinism, but as it mentions do concurrent as a motivation to introduce purity I would still call it a standard bug.

certik · November 6, 2022, 9:01pm

@plevold great example. Let’s analyze it.

I think the issue of main being “pure” is not relevant. In your second example, main is both side-effect-free and deterministic (well, it’s not deterministic in practice because of the “bug” that we are trying to identify and exclude using some rules; but it does not read from any global variables).

I think it’s the function h that must be both side-effect-free and deterministic. It is side-effect-free, but not deterministic: it depends on “A”, which would not be a problem if we never wrote to “A”. But the whole point of parallel loops is to write to some arrays (“A” in this case), so we have to modify some arrays. The whole point of a parallel loop is to produce a side effect.

But all that we are doing is just A(i) = h(A(i)). That should be allowed, but h must be both side-effect-free and deterministic.

So it seems to me one way to fix all of the above issues is to require in do concurrent (and in openmp loops) for all functions to be both side-effect-free as well as deterministic, which is the upcoming simple attribute in Fortran. So the compiler would say that A(i) = h(A(i)) is not allowed in do concurrent, because h is not “simple” (only “pure”, which is not enough).

I think also elemental should require the function to be simple, which would forbid the case above and both ifort and gfortran should then return exactly the same numbers.

FortranFan · November 6, 2022, 10:39pm

In the context of Fortran standard, it may be rather difficult to address as you indicate with DO CONCURRENT, the development of this construct having focused more on instructions followed by the processor in any order rather than a definitively concurrent/parallel execution mode, The latter is what is clearly now of great interest given where computing is headed.

Competent effort by the community focused strongly on concurrent/parallel execution using, say, the DO PARALLEL suggested by @rouson may be worthwhile where all the rules can be put together such as those being discussed here can be considered.

Also, in the context of Fortran standard, ELEMENTAL has been around since the Fortran 95 standard whereas SIMPLE makes its way only in Fortran 2023. Thus codes have come about where ELEMENTAL are not necessarily SIMPLE making the change not possible now without backward compatibility issues, right?

certik · November 7, 2022, 12:19am

Yes, there will be backwards compatibility issues. I think just like with implicit none, the compilers can simply start enforcing the more stricter rules, and provide a way out for older codes. The details get tricky so that the user experience is smooth, but I am not worried about that right now. In this thread I am just trying to understand exactly what the issues are.

Are you aware of some summary of the issues with pure and why the standard now introduces simple? Is it precisely due to the issues I raised above?

mecej4 · November 7, 2022, 1:48am

@certik, I am puzzled why it seems to me that the function f(), though declared to be pure, is not pure, because the value it returns is computed from a variable in the containing scope, and that outer scope variable is not constant. Thus, the variable A behaves as a global variable.

It is a separate matter as to whether a compiler detects that the pure attribute declaration is incorrect.

certik · November 7, 2022, 3:15am

@mecej4 you can follow the thread I linked: What is a pure function?, in particular, go here: Pure function - Wikipedia. The “pure” attribute in Fortran means no side-effects (roughly speaking no writing to global variables), but you can still read from global variables, which is what the function f() does. The reason it surprises you is because you probably expect “pure” to mean both side-effect-free and deterministic (no reading from global variables). That’s why I did the survey, and as of now, 74% of people who voted also expect that (myself included). The meaning of “pure” is subjective, so it’s better to just talk about the two properties (side-effect-free and deterministic) and just keep in mind that the Fortran “pure” keyword just means side-effect-free, but does not enforce deterministic.

mecej4 · November 7, 2022, 4:08am

Thanks for your explanation.

What I find unsettling is that, based on your explanation, a function that calls a pseudo-RNG function that reads the system clock to initialize itself can still be “pure”. So, a “pure” function cannot pollute its surroundings, but is perfectly amenable to getting polluted by its environment if that environment is deterministic, e.g., bound by the laws of the deterministic parts of physics.

plevold · November 7, 2022, 6:06am

@certik I think we also need to check what the standard says about aliasing of parent procedure variables. Somewhere in the standard it is stated that it is the programmer’s responsibility to make sure that dummy arguments are not aliasing the same memory. I don’t know if it has anything to say in this situation as I couldn’t find the exact paragraph on my phone (the standard document isn’t exactly “mobile friendly”).

mecej4 · November 7, 2022, 11:49am

The mere existence of aliased dummy arguments, when two or more formal arguments to a subprogram have portions that occupy the same memory locations, need not be a problem as far as correct operation of the code is concerned. (Performance may be affected because of cache issues, but the results should be correct.)

Problems may occur when one aliased variable is assigned new values, and its partner in crime is used later in an expression.

For a nice example of how confusing it can get when aliasing exists, see Robert Corbett’s post in C.L.F. a decade ago.

everythingfunctional · November 7, 2022, 4:22pm

I believe the example is non-conforming for the do concurrent case. The following text from the standard:

If a variable has unspecified locality,

if it is referenced in an iteration it shall either be previously defined during that iteration, or shall not be defined or become undefined during any other iteration; if it is defined or becomes undefined by more than one iteration it becomes undefined when the loop terminates;

A is referenced during an iteration (albeit indirectly), but it is (partially) defined in other iterations. This is normative text, and so the processor is not obligated to detect and report the non-conformance, but I believe that is the case here.

For the elemental case, I believe that what the standard says about the RHS of an assignment statement being fully evaluated before assignment to the LHS holds here, and the result is deterministic and as expected.

For the openmp directive, I would suggest it’s not standards conforming, as the standard says nothing about openmp directives or their expected behaviour. I believe the example would be standards conforming without the openmp directive, and would be deterministic, albeit with somewhat surprising (or at least not immediately obvious) behaviour.

certik · November 7, 2022, 7:46pm

@everythingfunctional I see, perfect, thank you. So effectively I think the standard says the function h should not access A outside of the current iteration, but the current iteration is already available via an explicit argument, so from this it follows that the standard effectively says the function h should be “simple” (side-effect-free and deterministic), because it requires the function h to be “pure”, plus these extra requirements on what global variables it can depend on, which effectively makes it “simple”. Would you agree?

everythingfunctional · November 7, 2022, 8:07pm

Yes, all operations that occur within the body of a do concurrent loop must be simple, at least as far as the loop is concerned. They may refer to data beyond their arguments, so long as no variables it uses are assigned to within the body of the loop.

certik · November 7, 2022, 9:12pm

Exactly. And in this case, I think those extra variables can (in theory) be passed explicitly as arguments and the function becomes “simple”. So the “right” requirement indeed seems to be “simple” functions or equivalent.

Topic		Replies	Views
Pure subroutine error with do concurrent for external routine Help	4	447	April 19, 2023
Speed of array intrinsics	22	1916	April 14, 2025
Elemental character function	6	1109	February 2, 2022
Trying to make a program using OpenMP give same output as non parallel counterpart Help	2	483	May 9, 2022
Applications of reduce intrinsic?	1	560	January 20, 2022

Pure functions not deterministic in serial and parallel loops

Related topics