# Pure functions not deterministic in serial and parallel loops

Here is a simple example that applies a pure (even elemental) function to each element of an array three different ways (elemental, do concurrent, openmp loop) and you get three different results:

``````program test_pure
implicit none
integer, parameter :: n = 16
integer :: i, A(n)

A = 1
A = h(A)
print *, A

A = 1
do concurrent (i = 1:n)
A(i) = h(A(i))
end do
print *, A

A = 1
!\$OMP PARALLEL DO
do i = 1,n
A(i) = h(A(i))
end do
!\$OMP END PARALLEL DO
print *, A

contains

elemental integer function h(x)
integer, intent(in) :: x
h = f()*x
end function

pure integer function f()
f = sum(A)
end function

end
``````

This prints on my computer:

``````\$ gfortran -fopenmp test_pure.f90
\$ ./a.out
16          16          16          16          16          16          16          16          16          16          16          16          16          16          16          16
16          31          61         121         241         481         961        1921        3841        7681       15361       30721       61441      122881      245761      491521
241         481          61         121       92161      184321          16          31       23041       46081         961        1921        7681        7681        3841      368641
\$ ./a.out
16          16          16          16          16          16          16          16          16          16          16          16          16          16          16          16
16          31          61         121         241         481         961        1921        3841        7681       15361       30721       61441      122881      245761      491521
7681       15361          16          31          61         121         481         961        1921        3841        7681       15361       53761         241      107521      215041
\$ ./a.out
16          16          16          16          16          16          16          16          16          16          16          16          16          16          16          16
16          31          61         121         241         481         961        1921        3841        7681       15361       30721       61441      122881      245761      491521
245761      491521          61         121         961        1921        3841        7681       15361       30721         241         481          16          31       61441      122881
``````

As you can see, the parallel loop even returns different results on every run!

See the thread What is a pure function? for a discussion of pure functions. A quick summary: functions can have two independent properties:

• deterministic: arguments fully determine the result (no reads from global variables unless they are compile time constants)
• side-effect-free: no side effects (no writes to global variables)

Fortranâ€™s `pure` is side-effect-free. Fortranâ€™s 2023 `simple` is both side-effect-free as well as deterministic.

It seems that currently â€śdo concurrentâ€ť only requires side-effect-free, but not deterministic. As a consequence the above example returns non-deterministic results if run in parallel (I donâ€™t have a compiler that can run `do concurrent` in parallel, so I tested with openmp above). And even in serial you get different results depending on how you apply the elemental function. In particular the Fortran standard says about elemental functions:

In the array case, the values of the elements, if any, of the result are the same as would have been obtained if the scalar function had been applied separately, in array element order, to corresponding elements of each array actual argument.

This paragraph makes perfect intuitive sense to me. My understanding of it is that the elemental application above and the first loop should produce the same answer (on my computer a regular loop and do concurrent return the same answer). But that is not the case as you can see. If you look at the program, it seems to me it even cannot be the case. If the function is not deterministic, it seems in general you canâ€™t guarantee the property that `the result are the same as would have been obtained if the scalar function had been applied /.../ to corresponding elements of each array actual argument` and above is one such example, where the loop is not order independent, despite only using Fortran `pure` functions. It seems one requires both side-effect-free as well as deterministic in order to ensure the loop is order independent.

What is the rationale to allow non-deterministic functions inside parallel loops and in elemental procedures?

Wouldnâ€™t it make sense to only allow deterministic (and side-effect-free) functions?

3 Likes

Well, itâ€™s currently possible to legally execute impure code within a pure procedure so neither determinism nor freedom from side effects can currently be assumed.

Given the following qoute from Note 4 in Section 15.7 in the Fortran 2018 standard Iâ€™d say both are â€śbugsâ€ť in the standard:

The above constraints are designed to guarantee that a pure procedure is free from side effects (modifications
of data visible outside the procedure), which means that it is safe to reference it in constructs such as DO
CONCURRENT and FORALL, where there is no explicit order of evaluation

1 Like

I just thought of one thing in the standards â€śfavourâ€ť regarding your example: Since the program statement isnâ€™t pure thereâ€™s no claim that the do concurrent loop actually will be pure. Immediately calling a `pure subroutine main() ` and putting the do loop inside it would protect against the behaviour observed here (I think?).

It seems to me that even if you wrote `f` and `h` as `simple` procedures (by taking `A` as an argument) you would still get the same behaviour because you have a race condition on reading and writing to elements of `A` (hence the non-deterministic parallel execution) .

Therefore even though the function may be deterministic, the loop body is not because it writes to `A`.

Iâ€™m not sure what the Standard says about this, but it might be that the user is expected to ensure that the loop body is appropriate for concurrent or parallel execution?

There is no race condition when applying `h` elementally because the assignment occurs as if the RHS is evaluated entirely beforehand.

Actually the `pure function f()` does that it should even for parallel run: on every iteration it takes â€ścurrentâ€ť state of global variable array `A` and evaluate the `sum(A)`. I.e. the pure function doesnâ€™t know anything about external parallel loop. Should it be able to use global variables? I assume that does otherwise it will not be possible to use e.g. physical constants (CODATA as example those `constants` are specified from time to time) that user could specify in external data file.

Does the Fortran 200x standard will specify `simple` procedure i.e. `exact pure` procedure that are not able to reference any outside variables?

ifort 2021.7 and ifx 2022.2 give the following. Like gfortran the elemental case is consistent but has different results. do concurrent and openmp are both non-deterministic. Question. who is right for the elemental case? gfortran or ifort

Edit. Looking more closely at the output, it appears do concurrent is computing the same values as elemental but they get stored in a different order in A.

``````
Elemental
16          31          61         121         241         481
961        1921        3841        7681       15361       30721
61441      122881      245761      491521

Do Concurrent
16          31        7681       15361         961        1921
121         241       61441      491521      245761      122881
481       30721          61        3841

openMP
16          31         121         241          61         121
121         241        2881        6961        1441         121
121         241       13921        1201

./tpure.x

Elemental
16          31          61         121         241         481
961        1921        3841        7681       15361       30721
61441      122881      245761      491521

Do Concurrent
16          31        7681       15361        1921        3841
481         961       30721      491521      245761      122881
61       61441         121         241

openMP
16          31          61         181          61         181
1381        2761         241        1381         241         241
61          61        1381        1381
``````

Edit 2.

Even stranger, NVIDIA Fortran (nvfortran) give identical results for elemental and do concurrent. This highlites one of my main gripes about the current state of Fortran compilers. You really donâ€™t know which ones you can trust. nvfortran results

Edit 3. Looks like nvfortran defaults to sequential execution of do concurrent unless you specify the -stdpar=multicore or -stdpar=gpu compiler option. with -stdpar=multicore the do concurrent results become non-deterministic and dont match the elemental results.

``````
Elemental
16           31           61          121          241          481
961         1921         3841         7681        15361        30721
61441       122881       245761       491521

Do Concurrent
16           31           61          121          241          481
961         1921         3841         7681        15361        30721
61441       122881       245761       491521

openMP
16           31          496          991           61          166
46          166        15841        31681        31681        95041
1981       190081         3961         7921

./tpuren.x

Elemental
16           31           61          121          241          481
961         1921         3841         7681        15361        30721
61441       122881       245761       491521

Do Concurrent
16           31           61          121          241          481
961         1921         3841         7681        15361        30721
61441       122881       245761       491521

openMP
121          361        51841       103681           16           31
1441         4321        25921         8641         8641          721
1441       207361           61          121
``````
1 Like

@lkedward itâ€™s subtle: there is no race condition in the loop if you only evaluate the loop, and assume the function `h` is â€śpureâ€ť (which it seems to me you have to assume any function you call is both deterministic and side-effect-free, otherwise in general the loop is not order independent). You are right that the function `h` accessed (read only!) a global variable, and that causes the race condition, thus proving that you need more than side-effect-free.

@band-a-prend yes, in Fortran, â€śpureâ€ť is only side-effect-free, but does not have to be deterministic, thus you are allowed to access global variables (constant or not), but you can only read from them, not write. This then causes the above trouble. Regarding reading constant global variables, I think that is fine in all cases (including if the function is deterministic), since they canâ€™t cause any of the above trouble.

1 Like

There seems to be a problem with `contain`-ed procedures and purity. The program containing the do loops are impure, or specifically the assignments `A(i) = ...` are impure and has side effects. In other words, the impurity happens between each invocation of the elemental function `h`.

As program statements cannot be declared pure, this is easier seen if we refactor the do loops to subroutine. Hereâ€™s two alternative implementations:

Alternative 1:

``````module test_pure
implicit none

integer, parameter :: n = 16
integer :: A(n)

contains

subroutine main()
integer :: i

A = 1
A = h(A)
!\$OMP PARALLEL DO
do i = 1,size(A)
A(i) = h(A(i))
end do
!\$OMP END PARALLEL DO

end subroutine

elemental integer function h(x)
integer, intent(in) :: x
h = f()*x
end function

pure integer function f()
f = sum(A)
end function

end module

program test_pure_prog
use test_pure, only: main, A

call main()
print *, A
end program
``````

If we try to declare `main` as pure the compiler rightfully complains:

``````app/main.f90:12:8:

12 |         A = 1
|        1
Error: Variable 'a' cannot appear in a variable definition context (assignment) at (1) in PURE procedure
``````

However by using `contains` and passing `A` as an argument it is indeed possible to trigger this issue:

``````module test_pure
implicit none

contains

pure subroutine main(A)
integer, intent(inout) :: A(:)
integer :: i

A = 1
A = h(A)
!\$OMP PARALLEL DO
do i = 1,size(A)
A(i) = h(A(i))
end do
!\$OMP END PARALLEL DO

contains

elemental integer function h(x)
integer, intent(in) :: x
h = f()*x
end function

pure integer function f()
f = sum(A)
end function
end subroutine
end module

program test_pure_prog
use test_pure, only: main

integer, parameter :: n = 16
integer :: A(n)

print *, A
call main(A)
print *, A
end program
``````

I suppose `main` is still free of side effects seen from the caller, but it is not deterministic when executed in parallell. I havenâ€™t checked if the Fortran standard specifically mentions determinism, but as it mentions `do concurrent` as a motivation to introduce purity I would still call it a standard bug.

2 Likes

@plevold great example. Letâ€™s analyze it.

I think the issue of `main` being â€śpureâ€ť is not relevant. In your second example, main is both side-effect-free and deterministic (well, itâ€™s not deterministic in practice because of the â€śbugâ€ť that we are trying to identify and exclude using some rules; but it does not read from any global variables).

I think itâ€™s the function `h` that must be both side-effect-free and deterministic. It is side-effect-free, but not deterministic: it depends on â€śAâ€ť, which would not be a problem if we never wrote to â€śAâ€ť. But the whole point of parallel loops is to write to some arrays (â€śAâ€ť in this case), so we have to modify some arrays. The whole point of a parallel loop is to produce a side effect.

But all that we are doing is just `A(i) = h(A(i))`. That should be allowed, but `h` must be both side-effect-free and deterministic.

So it seems to me one way to fix all of the above issues is to require in `do concurrent` (and in openmp loops) for all functions to be both side-effect-free as well as deterministic, which is the upcoming `simple` attribute in Fortran. So the compiler would say that `A(i) = h(A(i))` is not allowed in `do concurrent`, because `h` is not â€śsimpleâ€ť (only â€śpureâ€ť, which is not enough).

I think also `elemental` should require the function to be `simple`, which would forbid the case above and both ifort and gfortran should then return exactly the same numbers.

1 Like

In the context of Fortran standard, it may be rather difficult to address as you indicate with `DO CONCURRENT`, the development of this construct having focused more on instructions followed by the processor in any order rather than a definitively concurrent/parallel execution mode, The latter is what is clearly now of great interest given where computing is headed.

Competent effort by the community focused strongly on concurrent/parallel execution using, say, the `DO PARALLEL` suggested by @rouson may be worthwhile where all the rules can be put together such as those being discussed here can be considered.

Also, in the context of Fortran standard, `ELEMENTAL` has been around since the Fortran 95 standard whereas `SIMPLE` makes its way only in Fortran 2023. Thus codes have come about where `ELEMENTAL` are not necessarily `SIMPLE` making the change not possible now without backward compatibility issues, right?

2 Likes

Yes, there will be backwards compatibility issues. I think just like with implicit none, the compilers can simply start enforcing the more stricter rules, and provide a way out for older codes. The details get tricky so that the user experience is smooth, but I am not worried about that right now. In this thread I am just trying to understand exactly what the issues are.

Are you aware of some summary of the issues with `pure` and why the standard now introduces `simple`? Is it precisely due to the issues I raised above?

2 Likes

@certik, I am puzzled why it seems to me that the function `f()`, though declared to be pure, is not pure, because the value it returns is computed from a variable in the containing scope, and that outer scope variable is not constant. Thus, the variable A behaves as a global variable.

It is a separate matter as to whether a compiler detects that the `pure` attribute declaration is incorrect.

@mecej4 you can follow the thread I linked: What is a pure function?, in particular, go here: Pure function - Wikipedia. The â€śpureâ€ť attribute in Fortran means no side-effects (roughly speaking no writing to global variables), but you can still read from global variables, which is what the function `f()` does. The reason it surprises you is because you probably expect â€śpureâ€ť to mean both side-effect-free and deterministic (no reading from global variables). Thatâ€™s why I did the survey, and as of now, 74% of people who voted also expect that (myself included). The meaning of â€śpureâ€ť is subjective, so itâ€™s better to just talk about the two properties (side-effect-free and deterministic) and just keep in mind that the Fortran â€śpureâ€ť keyword just means side-effect-free, but does not enforce deterministic.

2 Likes

What I find unsettling is that, based on your explanation, a function that calls a pseudo-RNG function that reads the system clock to initialize itself can still be â€śpureâ€ť. So, a â€śpureâ€ť function cannot pollute its surroundings, but is perfectly amenable to getting polluted by its environment if that environment is deterministic, e.g., bound by the laws of the deterministic parts of physics.

1 Like

@certik I think we also need to check what the standard says about aliasing of parent procedure variables. Somewhere in the standard it is stated that it is the programmerâ€™s responsibility to make sure that dummy arguments are not aliasing the same memory. I donâ€™t know if it has anything to say in this situation as I couldnâ€™t find the exact paragraph on my phone (the standard document isnâ€™t exactly â€śmobile friendlyâ€ť).

2 Likes

The mere existence of aliased dummy arguments, when two or more formal arguments to a subprogram have portions that occupy the same memory locations, need not be a problem as far as correct operation of the code is concerned. (Performance may be affected because of cache issues, but the results should be correct.)

Problems may occur when one aliased variable is assigned new values, and its partner in crime is used later in an expression.

For a nice example of how confusing it can get when aliasing exists, see Robert Corbettâ€™s post in C.L.F. a decade ago.

1 Like

I believe the example is non-conforming for the `do concurrent` case. The following text from the standard:

If a variable has unspecified locality,

• if it is referenced in an iteration it shall either be previously defined during that iteration, or shall not be defined or become undefined during any other iteration; if it is defined or becomes undefined by more than one iteration it becomes undefined when the loop terminates;

`A` is referenced during an iteration (albeit indirectly), but it is (partially) defined in other iterations. This is normative text, and so the processor is not obligated to detect and report the non-conformance, but I believe that is the case here.

For the `elemental` case, I believe that what the standard says about the RHS of an assignment statement being fully evaluated before assignment to the LHS holds here, and the result is deterministic and as expected.

For the openmp directive, I would suggest itâ€™s not standards conforming, as the standard says nothing about openmp directives or their expected behaviour. I believe the example would be standards conforming without the openmp directive, and would be deterministic, albeit with somewhat surprising (or at least not immediately obvious) behaviour.

2 Likes

@everythingfunctional I see, perfect, thank you. So effectively I think the standard says the function `h` should not access `A` outside of the current iteration, but the current iteration is already available via an explicit argument, so from this it follows that the standard effectively says the function `h` should be â€śsimpleâ€ť (side-effect-free and deterministic), because it requires the function `h` to be â€śpureâ€ť, plus these extra requirements on what global variables it can depend on, which effectively makes it â€śsimpleâ€ť. Would you agree?

Yes, all operations that occur within the body of a `do concurrent` loop must be `simple`, at least as far as the loop is concerned. They may refer to data beyond their arguments, so long as no variables it uses are assigned to within the body of the loop.

2 Likes

Exactly. And in this case, I think those extra variables can (in theory) be passed explicitly as arguments and the function becomes â€śsimpleâ€ť. So the â€śrightâ€ť requirement indeed seems to be â€śsimpleâ€ť functions or equivalent.

2 Likes