Quiz: So, you think you know Fortran expressions

A = Z(X) + W(X) ! all identifiers are non-pointer, non-allocatable, scalar, default real
B = Z(X)       ! all functions are EXTERNAL, all interfaces are implicit
A = B + W(X)   ! B is never mentioned anywhere else and is not associated with anything 

Which one might execute faster and why?

  • The first variant
  • The second variant
0 voters

The first.
The Fortran rules for function references in an expression state that the evaluation of a function reference shall neither affect nor be affected by the evaluation of any other entity within the statement. This allows the compiler to assume that neither Z nor W can change the value of X, or of anything else that would affect the function results. It can then safely schedule Z and W on separate execution resources and perform the calculation concurrently. No such licence is available for the second form: the compiler cannot be certain that Z does not modify X (or anything else that might affect the computation of W), so it has to schedule the Z and W executions sequentially.
This is sometimes interpreted as “Fortran functions are pure functions”, and you can see why that is too strong a statement to be true in general (after all, Fortran has the PURE/IMPURE keywords). But in the limited context of function references within a statement, it kind of is. The compiler never saw any PURE or SIMPLE attribute and yet was able to make the inference. Fortran subroutines are not just “functions without a return value”.
For the purposes of the rule just described, the IF/WHERE/FORALL statements have special handling (they are treated as the equivalent construct).
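A minimal sketch of the kind of interference that rule forbids (the function bodies are hypothetical, invented for illustration). If Z redefines its dummy argument, referencing Z(X) and W(X) in the same statement is non-conforming, and it is exactly that prohibition that licenses concurrent scheduling:

```fortran
! Hypothetical sketch: Z has a side effect on its dummy argument.
real function z(x)
  implicit none
  real :: x
  z = x + 1.0
  x = x * 2.0        ! side effect: Z redefines the actual argument
end function z

real function w(x)
  implicit none
  real :: x
  w = x - 1.0        ! W's result depends on the current value of X
end function w

program quiz
  implicit none
  real :: a, b, x
  real, external :: z, w
  x = 3.0
  ! a = z(x) + w(x)  ! non-conforming: Z's side effect would affect W
  b = z(x)           ! conforming: the side effect completes here...
  a = b + w(x)       ! ...before W references the (updated) value of X
  print *, a
end program quiz
```

Because the single-statement form rules out such interference by fiat, the compiler never has to check for it before evaluating Z and W concurrently.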

A separate issue: Will statements in Z and W be executed?

  • Yes
  • Maybe
0 voters

Maybe.
There’s a wonderful sentence in Fortran semantics: “It is not necessary for a processor to evaluate all of the operands of an expression, or to evaluate entirely each operand, if the value of the expression can be determined otherwise.”
“Otherwise” includes magic, undiscovered tech or maths, divine inspiration, mind-reading, time travel. All of those things could improve a compiler. If the compiler detects that you are trying to calculate and print the gazillionth hexadecimal digit of pi, it could just look it up, or use a better calculation than the one you provided.
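A down-to-earth illustration of that licence (the function name is made up): a logical operand that already determines the result of the expression. The processor may, but need not, skip the function reference, which is also why Fortran programmers cannot rely on short-circuit evaluation:

```fortran
program partial_eval
  implicit none
  integer :: i
  logical :: ok
  logical, external :: expensive_check
  i = 0
  ! I /= 0 is false, so the value of the .AND. expression is already
  ! determined; the processor MAY skip EXPENSIVE_CHECK entirely.
  ! (It may equally well call it; neither choice is required.)
  ok = i /= 0 .and. expensive_check(i)
  print *, ok
end program partial_eval

logical function expensive_check(i)
  implicit none
  integer :: i
  expensive_check = mod(i, 2) == 0
end function expensive_check
```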

Please correct me if I have misunderstood anything.

6 Likes

You understand correctly.

4 Likes

For bonus points: Look at the two variants again. You are told that the compiler will determine the result of the Z function reference “otherwise”. Which variant did it see?

  • The first variant
  • The second variant
  • Either
0 voters

The first variant. A Fortran semantic rule says: “If a statement contains a function reference in a part of an expression that need not be evaluated, all entities that would have become defined in the execution of that reference become undefined at the completion of evaluation of the expression containing the function reference.” So if Z(X) is not evaluated, and Z redefines its dummy, and we are looking at the second variant, then X would be undefined after the first statement, which would make the second statement non-conforming. But the second variant is perfectly conforming even when Z redefines its dummy, so it must not have been the one seen by the compiler. The first variant allows not only concurrent evaluation but also no evaluation of the + operands at all.
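A sketch of that “becomes undefined” rule in action (hypothetical function, and the expression shape follows the standard’s own X > Y .OR. L(Z) example):

```fortran
! Hypothetical sketch of the "becomes undefined" rule.
logical function l(x)
  implicit none
  real :: x
  l = x > 0.0
  x = -x               ! L defines its dummy argument
end function l

program undefined_after_skip
  implicit none
  real :: x, y
  logical :: t
  logical, external :: l
  x = 1.0
  y = 2.0
  t = y > x .or. l(x)  ! Y > X is true, so L(X) need not be evaluated
  ! Whether or not L was actually called, X is now undefined, so a
  ! reference to X here would be non-conforming:
  ! print *, x
  print *, t
end program undefined_after_skip
```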

1 Like

This specifically says “expression”, so there is still the question of whether the principle extends further, to entire statements and to series of statements. For example, optimizing compilers sometimes factor out entire DO loops when they can determine that nothing evaluated within the loop is used afterwards, or reduce the entire loop to a single pass when they determine that the modified variables depend only on the last iteration. Is that optimization allowed by the standard, or does it venture into “language extension” territory? So in the above examples, if the compiler can determine that the variables A and B are never referenced afterwards, is it allowed to eliminate the Z() and W() function references entirely, regardless of any side effects?

This last one is unclear to me. Suppose the compiler knows the values of Z(X) and W(X) “otherwise”. For example, it knows that Z(X) evaluates to 20 and it knows that W(X) evaluates to 22, it would still need to add those two values together to get the correct result of 42. Or in more general situations, if “+” is overridden with some user function, that user function would need to be invoked with the arguments 20 and 22.

1 Like

It would still need to perform the addition at runtime, unless both operands can be determined “otherwise” (or one is known to be zero), in which case it is a compile-time calculation.

1 Like

No, unless they are PURE, in which case, Yes.

In the Wikipedia entry for “pure function”, one reads about Property One (the return value depends only on the arguments) and Property Two (no side effects). Fortran’s PURE corresponds to Property Two (no side effects). Fortran’s SIMPLE corresponds to both properties (a SIMPLE procedure is always PURE, but not vice versa).
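A sketch of the distinction (module and function names invented; SIMPLE requires Fortran 2023 compiler support):

```fortran
module settings
  implicit none
  real :: scale = 2.0
contains
  ! PURE guarantees no side effects (Property Two), but the result
  ! may still depend on the module variable SCALE, so Property One
  ! (dependence only on the arguments) does not hold.
  pure real function scaled(x)
    real, intent(in) :: x
    scaled = scale * x
  end function scaled

  ! SIMPLE (Fortran 2023) guarantees both properties: referencing
  ! SCALE inside this function would be an error.
  simple real function doubled(x)
    real, intent(in) :: x
    doubled = 2.0 * x
  end function doubled
end module settings
```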

I wanted to stay with expressions for this thread.

I’m trying to think of some extreme cases where the first statement might execute slower than the second pair of statements. The first statement allows Z(X) and W(X) to be evaluated simultaneously or in either order sequentially. Suppose they are evaluated simultaneously, but they both require access to the same functional unit, and that requires an underlying serialization with handshaking or with some other communication between the two threads/processes. That handshaking overhead would increase the execution costs compared to the evaluation in two separate statements where no handshaking overhead is required. So in that case, the second pair of statements might evaluate faster than the first single statement. This would be a not uncommon situation where execution in parallel is slower than serial execution due to additional overheads involved (handshaking, synchronization, etc.).

Another possible situation is when Z(X) and W(X) fiddle around with IEEE settings. Say that W(X) turns on gradual underflow (which is expensive), and that W(X) is evaluated first in the first statement. In that case, Z(X) would then be evaluated also with gradual underflow turned on, which might slow it down. In the second pair of statements, Z(X) might be evaluated first without gradual underflow, and then the second statement would involve only W(X) being evaluated with gradual underflow. In this case, the second pair of statements might be evaluated faster than the first single statement.

I think, but I’m not certain, that both of those situations are allowed by the fortran semantics.

Indeed, most of us may have seen this when we “oversubscribe” and try to launch 8 MPI ranks on one core. Memory hierarchy cannot keep up and you might end up running at slowest-tier speeds (are hard-disks still a thing?). But that is just a misfire by the optimizer, governed by pragmatics and not language issues.

It would not, if the compiler is conforming.
In a procedure other than IEEE_SET_UNDERFLOW_MODE or IEEE_SET_STATUS, the processor shall not change the underflow mode on entry, and on return shall ensure that the underflow mode is the same as it was on entry.
The initial underflow mode is processor dependent so you better hope that your compiler documents how to set this.
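A sketch of both points using the IEEE_ARITHMETIC intrinsics named in the quoted rule (the subroutine FIDDLE is hypothetical): the initial mode must be queried rather than assumed, and a conforming processor restores the mode on return from a callee that changes it.

```fortran
program underflow_demo
  use, intrinsic :: ieee_arithmetic
  implicit none
  logical :: gradual
  if (ieee_support_underflow_control()) then
    call ieee_get_underflow_mode(gradual)
    print *, 'initial gradual underflow (processor dependent):', gradual
    call fiddle()
    ! A conforming processor restores the mode on return from FIDDLE,
    ! so GRADUAL is unchanged here:
    call ieee_get_underflow_mode(gradual)
    print *, 'after the call:', gradual
  end if
contains
  subroutine fiddle()
    call ieee_set_underflow_mode(gradual=.true.)  ! expensive on some CPUs
    ! ... computation under gradual underflow ...
  end subroutine fiddle
end program underflow_demo
```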

1 Like

Another example of this is hyperthreading. It is supposed to speed up codes by allowing concurrent access to the underlying functional units, but it sometimes slows down a code rather than speed it up.

This is a constraint on the processor, not on the programmer, right? A programmer might well change the underflow mode in W(X) (using the IEEE intrinsic subprogram to do so) on purpose. This is a side effect, but I don’t think it is a side effect that is otherwise prohibited within a pure function (in the way that i/o, modification of module variables, etc. are prohibited).

Here is another possibility. What if W(X) causes the process to get swapped from a fast core to a slow core? In the single statement case, W(X) could be executed first, causing both functions to be evaluated slowly, while in the second two-statement situation Z(X) would be executed on the fast core before W(X) does its thing. That is also a type of side effect that is not covered by the pure function restrictions.

Or how about this one. Suppose Z(X) does its thing entirely in level-1 cache, but W(X) thrashes the cache by accessing memory that was not already resident. Then if W(X) is executed first, it slows down both functions, but if Z(X) is executed first, it executes quickly before W(X) does its thing.

In any case, let me be clear, I’m just thinking about possible extreme situations here, not anything practical that one might expect to actually happen.

Correct. The programmer can change the underflow mode and a conforming compiler will restore it, on exit, to what it was on entry, so it cannot pollute other code.

There are many ways in which an optimizer can pick the sub-optimal solution. I am asking you to imagine an all-knowing compiler that always guesses correctly the fastest path. There is no way that Variant 2 can have the edge, whereas Variant 1 can. Variant 1 can always fall back to how Variant 2 is handled, but not the other way round.

Is this correct? I did not interpret that part of the standard that way.

I do not see an alternative interpretation. It is the same story with rounding mode and halting mode.

I don’t think it’s a good idea to go off-topic on this thread.

I deleted my posts here and moved it over to another thread.

What do you mean by this last sentence about subroutines? That the preceding text does not apply to them?

Well, it cannot, because the special treatment of functions described here is a consequence of there being potentially multiple function references in a single statement (even a single expression) but there can only be one subroutine reference per statement.

Not sure I agree with your answers!

I would have thought that the only case where Z(X) or W(X) would not be calculated would be where they had previously (and recently) been calculated (which is not the case in the example).

Depending on the Fortran version, Z or W would have to be declared as PURE.
Prior to the introduction of PURE, I did not think that FORTRAN could have omitted the calculation of Z and W. It would not be a robust interpretation to omit the function call.
Impure functions have been used successfully for many years.

The other exclusion case I was considering, was if A was not used and so the calculation could have been totally omitted via optimisation.
Your reference to “separate execution resources and perform the calculation concurrently” is outside my understanding of the Fortran Standard. Where does the Standard permit this option?

I think these questions are supposed to be hypothetical, not necessarily practical. In the original post, the compiler only knows the implicit interface for the functions, so such things as pure and elemental would not apply since they require explicit interfaces. Then in later discussions, it is assumed that the compiler is all-knowing and might be able to evaluate the function values without actually executing the functions. Maybe those assumptions, implicit interface and all-knowing, are inconsistent?

In the original post, the function values are added together. If instead they were multiplied together, so in one case the code is

A = Z(X) * W(X)

and

B = Z(X)
A = B * W(X)

in the other, then the question also includes the possibility that one or both of the function results is 0.0. In the first case, if the first function evaluated by the compiler returns 0.0, then the expression can be evaluated without knowing the second function value, so the compiler is allowed to skip that evaluation. The functions need not be PURE in order for this optimization to occur. The only requirement is that the side effects of one evaluation do not affect the evaluation of the other. An example of such a side effect is that the argument X is changed by one function, and that affects the other function value. This requirement is the responsibility of the programmer, so by putting both functions in the same expression, the programmer is telling the compiler that any side effects that might occur do not violate that requirement.
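One way to make the skipped evaluation observable is with call counters (a sketch with invented names; the module variables deliberately make Z and W impure):

```fortran
module counters
  implicit none
  integer :: z_calls = 0, w_calls = 0
contains
  real function z(x)
    real, intent(in) :: x
    z_calls = z_calls + 1   ! impure: observable side effect
    z = 0.0                 ! Z happens to return zero
  end function z
  real function w(x)
    real, intent(in) :: x
    w_calls = w_calls + 1
    w = x + 1.0
  end function w
end module counters

program skip_demo
  use counters
  implicit none
  real :: a, x
  x = 3.0
  a = z(x) * w(x)
  ! A is 0.0 either way, but W_CALLS may be 0 or 1: if the processor
  ! evaluates Z(X) first and sees 0.0, it may skip W(X) entirely.
  print *, a, z_calls, w_calls
end program skip_demo
```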

In the second code above, the compiler is still able to eliminate the W(X) evaluation if it knows that B is 0.0. However, I think in this case that Z(X) must always be evaluated. Also in this case, any and all side effects are allowed, including changing the argument X, or changing module variables that affect the function evaluation, and so on.

I’m uncertain about the case where A and/or B are unused, or where those variables are immediately redefined. Compilers do sometimes eliminate the function evaluations in this situation, with aggressive optimization enabled, but I think it might be in violation of the standard; so from the programmer’s perspective, this is a nonstandard compiler extension.

Consider this situation:

B = Z(X)
B = Z(X)
B = 0.0

Is the second evaluation of Z(X) required to occur, or is the compiler allowed to copy the result of the first evaluation in the second assignment? If so, then any side effects only occur once, not twice. Then when the compiler sees that last statement, is it allowed to skip the Z(X) evaluations entirely? This would mean that no side effects would occur.

The “shall neither affect nor be affected” guarantees that concurrent execution is an allowed implementation choice.

The compiler is only constrained by observable behaviour. It does not have to factor itself in an analysis phase that precedes a calculation phase - it could do any analysis, at any time, even after the entire process is loaded, or after all the input has been read. We usually think of the Fortran compiler as a code generator that has to transform Z to something that can execute and then has nothing more to do, but the language spec does not.
If you consider the state of B to be observable by other software components, you mark it VOLATILE. If it’s not marked so, consecutive updates to B can be replaced with a single update. The question of side effects for subprograms is addressed with PURE, and the question of dependence on accessible entities beyond the arguments with SIMPLE.
Theoretically, the compiler could see the snippet above (even without PURE/SIMPLE) and just generate code that defers the decision of what to calculate and what to update to runtime, when B, X and Z are completely known. Anything you or I can deduce, the compiler is also allowed to deduce.
An interesting case arises with the snippet above if B is associated with X and Z’s dummy is modified. That seems to fall foul of the very first rule mentioned in this thread, if you consider assignment an evaluation. Either way, it is not as clear-cut as I would wish. Compilers I have tried all produce results consistent with the side-effect of Z being completed before the assignment to B. So it may be conforming after all.
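A sketch of the VOLATILE point (the function Z is hypothetical): with the attribute, every definition of B is observable, so the snippet’s consecutive assignments cannot be collapsed.

```fortran
program volatile_demo
  implicit none
  real, volatile :: b   ! every definition of B is observable, so the
                        ! processor cannot collapse the assignments below
  real :: x
  real, external :: z
  x = 1.0
  b = z(x)
  b = z(x)     ! without VOLATILE, this update could be elided
  b = 0.0
  print *, b
end program volatile_demo

real function z(x)
  implicit none
  real :: x
  z = x + 41.0
end function z
```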

1 Like

This could be as simple as

X = B(X)

right? I think that is not allowed. That statement would be ambiguous because the value of X could be determined either by the modification of the dummy argument of B(X) or by the assignment of the function value. The standard had a choice of either defining which of those occurs, or just not allowing it to happen in the first place, and it took the second route.

There are some similar situations such as the array element assignment

A(I) = F(I)

with integer I that are also not allowed. That last case is used as an example in section 10.1.4 in the f2023 standard. The compiler is not required to detect this violation, it is the programmer’s responsibility to ensure that X is not changed within B(X), or that I is not changed within F(I). The functions B() and F() need not be PURE otherwise (e.g. they could perform i/o, or modify some module variable, etc.).
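A sketch along the lines of that 10.1.4 example (the function body is invented), showing why the statement is ambiguous when F modifies its dummy:

```fortran
! Hypothetical sketch of the 10.1.4 example: F modifies its dummy.
integer function f(i)
  implicit none
  integer :: i
  f = i * 10
  i = i + 1        ! side effect on the actual argument
end function f

program aliasing_violation
  implicit none
  integer :: a(5), i
  integer, external :: f
  a = 0
  i = 2
  ! Non-conforming: F(I) changes I, so it is ambiguous whether the
  ! result is stored in A(2) or A(3). The compiler need not detect this.
  a(i) = f(i)
  print *, a
end program aliasing_violation
```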

I think this is really the issue. Side effects are observable, so the question in these cases is whether the standard defines when and how often those side effects occur. The fortran standard has never been clear (to me) about this. It would be relatively simple to write language within the standard that did define the behavior, so the fact that this has not been done over the last five or six decades suggests that the behavior has been left ambiguous on purpose.

If you have a PDF version of f2023, then search for the terms “simultaneous” and “concurrent”. Those appear several times in the document in the context of expression evaluation, i/o operations, and parallel execution in DO CONCURRENT and in coarray codes.