You come across this Fortran line, where all names are default real non-pointer, non-allocatable scalars (i.e. this could have come from the 1950s).
X = (A*B)+C
What is your interpretation of the author’s intention?
The RHS is equivalent to A*B+C; the processor is free to compute the result with two arithmetic operations and two roundings, or with a fused multiply-add operation with one rounding. The author did not intend to constrain that choice.
The RHS is equivalent to IEEE_REAL(A*B)+C and must be computed with two roundings. The author intended to constrain the processor.
What do you think the current Fortran Standard specifies?
The RHS is equivalent to A*B+C; the processor is free to compute the result with two arithmetic operations and two roundings, or with a fused multiply-add operation with one rounding.
The RHS is equivalent to IEEE_REAL(A*B)+C and must be computed with two roundings.
I didn’t vote, as neither option represents what I think. When I see code like this, it might or might not be intentional, and the author might or might not want to let the compiler use, say, fma for fast execution. So, in a sense, both options are correct at the same time.
If I am the author, I usually want the most aggressive optimization, so fma in the above case, unless there is a numerical issue, in which case I don’t: I need some kind of “escape hatch”. Needing that escape hatch is rare, but it does happen.
Depending on the underlying hardware, the fused multiply-add operation can be either more or less accurate than the separate instructions. For example, back in the 80-bit register days, the fused operation was often less accurate than the separate operations. I don’t think the Fortran syntax can be used to distinguish either of those two cases. I think the only thing the programmer can expect is that the fused operation is faster than the separate operations (or at least not slower).
Of course the expression IEEE_REAL(A*B)+C, or breaking up the expression over separate statements, is not equivalent. So I would say that the expressions A*B+C and (A*B)+C are exactly the same to the compiler. The compiler is free to use the fused operation, or vector instructions, or GPU instructions, etc. to evaluate either of them.
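For anyone who wants to see the one rounding versus two in action, here is a minimal sketch, assuming an F2018 compiler that provides IEEE_FMA. The values are chosen so that the single-precision product falls exactly on a rounding tie; note that a compiler in a relaxed floating-point mode may contract the “separate” line into an FMA itself, defeating the comparison.
program fma_demo
   use, intrinsic :: ieee_arithmetic, only : ieee_fma
   implicit none
   real :: a, b, c
   a = 1.0 + 2.0**(-12)
   b = a                       ! a*b = 1 + 2**(-11) + 2**(-24) exactly
   c = -(1.0 + 2.0**(-11))
   ! Two roundings: a*b rounds (tie to even) to 1 + 2**(-11), then the add gives 0.
   print *, 'separate:', (a*b) + c
   ! One rounding: the exact a*b + c = 2**(-24) is representable, so it survives.
   print *, 'fused   :', ieee_fma(a, b, c)
end program fma_demo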
Are we discussing the order of calculation? How well should the compiler recognise these problems?
What about the intrinsic SUM? Should the compiler be able to recognise the associated round-off problems?
In some cases, the order of accumulation can be significant, although this is not always easy to assess.
The following example can show a difference, although such cases are rare.
do i = 1, n
   s = s + 1.d0/dble(i)
end do
vs
do i = n, 1, -1
   s = s + 1.d0/dble(i)
end do
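For what it’s worth, here is a self-contained version of that comparison; switching to single precision (rather than the 1.d0/dble(i) above) makes the difference visible at a modest n:
program sum_order
   implicit none
   integer, parameter :: n = 10000000
   integer :: i
   real :: sf, sb
   sf = 0.0
   do i = 1, n              ! forward: late small terms are absorbed by the large sum
      sf = sf + 1.0/real(i)
   end do
   sb = 0.0
   do i = n, 1, -1          ! backward: small terms accumulate before the large ones
      sb = sb + 1.0/real(i)
   end do
   print *, 'forward :', sf
   print *, 'backward:', sb
end program sum_order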
In the past, where this might be an issue, the higher-precision real(10) accumulator did provide increased precision for this calculation, although this was often lost in associated real(8) calculations, such as large matrix reduction. (It has been 40 years since I did these tests implementing a real(10) solver.)
From your post to the J3 list, it seems that the compiler is free to evaluate any mathematically equivalent expression, and that parentheses don’t control the order of operations. But it was concerning for our group, which puts tremendous effort into preserving the bit reproducibility of our forecasts.
For us, there is tremendous value in using parentheses to control the order of operations. We use a+b+c when any order would be suitable, and (a+b)+c when we must first compute a+b. Similarly, we use a*b+c when an FMA instruction is fine, but (a*b)+c when we must control the order.
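In source form, the convention reads like this (names here are placeholders):
y1 = a + b + c     ! any association order is acceptable
y2 = (a + b) + c   ! a+b must be computed (and rounded) first
y3 = a*b + c       ! an FMA instruction is fine
y4 = (a*b) + c     ! intended: round a*b first, i.e. no FMA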
So far, every compiler that we have used follows the order of operations as prescribed by parentheses. We have also found (a*b)+c is an effective method to prevent FMA. (We have so far found only one exception which forces FMAs, and only when the target is a GPU.)
If compiler developers were to shift to the more lenient interpretation, then I suppose there’s nothing much we could say in objection. We still have ieee_real(), but it’s bulky and does not lend itself to readability. At the least, it would make things more difficult.
This would be correct with my interpretation of the fortran semantics. However, I’m not 100% certain that my interpretation is correct. Perhaps this question should be forwarded to the standard committee for an official interpretation.
Fortran is required to respect parentheses, but these two expressions, A*B+C and (A*B)+C, have the same evaluation order, so the parentheses are redundant. I think a compiler is free to use any instruction and/or any hardware to evaluate either one.
I would say that this is the correct solution to your problem. It is a little verbose, but it 100% conveys to a compiler and to a human reader what is supposed to occur in the expression. I’m unsure how efficient this would be. A compiler could generate an actual function reference, which would slow down the code, or if it is already using IEEE conventions for rounding, it could optimize away the function reference. In that last case, I don’t think it could then proceed to use the fused operation unless it knows that the fused operation would give the same result. The same considerations would apply to
T = A * B
T = T + C
I don’t think a compiler is allowed to use extended precision to hold the intermediate T, or to use a fused instruction to evaluate both statements. But I’m not 100% sure of all that.
Fortran is a high-level language, so it is not possible to control everything down to the instruction level of the hardware. Maybe others know the answer to this part of the question.
F2023 10.1.5.2.4 paragraph 2 says “Once the interpretation of a numeric intrinsic operation is established, the processor may evaluate any mathematically equivalent expression, provided that the integrity of parentheses is not violated.”
And Note 1 says “Any difference between the values of the expressions (1./3.)*3. and 1. is a computational difference, not a mathematical difference. The difference between the values of the expressions 5/2 and 5./2. is a mathematical difference, not a computational difference.”
Given the paragraph cited by @Harper, I agree that the compiler will evaluate that expression according to the floating-point model (fp-model in ifx). Hence, to guess what will happen, the developer must look not only at the source file but also at the CMake script.
A+B+C lets the compiler decide the order (any one of three); (A+B)+C and A+(B+C) each constrain the compiler to one of those three.
I think this is well understood by Fortran users.
Well, the standard seems to allow mathematically equivalent rearrangements, so the parentheses may not be honored in practice. And I know that the Intel compiler will actually rearrange across them, unless you ask it to respect them.
My preferred approach, going forward, would be to introduce some extra named constants in IEEE_FEATURES, the accessibility of which would act as a pragma to the compiler. Imagine an IEEE_FOO named constant (a better name is needed, one that conveys the idea that parentheses enforce a rounding). Whenever that name is accessible in a scope, parentheses enforce a rounding and consequently inhibit FMA. Whenever it is not, the compiler can be liberal.
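In source form, the idea would look something like this sketch (IEEE_FOO is the hypothetical constant from above; no such name exists in IEEE_FEATURES today):
subroutine kernel(a, b, c, x)
   use, intrinsic :: ieee_features, only : ieee_foo  ! hypothetical named constant
   real, intent(in)  :: a, b, c
   real, intent(out) :: x
   ! With IEEE_FOO accessible in this scope, the parentheses would
   ! enforce a rounding of a*b and thus inhibit FMA.
   x = (a*b) + c
end subroutine kernel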
I think that might work well with your conventions or code style.
I have framed the issue as one of user expectation and standard prescription. I think those two are independent of quality of implementation, which is how best to serve current and future authors, readers and users.
Thanks for considering this use case, even if it’s an uncommon one. I agree that it could work, and I’m looking forward to seeing what you have in mind.
Another option that is less targeted, but perhaps more convenient, is something like implicit ieee_rounding in the specification part, which would force IEEE rounding of anything within parentheses and would apply to the entire program unit.
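Something along these lines (hypothetical syntax; no compiler accepts this today):
subroutine kernel(a, b, c, x)
   implicit ieee_rounding   ! hypothetical: parenthesized subexpressions round to IEEE
   real, intent(in)  :: a, b, c
   real, intent(out) :: x
   x = (a*b) + c            ! a*b would be rounded before the add, throughout this unit
end subroutine kernel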
The semantics of the rest of the IMPLICIT statement does not really fit well with this purpose, it seems to me. Treating accessibility of particular names in intrinsic modules as pragmas is, to my mind, preferable because it is quite straightforward to apply in the large and in the small.
Since you are here, I would like to know if you agree with M Cohen’s interpretation, as stated in [J3] Meaning of "A+(B*C)"
Your message at J3 has been taken by some people as dissent, and by others as agreement. A clarification would be helpful.
We already know that some users would like to constrain the compiler’s interpretation of the form, and there is agreement that this is a useful mode of operation. And the same goes for the opposite view. A classic case of “exception” (no matter which choice you make, some people will take exception). The question remaining is: is it a constraint based on the Standard specification, or an extra-Standard useful mode that could be offered as a processor-dependent feature (and might be standardised in the future)?