Yes, parallel operations can synchronize/reduce in different orders and produce different results. Also, anything involving random numbers (random from run to run), such as Monte Carlo simulations, will produce different results. But the code discussed here does not involve parallel execution or random number simulations, and it isn’t just that the results are different: sometimes the code traps exceptions, sometimes it doesn’t, and sometimes the exceptions differ. So something odd seems to be happening, and it seems to be related to the fast-math option; whether it is a programmer error or a compiler error remains to be determined.
To understand the assembly code better, I just asked an LLM about it. It says the assembly code was built for x86-64 (not for ARM64); is this correct…?
Also, do you happen to know how to generate such assembly code on a Mac (ARM64)? I have tried -fverbose-asm and -emit-llvm, but I could not get (text-based) assembly code like the above… (I got a binary .bc file instead).
I’m sorry for the noise, I was able to get the assembly code with the -S option! (I had imagined that I might need some special option for flang…) -S -emit-llvm also gave me an intermediate .ll file, and -S -save-temps gave .mlir files.
I’ve tried installing valgrind on my mac (M1), but Homebrew failed with the message “valgrind: Linux is required for this software”… So I wonder if you used Linux (or possibly Windows) for the above test with valgrind…?
The most likely explanation for this is that uninitialised variables/memory are being used, rather than -ffast-math issues.
I agree that is usually the cause for this kind of behavior, but I did not see any uninitialized variables in the posted code. I only see literal constants that could compile to zero or to denormal floating point values. Those values are then used in expressions.
edit: Does anyone know what the standard says about initializations and assignments of literal constants that are smaller than tiny()? I think what I said above is correct, but I don’t know offhand the part of the standard that governs that behavior.
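This does not answer the question about the Standard, but as a rough sketch, a small program like the one below can show what a given compiler actually does with such a literal. It assumes the 7.E-45 value mentioned elsewhere in this thread and a compiler that provides the intrinsic ieee_arithmetic module; the program name is made up for illustration.

```fortran
program subnormal_literal
  use, intrinsic :: iso_fortran_env, only: RP => real32
  use, intrinsic :: ieee_arithmetic
  implicit none
  real(RP) :: x

  ! 7.e-45 is smaller than tiny(1.0_RP), so it can only be stored
  ! as a subnormal value or be flushed to zero.
  x = 7.e-45_RP

  print *, 'tiny(x)      =', tiny(x)
  print *, 'x            =', x
  print *, 'x < tiny(x)  =', x < tiny(x)

  ! Classify the stored value instead of guessing from the printout.
  print *, 'is subnormal =', ieee_class(x) == ieee_positive_denormal
  print *, 'is zero      =', ieee_class(x) == ieee_positive_zero
end program subnormal_literal
```

Comparing the output with and without -ffast-math (or any flush-to-zero option) should show whether the literal survives as a subnormal or ends up as zero on a particular platform.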
Hi Ron, I guess you could also try generating assembly or intermediate code and pasting it into an LLM (with a prompt to explain the meaning line by line). I have tried a bit with ChatGPT, and it is pretty interesting to see the result (because I do not know how to read assembly etc…). It might even be possible to ask about uninitialized variables at the assembly or intermediate level.
Also, the GitHub issue page (linked in the first post) has more information, with the last comment below:
Indeed I did.
I have reduced the reproducer further, it has nothing to do with denormal numbers.
use iso_fortran_env, only : RP => REAL32
implicit none
real(RP) :: a(14), b(14), c
a = 0.
b = a / maxval(a)
c = maxval(a)
print *, a/maxval(a), '|', b
print *, a/c, '|', b
end
But that’s just a different version of the same underlying issue - this new code only removed one step taken in the initial test case (the FTZ issue) and is simply performing an explicit 0./0. while telling the compiler to use optimizations which should assume there is no division by zero.
The optimizer is removing some of the ops since the compiler options were explicit in saying it is safe to do so. Thus we see the ops which would otherwise have filled the print buffer get removed - so you get whatever happened to be in memory.
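For comparison, here is a minimal sketch of the same reproducer with the division guarded, so that 0./0. is never evaluated and the assumption made by the optimization flags is not violated. The program name and the fallback value of zero are assumptions chosen for illustration, not something proposed in the issue.

```fortran
program guarded_normalize
  use, intrinsic :: iso_fortran_env, only: RP => real32
  implicit none
  real(RP) :: a(14), b(14), c

  a = 0.0_RP
  c = maxval(a)

  ! Divide only when the divisor is nonzero, so no 0./0. is evaluated.
  if (c /= 0.0_RP) then
     b = a / c
  else
     b = 0.0_RP   ! assumed fallback; pick whatever suits the application
  end if

  print *, b
end program guarded_normalize
```

With the guard in place the program no longer relies on NaN being produced and propagated, which is the part that fast-math style options are allowed to assume away.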
I believe that the Standard requires the output field to be initialized to blanks:
If the number of characters produced by the editing is smaller than the field width, leading blanks are inserted in the field. (13.7.2.1 (p1) (c4) J3/26-007)
Is 0x3f9e0610 a valid REAL(KIND=4), or is that uninitialized memory due to a previous operation being deleted?
We could certainly calloc() the temporary in the specific location related to this particular issue so that the result is guaranteed not to be uninitialized memory. But this is still producing a wrong answer.
I believe flang is not under any Standard-based obligation to produce a better/different answer for this case. From a quality-of-implementation POV, it’s also what you can expect when minimum requirements are violated at runtime. My concern is that not clearing the output field with blanks could lead to a genuine Standard violation in other cases. I believe I have another test case that demonstrates this.
To be clear, I’m talking about the new reduced test case that uses explicit 0. for a. The subnormal a = 7.E-45 is a separate issue.
What I’m looking at is the data buffer that the blank-padded output formatting will consume. The format and the data to be formatted are different; the output field needs to be blank padded, yes. But that’s orthogonal to the question of whether the buffer to be consumed contains valid data.
570 movl $56, %edi
571 callq malloc@PLT
572 movq %rax, %rbp
573 movq %rax, 440(%rsp)
574 movq $4, 448(%rsp)
575 movq %r12, 456(%rsp)
576 movq $1, 464(%rsp)
577 movq $14, 472(%rsp)
578 movq $4, 480(%rsp)
579 leaq 440(%rsp), %rsi
580 movq %r13, %rdi
581 callq _FortranAioOutputDescriptor@PLT
At best, one could use calloc() here - but is 0. the correct answer or an indication something is wrong? Remember - 0x20202020 is a valid FP number as well, so you can’t just fill this buffer with “blanks” before (for example) passing it to mpfr for a translation resulting in a string of characters.
The compiler was requested by the user to perform an optimization which removed the lines located between 571 and 572 that would otherwise have populated the correct data into the data buffer. How does one detect that _FortranAioOutputDescriptor is being passed an uninitialized descriptor, other than by enabling FP traps (my preference) or using Valgrind?
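As one possible way to look for the invalid operation from within Fortran itself, the sketch below polls the IEEE_INVALID flag around the division using the intrinsic ieee_exceptions module. This is only an illustration: the program name is made up, it assumes the processor supports the flag, and it is only meaningful when built without -ffast-math, since the optimizer may otherwise remove or fold the division. Where halting is supported, ieee_set_halting_mode could be used instead to trap rather than poll.

```fortran
program check_invalid_flag
  use, intrinsic :: iso_fortran_env, only: RP => real32
  use, intrinsic :: ieee_exceptions
  implicit none
  real(RP) :: a(14), b(14)
  logical :: invalid_raised

  a = 0.0_RP

  ! Start from a clear flag so that only the division below is observed.
  call ieee_set_flag(ieee_invalid, .false.)

  b = a / maxval(a)   ! 0./0. for every element

  call ieee_get_flag(ieee_invalid, invalid_raised)
  if (invalid_raised) then
     print *, 'IEEE_INVALID was raised while computing b'
  else
     print *, 'IEEE_INVALID was not raised (the division may have been folded or removed)'
  end if
end program check_invalid_flag
```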
Thank you, everyone, for paying attention to this issue. Sorry I have been silent due to other business.
To avoid taking your time unnecessarily, I would like to let you know that the issue has been closed as completed by the LLVM maintainer:
To my understanding, the code and the optimization flags trigger undefined behaviour, so the result is not unexpected.
Many thanks to everyone again!