This is probably deliberate, to keep the safer option as the default. What I don’t understand is that @fastmath (or the equivalent compiler option in Fortran) should skip those checks, shouldn’t it? (It does not.)
three_median2 and three_median1 (and the associated assembly) are actually incorrect implementations if you are following strict IEEE rules. three_median2(0.0,-0.0,0.0)==-0.0, which is wrong, and both three_median1 and three_median2 produce NaN when one argument is NaN, e.g. three_median2(0.0,NaN,0.0)==NaN.
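To make those two edge cases concrete, here is a minimal sketch. The names median_minmax and median_cmp are made up for illustration: median_minmax is the min/max expression quoted in this thread without @fastmath, and median_cmp is a comparison-only stand-in, since the definition of three_median2 is not shown here.

```julia
# Strict min/max median: the min/max expression from this thread, without @fastmath.
median_minmax(a, b, c) = max(min(a, b), min(max(a, b), c))

# Comparison-only median (hypothetical stand-in, not the original three_median2).
function median_cmp(a, b, c)
    lo, hi = a < b ? (a, b) : (b, a)   # order the first two values
    mid = hi < c ? hi : c              # cap the larger one at c
    return mid > lo ? mid : lo         # never go below the smaller one
end

# Signed zeros: -0.0 < 0.0 is false, so plain comparisons cannot order them,
# while Julia's min/max treat -0.0 as smaller than 0.0.
@show median_minmax(0.0, -0.0, 0.0)
@show median_cmp(0.0, -0.0, 0.0)

# NaN: every comparison with NaN is false, and Julia's min/max propagate NaN.
@show median_minmax(0.0, NaN, 0.0)
@show median_cmp(0.0, NaN, 0.0)
```

With these definitions the signed-zero input should give 0.0 from the min/max form and -0.0 from the comparison form, and both should return NaN for the NaN input.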
If you define three_median1 as @fastmath max(min(a,b),min(max(a,b),c)), Julia is able to optimize it to the same assembly as three_median2.
Perfect. Thanks for the feedback @oscardssmith. @lmiq, could you check whether you now get the same performance with three_median1?
Yes, it does:
julia> function three_median1(a, b, c)
           res = @fastmath max(min(a,b),min(max(a,b),c))
           return res
       end
three_median1 (generic function with 1 method)
julia> @btime test($x,$three_median1)
10.053 μs (0 allocations: 0 bytes)
5017.136463388665
(I think I had put @fastmath on the call to the function inside the test function, and that didn’t work.)
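A small sketch of why the placement matters, as far as I understand the macro: @fastmath rewrites the operations that appear literally in the expression it is given, so wrapping a call to a user-defined function leaves that function's body untouched. The names median_strict, median_relaxed and call_site are hypothetical and only used for illustration.

```julia
# Strict version: min/max keep their NaN and signed-zero handling.
median_strict(a, b, c) = max(min(a, b), min(max(a, b), c))

# @fastmath inside the definition: the min/max calls written here get rewritten.
median_relaxed(a, b, c) = @fastmath max(min(a, b), min(max(a, b), c))

# @fastmath at the call site: median_strict is not an operation the macro
# knows about, so the call is left alone and still runs the strict body.
call_site(a, b, c) = @fastmath median_strict(a, b, c)

# Inspect what the macro produces in each case.
inner = @macroexpand @fastmath max(min(a, b), min(max(a, b), c))
outer = @macroexpand @fastmath median_strict(a, b, c)
println(inner)   # min/max replaced by their Base.FastMath counterparts
println(outer)   # the call to median_strict, unchanged
```

Comparing @code_native for median_relaxed and call_site should then show whether the NaN handling was removed in each case.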
One curiosity: in this last version I thought I would save one comparison.
julia> function three_median3(a,b,c)
           a1, a2 = a < b ? (a, b) : (b, a)
           a3 = a2 < c ? a2 : c
           res = a3 > a1 ? a3 : a1
           return res
       end
three_median3 (generic function with 1 method)
but the native code is the same:
julia> @code_native three_median3(1.0,1.0,1.0)
.text
; ┌ @ REPL[32] within `three_median3'
vminsd %xmm1, %xmm0, %xmm3
vmaxsd %xmm0, %xmm1, %xmm0
; │ @ REPL[32]:3 within `three_median3'
vminsd %xmm2, %xmm0, %xmm0
; │ @ REPL[32] within `three_median3'
vmaxsd %xmm3, %xmm0, %xmm0
; │ @ REPL[32]:5 within `three_median3'
retq
nopw %cs:(%rax,%rax)
; └
(the compiler decided that swapping the values is not worth saving one comparison, something like that).
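As a sanity check that three_median3 really computes the same thing as the min/max form on ordinary values (no NaN, no signed zeros), a quick sketch comparing both against a sort-based reference. The names minmax_median and sorted_median are made up for this check.

```julia
# three_median3 as posted above.
function three_median3(a, b, c)
    a1, a2 = a < b ? (a, b) : (b, a)
    a3 = a2 < c ? a2 : c
    res = a3 > a1 ? a3 : a1
    return res
end

# Min/max form used for three_median1, here without @fastmath.
minmax_median(a, b, c) = max(min(a, b), min(max(a, b), c))

# Reference: sort the three values and take the middle one.
sorted_median(a, b, c) = sort([a, b, c])[2]

# All 27 combinations of three values, including ties.
vals = (1.0, 2.0, 3.0)
agree = all(three_median3(a, b, c) == minmax_median(a, b, c) == sorted_median(a, b, c)
            for a in vals, b in vals, c in vals)
println(agree)
```

This only covers finite, distinct-sign-free inputs; the NaN and -0.0 cases discussed earlier in the thread are deliberately left out.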
But, in the end, Fortran is able to optimize the min/max version to the optimal code, isn’t it? Does it behave like Julia in this regard (having NaN checks without fastmath and skipping them with it)?
(I don’t know how to check the assembly code in these cases.)
You can check them with Godbolt.
Here is the latest Intel Fortran Compiler Classic with -O3:
three_median_:
movsd xmm2, QWORD PTR [rdi] #5.3
movsd xmm1, QWORD PTR [rsi] #5.3
movaps xmm0, xmm2 #7.1
maxsd xmm2, xmm1 #7.1
minsd xmm0, xmm1 #7.1
minsd xmm2, QWORD PTR [rdx] #7.1
maxsd xmm0, xmm2 #7.1
ret
And here is the gfortran-generated assembly from the link in @lkedward’s reply:
three_median_:
movsd xmm1, QWORD PTR [rdi]
movsd xmm2, QWORD PTR [rsi]
movapd xmm0, xmm1
minsd xmm1, xmm2
maxsd xmm0, xmm2
minsd xmm0, QWORD PTR [rdx]
maxsd xmm0, xmm1
ret
Nice tool. The version with only the conditionals seems to generate fewer instructions: Compiler Explorer
With only the conditionals there is probably a small performance gain, although as far as I understood from the documentation the max/min functions handle NaNs (I was expecting that --fast-math or some other flag would make the assemblies converge, but I couldn’t find that flag, if it exists).