Simple summation 8x slower than in Julia

Mason · May 8, 2021, 10:18pm

I can’t get this cordic program to compile on my machine so I can’t run my own comparisons. If you want to do multithreading, the simplest way would be to just write

function f_certainly_not_avx(N)
   s = 0.0
   Threads.@threads for i in 1:N
       s += cordic_sine(convert(Float64, i))
   end
   s
end

Not really sure why or how this is relevant to the original question that @certik asked though.

If one is willing to roll up their sleeves, they could make this cordic_sine function work with LoopVectorization.@avx, they just have to add dispatches that act correctly and vectorize across a VectorizationBase.AbstractSIMD type, but it wouldn’t be super simple.

Chris explained a bit on how to do so here: https://julialang.zulipchat.com/#narrow/stream/137791-general/topic/LoopVectorization.20vectorize.20an.20arbitrary.20function

Topic		Replies	Views
Julia: Fast as Fortran, Beautiful as Python	184	11794	November 13, 2022
Comparing Fortran and Julia's Bessel function performance	69	4865	October 23, 2022
What are the advantages of Julia over Fortran?	34	7745	January 19, 2022
Improving Fortran Results in the Julia Micro-benchmarks Help	44	3996	June 23, 2022
Julia vs. Fortran syntax	20	2185	August 20, 2021

Simple summation 8x slower than in Julia

Related topics