Simple summation 8x slower than in Julia

I can’t get this cordic program to compile on my machine so I can’t run my own comparisons. If you want to do multithreading, the simplest way would be to just write

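function f_certainly_not_avx(N)
    # Atomic accumulator: a plain `s += ...` inside @threads would be a data race.
    s = Threads.Atomic{Float64}(0.0)
    Threads.@threads for i in 1:N
        # Float64(...) guards against cordic_sine returning a different float type.
        Threads.atomic_add!(s, Float64(cordic_sine(convert(Float64, i))))
    end
    s[]
end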
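
A threaded reduction can also keep one partial sum per task and combine them at the end, so that tasks never write to a shared accumulator. Here is a minimal sketch of that idea, not a tuned implementation. It assumes `sin` as a stand-in for cordic_sine (which isn't shown in this thread), and `threaded_sum` is a hypothetical name used only for illustration:

```julia
# Sketch: one partial sum per chunk, combined after all tasks finish.
# `threaded_sum` is a hypothetical helper; `f` stands in for cordic_sine.
function threaded_sum(f, N)
    N < 1 && return 0.0
    # Split 1:N into roughly one chunk per available thread.
    chunk_len = cld(N, Threads.nthreads())
    tasks = map(Iterators.partition(1:N, chunk_len)) do chunk
        # Each task sums its own chunk into its own local result.
        Threads.@spawn sum(i -> f(convert(Float64, i)), chunk)
    end
    return sum(fetch, tasks)
end

serial   = sum(i -> sin(convert(Float64, i)), 1:1_000_000)
threaded = threaded_sum(sin, 1_000_000)
# Summation order differs between the two, so compare with a tolerance, not ==.
@show isapprox(serial, threaded; atol = 1e-8)
```

Julia has to be started with more than one thread (for example `julia -t 4`) for this to actually run in parallel; with a single thread it simply runs as one chunk.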

Not really sure why or how this is relevant to the original question that @certik asked though.

If one is willing to roll up their sleeves, they could make this cordic_sine function work with LoopVectorization.@avx. They would just have to add methods that dispatch on a VectorizationBase.AbstractSIMD type and behave correctly when vectorized across it, but it wouldn’t be super simple.

Chris explained a bit about how to do so here: https://julialang.zulipchat.com/#narrow/stream/137791-general/topic/LoopVectorization.20vectorize.20an.20arbitrary.20function