I get a 4X speedup for free just by putting `@fastmath` in front of `g(i,N)`. This is partly because `x^4` is calculated in Julia with `Base.power_by_squaring` instead of the much faster but less accurate `x*x*x*x`. You can test this by replacing `x^4` with `x*x^3`: in that case `x` is multiplied by an optimized `x^3` computed via the `Base.literal_pow` mechanism. As a loyal lover of Fortran, I am still amazed every day by Julia's impressive performance.
Of course, for comparison I compiled the Fortran version with the Intel compiler's `/fast` flag and tried the `x*x**3` trick there too, but nothing changed its timings. Here are my benchmark results.
Intel Fortran:

```
time = 3.8906250000000000
time = 3.9062500000000000
time = 3.9062500000000000
val  = 0.42737032509713474
```
Julia 1.7.0-beta2:

```
loop                  1.140 s   (0 allocations: 0 bytes)      0.4273703250971348
fast                  1.140 s   (0 allocations: 0 bytes)      0.4273703250971348
avx                   1.512 s   (0 allocations: 0 bytes)      0.42737032509704814
avxt                388.076 ms  (0 allocations: 0 bytes)      0.4273703250970827
simd                  1.140 s   (0 allocations: 0 bytes)      0.4273703250971348
sumiter               3.135 s   (0 allocations: 0 bytes)      0.4273703250971348
mapreduce             3.135 s   (0 allocations: 0 bytes)      0.4273703250970799
threadsx.mapreduce  412.653 ms  (620 allocations: 44.83 KiB)  0.42737032509707623
```
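For reference, the plain-loop and threaded variants can be sketched roughly as below. The `g(i, N)` here is a hypothetical stand-in summand (the real one is not shown above), and the threaded version uses plain `Threads.@spawn` tasks rather than the `avxt`/`ThreadsX` machinery from the table:

```julia
using Base.Threads: @spawn, nthreads

# Hypothetical stand-in for the real g(i, N) summand.
g(i, N) = (i / N)^4

# The simd variant: a straight accumulation loop.
function sum_loop(N)
    s = 0.0
    @simd for i in 1:N
        s += g(i, N)
    end
    return s
end

# A task-based threaded sum: split 1:N into chunks, sum each chunk on its own task.
function sum_threaded(N; ntasks = nthreads())
    chunks = Iterators.partition(1:N, cld(N, ntasks))
    tasks = [@spawn sum(i -> g(i, N), chunk) for chunk in chunks]
    return sum(fetch.(tasks))
end
```

Note that floating-point addition reassociates differently across these variants, which is why the last digits differ between `loop`, `avx`, and `mapreduce` in the results above.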