Simple summation 8x slower than in Julia

I get a 4x speedup for free just by putting @fastmath in front of g(i,N). This is partly because x^4 is calculated in Julia using Base.power_by_squaring instead of the much faster but less accurate x*x*x*x. You can test this by replacing x^4 with x*x^3; in that case, x is multiplied by the optimized x^3 computed via the Base.literal_pow mechanism. Being a loyal lover of Fortran, I am still amazed by Julia's impressive performance every day.
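A minimal sketch of the power trick described above (the value of x here is arbitrary, just for illustration):

```julia
x = 1.2345

# With a literal integer exponent, x^4 goes through Base.literal_pow; for
# exponents above 3 this falls back to power-by-squaring, which preserves
# accuracy but is slower than naive repeated multiplication.
a = x^4

# x*x^3: the literal x^3 is expanded by Base.literal_pow into x*x*x, so the
# whole expression compiles down to three multiplications.
b = x * x^3

# @fastmath allows the compiler to replace the power with the faster,
# slightly less accurate repeated-multiplication form.
f(y) = @fastmath y^4
c = f(x)

println((a, b, c))  # all three agree to within floating-point rounding
```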

Of course, I tested my Intel compiler with the /fast flag for comparison and tried the x*x**3 trick, but nothing changed the timings of the Fortran version. Here are my benchmark results.

Intel Fortran:

```
 time =    3.8906250000000000
 time =    3.9062500000000000
 time =    3.9062500000000000
 val  =    0.42737032509713474
```

Julia 1.7.0-beta2:

```
loop                 1.140 s    (0 allocations: 0 bytes)     0.4273703250971348
fast                 1.140 s    (0 allocations: 0 bytes)     0.4273703250971348
avx                  1.512 s    (0 allocations: 0 bytes)     0.42737032509704814
avxt               388.076 ms   (0 allocations: 0 bytes)     0.4273703250970827
simd                 1.140 s    (0 allocations: 0 bytes)     0.4273703250971348
sumiter              3.135 s    (0 allocations: 0 bytes)     0.4273703250971348
mapreduce            3.135 s    (0 allocations: 0 bytes)     0.4273703250970799
threadsx.mapreduce 412.653 ms   (620 allocations: 44.83 KiB) 0.42737032509707623
```
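For reference, the Base-only variants in the table can be sketched roughly as follows; the summand g(i, N) below is a placeholder, not the thread's actual function:

```julia
# Placeholder summand; the real g(i, N) from the benchmark may differ.
g(i, N) = sin(2pi * i / N)^4

# "loop": a plain accumulation loop.
function sum_loop(N)
    s = 0.0
    for i in 1:N
        s += g(i, N)
    end
    s
end

# "fast": the same loop with @fastmath relaxing strict IEEE semantics.
function sum_fast(N)
    s = 0.0
    @fastmath for i in 1:N
        s += g(i, N)
    end
    s
end

# "simd": @simd lets the compiler reorder the reduction for vectorization.
function sum_simd(N)
    s = 0.0
    @simd for i in 1:N
        s += g(i, N)
    end
    s
end

# "mapreduce": a functional reduction over the index range.
sum_mapreduce(N) = mapreduce(i -> g(i, N), +, 1:N)
```

The avx and avxt rows would correspond to LoopVectorization's @avx/@avxt macros and the last row to ThreadsX.mapreduce; those require the respective packages and are omitted here.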