I get a 4X speedup for free just by putting @fastmath in front of g(i,N). This is partly because Julia computes x^4 with Base.power_by_squaring, which is more accurate but much slower than x*x*x*x. You can test this by replacing x^4 with x*x^3: in that case x is multiplied by an optimized x^3, which the Base.literal_pow algorithm expands to x*x*x. As a loyal Fortran devotee, I am still amazed by Julia's performance every day.
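A minimal sketch of the three variants discussed above (g and N from the original benchmark are not defined here; the helper names are mine, and timings will depend on your machine):

```julia
# Three ways to raise a Float64 to the fourth power:
pow4_default(x) = x^4           # literal exponent: the generic, more accurate power path
pow4_split(x)   = x * x^3       # x^3 is expanded to x*x*x by Base.literal_pow
pow4_fast(x)    = @fastmath x^4 # lets the compiler lower this to x*x*x*x

x = 1.2345
# All three agree to within a few ulps, but can differ in the last bits,
# which is why the @fastmath/avx results above end in slightly different digits.
pow4_default(x), pow4_split(x), pow4_fast(x)
```

Timing each variant inside the actual summation loop (e.g. with BenchmarkTools' @btime, as in the results below) is what exposes the 4X gap; a single call is too cheap to measure.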
Of course, I also tested the Intel compiler with the /fast flag for comparison and tried the x*x**3 trick there, but nothing changed the timings of the Fortran version. Here are my benchmark results.
Intel Fortran:
time = 3.8906250000000000
time = 3.9062500000000000
time = 3.9062500000000000
val = 0.42737032509713474
Julia 1.7.0-beta2:
loop 1.140 s (0 allocations: 0 bytes) 0.4273703250971348
fast 1.140 s (0 allocations: 0 bytes) 0.4273703250971348
avx 1.512 s (0 allocations: 0 bytes) 0.42737032509704814
avxt 388.076 ms (0 allocations: 0 bytes) 0.4273703250970827
simd 1.140 s (0 allocations: 0 bytes) 0.4273703250971348
sumiter 3.135 s (0 allocations: 0 bytes) 0.4273703250971348
mapreduce 3.135 s (0 allocations: 0 bytes) 0.4273703250970799
threadsx.mapreduce 412.653 ms (620 allocations: 44.83 KiB) 0.42737032509707623