@certik and everyone,
Notwithstanding the excellent point by @themos the specific case at hand is not a true performance benchmark on account of the mathematical identity, should we continue to use the same case but modify it as suggested upthread to view the sin
calculation as some representative function computation for which a common library implementation is to be consumed in any programming paradigm of interest, whether it be based in Julia or Fortran, and we “up” the game a little bit to achieve some “scale” to the calculations i.e., raise the number N to be 800,000,000 rather than 100,000,000, a speedup close to 6x relative to Julia is viable with Fortran.
So with John Burkardt’s C code for the CORDIC algorithm (link upthread) as the basis, the Julia code and the result is as follows (as noted by @uwe, avx*
macro fails to compile, so it’s removed):
function cordic_sine(a)
cos = Ref{Float64}(0.0)
sin = Ref{Float64}(0.0)
# Ref: Excellent work by John Burkardt
# https://people.sc.fsu.edu/~jburkardt/c_src/cordic/cordic.c
# void cossin_cordic ( double beta, int n, double *c, double *s )
ccall((:cossin_cordic, "cordic"), Cvoid, (Float64, Cint, Ref{Float64}, Ref{Float64}), a, 40, cos, sin)
return sin[]
end
function f_avx(N)
s = 0.0
for i in 1:N
s += cordic_sine(convert(Float64, i))
end
s
end
@time r = f_avx(800000000)
println(r)
Execution result with Julia:
C:\temp>julia avx.jl
172.551030 seconds (1 allocation: 16 bytes)
1.9006631241568408
With Fortran, say one pursues rather simple-minded coarray
approach as shown in the below fold - but while noting coarray
is part of the base language:
Click to see Fortran code
Simple coarray example to divvy up the series summation across images
module cordic_m
use, intrinsic :: iso_c_binding, only : c_int, c_double
interface
pure subroutine cossin_cordic( beta, n, c, s) bind(C, name="cossin_cordic")
! Ref: Excellent work by John Burkardt
! https://people.sc.fsu.edu/~jburkardt/c_src/cordic/cordic.c
! void cossin_cordic ( double beta, int n, double *c, double *s )
import :: c_int, c_double
! Argument list
real(c_double), intent(in), value :: beta
integer(c_int), intent(in), value :: n
real(c_double), intent(inout) :: c
real(c_double), intent(inout) :: s
end subroutine
end interface
contains
pure function cordic_sine( a ) result(sin_a)
! Argument list
real(c_double), intent(in) :: a
! Function result
real(c_double) :: sin_a
real(c_double) :: cos_a
call cossin_cordic( a, 40, cos_a, sin_a )
end function
end module
use cordic_m
integer, parameter :: M = 100000000
real(c_double) :: r[*]
real(c_double) :: t1, t2
real(c_double) :: sn
integer :: i
if ( this_image() == 1 ) then
call cpu_time(t1)
end if
sync all
r = f(M)
sync all
if ( this_image() == 1 ) then
sn = 0.0_c_double
do i = 1, num_images()
sn = sn + r[i]
end do
call cpu_time(t2)
print *, "Time", t2-t1
print *, sn
end if
contains
pure function f(N) result(r)
integer, intent(in) :: N
real(c_double) :: r
integer :: i
r = 0.0_c_double
do i = (this_image()-1)*M+1, this_image()*M
r = r + cordic_sine(real(i,c_double))
end do
end function
end program
the timing result with IFORT on the same CPU as the above Julia case is:
C:\temp>ifort /c /O3 /Qcoarray:shared /Qcoarray-num-images=8 /QxHost /traceback p.f90
Intel(R) Fortran Intel(R) 64 Compiler Classic for applications running on Intel(R) 64, Version 2021.2.0 Build 20210228_000000
Copyright (C) 1985-2021 Intel Corporation. All rights reserved.
C:\temp>link p.obj cordic.obj /subsystem:console /out:p.exe
Microsoft (R) Incremental Linker Version 14.27.29112.0
Copyright (C) Microsoft Corporation. All rights reserved.
C:\temp>p.exe
Time 30.2500000000000
1.90066312415625
Note the series summation result matches with that of Julia to the precision of c_double
/ Float64
.
Perhaps one or more of the Julia enthusiasts can show here the parallel execution option(s) with Julia for the same exact case i.e., their native option equivalent to Fortran coarray
- now that will be a good learning exercise for readers here!