@certik and everyone,

Notwithstanding the excellent point by @themos the specific case at hand is not a true performance benchmark on account of the mathematical identity, should we continue to use the same case but modify it as suggested upthread to view the `sin`

calculation as some representative function computation for which a common library implementation is to be consumed in any programming paradigm of interest, whether it be based in Julia or Fortran, and we “up” the game a little bit to achieve some “scale” to the calculations i.e., raise the number N to be 800,000,000 rather than 100,000,000, a speedup close to 6x relative to Julia is viable with Fortran.

So with John Burkardt’s C code for the CORDIC algorithm (link upthread) as the basis, the Julia code and the result is as follows (as noted by @uwe, `avx*`

macro fails to compile, so it’s removed):

```
function cordic_sine(a)
cos = Ref{Float64}(0.0)
sin = Ref{Float64}(0.0)
# Ref: Excellent work by John Burkardt
# https://people.sc.fsu.edu/~jburkardt/c_src/cordic/cordic.c
# void cossin_cordic ( double beta, int n, double *c, double *s )
ccall((:cossin_cordic, "cordic"), Cvoid, (Float64, Cint, Ref{Float64}, Ref{Float64}), a, 40, cos, sin)
return sin[]
end
function f_avx(N)
s = 0.0
for i in 1:N
s += cordic_sine(convert(Float64, i))
end
s
end
@time r = f_avx(800000000)
println(r)
```

Execution result with Julia:

C:\temp>julia avx.jl

172.551030 seconds (1 allocation: 16 bytes)

1.9006631241568408

With Fortran, say one pursues **rather simple-minded** `coarray`

approach as shown in the below fold - but while noting `coarray`

is part of the base language:

## Click to see Fortran code

####
Simple coarray example to divvy up the series summation across images

```
module cordic_m
use, intrinsic :: iso_c_binding, only : c_int, c_double
interface
pure subroutine cossin_cordic( beta, n, c, s) bind(C, name="cossin_cordic")
! Ref: Excellent work by John Burkardt
! https://people.sc.fsu.edu/~jburkardt/c_src/cordic/cordic.c
! void cossin_cordic ( double beta, int n, double *c, double *s )
import :: c_int, c_double
! Argument list
real(c_double), intent(in), value :: beta
integer(c_int), intent(in), value :: n
real(c_double), intent(inout) :: c
real(c_double), intent(inout) :: s
end subroutine
end interface
contains
pure function cordic_sine( a ) result(sin_a)
! Argument list
real(c_double), intent(in) :: a
! Function result
real(c_double) :: sin_a
real(c_double) :: cos_a
call cossin_cordic( a, 40, cos_a, sin_a )
end function
end module
use cordic_m
integer, parameter :: M = 100000000
real(c_double) :: r[*]
real(c_double) :: t1, t2
real(c_double) :: sn
integer :: i
if ( this_image() == 1 ) then
call cpu_time(t1)
end if
sync all
r = f(M)
sync all
if ( this_image() == 1 ) then
sn = 0.0_c_double
do i = 1, num_images()
sn = sn + r[i]
end do
call cpu_time(t2)
print *, "Time", t2-t1
print *, sn
end if
contains
pure function f(N) result(r)
integer, intent(in) :: N
real(c_double) :: r
integer :: i
r = 0.0_c_double
do i = (this_image()-1)*M+1, this_image()*M
r = r + cordic_sine(real(i,c_double))
end do
end function
end program
```

the timing result with IFORT on the same CPU as the above Julia case is:

C:\temp>ifort /c /O3 /Qcoarray:shared /Qcoarray-num-images=8 /QxHost /traceback p.f90

Intel(R) Fortran Intel(R) 64 Compiler Classic for applications running on Intel(R) 64, Version 2021.2.0 Build 20210228_000000

Copyright (C) 1985-2021 Intel Corporation. All rights reserved.

C:\temp>link p.obj cordic.obj /subsystem:console /out:p.exe

Microsoft (R) Incremental Linker Version 14.27.29112.0

Copyright (C) Microsoft Corporation. All rights reserved.

C:\temp>p.exe

Time 30.2500000000000

1.90066312415625

Note the series summation result matches with that of Julia to the precision of `c_double`

/ `Float64`

.

Perhaps one or more of the Julia enthusiasts can show here the parallel execution option(s) with Julia for the same exact case i.e., their native option equivalent to Fortran `coarray`

- now that will be a good learning exercise for readers here!