Simple summation 8x slower than in Julia

@certik and everyone,

Notwithstanding the excellent point by @themos the specific case at hand is not a true performance benchmark on account of the mathematical identity, should we continue to use the same case but modify it as suggested upthread to view the sin calculation as some representative function computation for which a common library implementation is to be consumed in any programming paradigm of interest, whether it be based in Julia or Fortran, and we “up” the game a little bit to achieve some “scale” to the calculations i.e., raise the number N to be 800,000,000 rather than 100,000,000, a speedup close to 6x relative to Julia is viable with Fortran.

So with John Burkardt’s C code for the CORDIC algorithm (link upthread) as the basis, the Julia code and the result is as follows (as noted by @uwe, avx* macro fails to compile, so it’s removed):

function cordic_sine(a)

   cos = Ref{Float64}(0.0)
   sin = Ref{Float64}(0.0)

   # Ref: Excellent work by John Burkardt
   # https://people.sc.fsu.edu/~jburkardt/c_src/cordic/cordic.c
   # void cossin_cordic ( double beta, int n, double *c, double *s )
   ccall((:cossin_cordic, "cordic"), Cvoid, (Float64, Cint, Ref{Float64}, Ref{Float64}), a, 40, cos, sin)

   return sin[]

end

function f_avx(N)
   s = 0.0
   for i in 1:N
       s += cordic_sine(convert(Float64, i))
   end
   s
end

@time r = f_avx(800000000)
println(r)

Execution result with Julia:

C:\temp>julia avx.jl
172.551030 seconds (1 allocation: 16 bytes)
1.9006631241568408

With Fortran, say one pursues rather simple-minded coarray approach as shown in the below fold - but while noting coarray is part of the base language:

Click to see Fortran code

Simple coarray example to divvy up the series summation across images

module cordic_m
   use, intrinsic :: iso_c_binding, only : c_int, c_double
   interface
      pure subroutine cossin_cordic( beta, n, c, s) bind(C, name="cossin_cordic")
      ! Ref: Excellent work by John Burkardt
      ! https://people.sc.fsu.edu/~jburkardt/c_src/cordic/cordic.c
      ! void cossin_cordic ( double beta, int n, double *c, double *s )
         import :: c_int, c_double
         ! Argument list
         real(c_double), intent(in), value :: beta
         integer(c_int), intent(in), value :: n
         real(c_double), intent(inout)     :: c 
         real(c_double), intent(inout)     :: s 
      end subroutine
   end interface
contains
   pure function cordic_sine( a ) result(sin_a)
      ! Argument list
      real(c_double), intent(in) :: a
      ! Function result
      real(c_double) :: sin_a
      real(c_double) :: cos_a
      call cossin_cordic( a, 40, cos_a, sin_a )
   end function
end module
   use cordic_m
   integer, parameter :: M = 100000000
   real(c_double) :: r[*]
   real(c_double) :: t1, t2
   real(c_double) :: sn
   integer :: i

   if ( this_image() == 1 ) then
      call cpu_time(t1)
   end if
   sync all
       
   r = f(M)
   sync all

   if ( this_image() == 1 ) then
      sn = 0.0_c_double
      do i = 1, num_images()
         sn = sn + r[i]
      end do
      call cpu_time(t2)
      print *, "Time", t2-t1
      print *, sn
   end if

contains

   pure function f(N) result(r)
      integer, intent(in) :: N
      real(c_double) :: r
      integer :: i
      r = 0.0_c_double
      do i = (this_image()-1)*M+1, this_image()*M
         r = r + cordic_sine(real(i,c_double))
      end do
   end function

end program 

the timing result with IFORT on the same CPU as the above Julia case is:

C:\temp>ifort /c /O3 /Qcoarray:shared /Qcoarray-num-images=8 /QxHost /traceback p.f90
Intel(R) Fortran Intel(R) 64 Compiler Classic for applications running on Intel(R) 64, Version 2021.2.0 Build 20210228_000000
Copyright (C) 1985-2021 Intel Corporation. All rights reserved.

C:\temp>link p.obj cordic.obj /subsystem:console /out:p.exe
Microsoft (R) Incremental Linker Version 14.27.29112.0
Copyright (C) Microsoft Corporation. All rights reserved.

C:\temp>p.exe
Time 30.2500000000000
1.90066312415625

Note the series summation result matches with that of Julia to the precision of c_double / Float64.

Perhaps one or more of the Julia enthusiasts can show here the parallel execution option(s) with Julia for the same exact case i.e., their native option equivalent to Fortran coarray - now that will be a good learning exercise for readers here!

3 Likes