Why MKL's vdsin is slower than the intrinsic sin?

Dear all,

I notice from below thread,

That using MKL may speed up the computation, so I tried MKL’s vdsin function and using the same example as in the above thread,

!include "_rms.fi"
program avx
implicit none
!include "mkl_vml.f90"
integer, parameter :: dp = kind(0.d0)
real(dp) :: t1, t2, r

call cpu_time(t1)
r = f(100000000)
call cpu_time(t2)

print *, "Time", t2-t1
print *, r

contains

    real(dp) function f(N) result(r)
    integer, intent(in) :: N
    integer :: i
    real(dp) :: j(N)
    !r = 0.0_dp
    
    !j = 1.0_dp
    !do while (j<=N)
    !  r = r + sin(j)
    !  j = j + 1.0_dp
    !enddo
    
    !j = [(i, i = 1, N)]
     
    
    !r = sum( sin( dble( (/(i, i = 1,N)/) ) ) )
    
    
    
    call vdsin(N,(dble([(i,i=1,N)])),j)
    
    r = sum(j)
    
    !r = sum(sin(dble([(i,i=1,N)])))
    
    !do i = 1, N
    !    r = r + sin(dble(i))
    !end do
    
    
    return
    end function

end program

However, before using MKL it costs 0.6s, after using MKL it cost 1.3 second. My CPU is Xeon 2186M.

So I am confused, does anyone know how to use MKL correctly?

Thank much in advance!

PS.
I am using Intel OneAPI 2020.3 on Windows, setting is -O3 -xHost, and

The original code by @certik is

program avx
implicit none
integer, parameter :: dp = kind(0.d0)
real(dp) :: t1, t2, r

call cpu_time(t1)
r = f(100000000)
call cpu_time(t2)

print *, "Time", t2-t1
print *, r

contains

    real(dp) function f(N) result(r)
    integer, intent(in) :: N
    integer :: i
    r = 0
    do i = 1, N
        r = r + sin(real(i,dp))
    end do
    end function

end program

Isn’t it better to create the above input array before entering this part

call cpu_time(t1)
r = f(100000000)
call cpu_time(t2)

and pass the array to f() so that it measures purely the time for evaluating sin?

1 Like

Thanks @septc ! Yeah, uhm, well I just try to do the same thing as @certik did in that thread, his code is,

program avx
implicit none
integer, parameter :: dp = kind(0.d0)
real(dp) :: t1, t2, r

call cpu_time(t1)
r = f(100000000)
call cpu_time(t2)

print *, "Time", t2-t1
print *, r

contains

    real(dp) function f(N) result(r)
    integer, intent(in) :: N
    integer :: i
    r = 0
    do i = 1, N
        r = r + sin(real(i,dp))
    end do
    end function

end program

If I do

r = sum(sin(dble([(i,i=1,N)])))

it is two times faster than using MKL’s vdsin as below

    call vdsin(N,(dble([(i,i=1,N)])),j)
    r = sum(j)