When does Intel MKL perform faster than intrinsic functions?

Dear all,

I have tested some functions in Intel MKL using Intel oneAPI 2022.0.3 on arrays and matrices with about 10^9 elements, for example MKL’s vdsin vs. the intrinsic sin, and MKL’s dgemm, vdadd and domatadd vs. the intrinsic matmul and +.
However, in my tests MKL’s functions seem to be slower than the intrinsic ones.
A simple example in which MKL’s vdsin appears slower than plain sin is below.
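
A minimal sketch of this kind of comparison (not the exact code behind the timings above) might look like the following. It assumes the LP64 MKL interface with default 32-bit integers, calls vdsin through an implicit interface as vdsin(n, a, y), and uses a smaller array than 10^9 elements to keep memory modest. The program name, the array size n, and the build line in the comment are illustrative assumptions.

```fortran
! Timing sketch: intrinsic sin vs. MKL's vdsin on the same array.
! Assumed build line (file name is hypothetical): ifort -O2 -qmkl sin_bench.f90
program sin_bench
  use, intrinsic :: iso_fortran_env, only: real64, int64
  implicit none
  integer, parameter :: n = 10000000      ! illustrative size, well below 10^9
  real(real64), allocatable :: a(:), y1(:), y2(:)
  integer(int64) :: t0, t1, rate
  external :: vdsin                       ! MKL VM routine, called as vdsin(n, a, y)

  allocate(a(n), y1(n), y2(n))
  call random_number(a)
  y1 = 0.0_real64                         ! touch the outputs first so that
  y2 = 0.0_real64                         ! page faults are not part of the timing

  call system_clock(t0, rate)
  y1 = sin(a)
  call system_clock(t1)
  print '(a,f8.4,a)', 'intrinsic sin: ', real(t1 - t0, real64)/real(rate, real64), ' s'

  call system_clock(t0)
  call vdsin(n, a, y2)
  call system_clock(t1)
  print '(a,f8.4,a)', 'MKL vdsin:     ', real(t1 - t0, real64)/real(rate, real64), ' s'

  ! Use both results so the compiler cannot discard either computation.
  print '(a,es10.3)', 'max |difference|: ', maxval(abs(y1 - y2))
end program sin_bench
```

Both calls work on the same input and write to arrays that have already been touched, so the two timings should be comparable. Running each call several times and taking the best time would make the numbers more stable.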

On the other hand, I did find that MKL’s Gaussian random number generator is about 3 times faster than the fastest other generator I have used (such as rnor() in ziggurat).

So I wonder: do you use Intel MKL?
When does Intel MKL perform faster than intrinsic functions?
Which functions/subroutines in MKL are particularly fast?
Thanks very much in advance!

PS. A rough illustration of how to use MKL’s Gaussian random number generator is below. In real code you also need to add the line include 'mkl_vsl.f90' before the main program, so that the MKL_VSL and MKL_VSL_TYPE modules are available.

  subroutine stochastic_rk_init(np,nd,nstep)
  use random
  USE MKL_VSL
  USE MKL_VSL_TYPE
  integer(kind=i8) :: np, nd, nstep, nsize
  integer(kind=4) errcode
  real(kind=8) mean,sigma
  integer :: brng,method,seed
  TYPE (VSL_STREAM_STATE) :: stream
  
  nsize = np*nd*4*nstep ! np is about 10^5, nstep is about 10^3, nd=2.
  if (allocated(normal_1d)) deallocate(normal_1d)   
  allocate(normal_1d(nsize)) 
  
      !brng=VSL_BRNG_MCG31  
      !brng=VSL_BRNG_MRG32k3a    
      !brng=VSL_BRNG_MT19937     
      brng=VSL_BRNG_SFMT19937
      method=VSL_RNG_METHOD_GAUSSIAN_ICDF
      seed=777  
      mean=zero
      sigma=one
      
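!     ***** Initialize *****
      errcode=vslnewstream( stream, brng,  seed )
      if (errcode /= VSL_STATUS_OK) stop 'vslnewstream failed'
!     ***** Call RNG *****
      ! int(nsize) is only safe while nsize < 2**31 (about 8*10^8 for the sizes above)
      errcode=vdrnggaussian( method, stream, int(nsize), normal_1d, mean, sigma)
      if (errcode /= VSL_STATUS_OK) stop 'vdrnggaussian failed'
!     ***** Release the stream *****
      errcode=vsldeletestream( stream )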
  
  !normal_1d = gaussian(nsize) ! rnor_vec(nsize)
  normal(1:np,1:nd,1:4,1:nstep) => normal_1d
  
  return
  end subroutine stochastic_rk_init