Why heap array make my code 6 times slower?

Dear all, I am using Intel Fortran OneAPI,
I have a code will show you below, which is just a very very very raw my initial version of stochastic_rk which is based on John Burkardt’s code. John’s version is scaler version can solve 1 SDE. I just made it can slove 2 SDE, it can be extend to solve any number of SDEs. Again, it is a raw version. I have an improved version which I put the random number generator outside the loop so it is faster.

Now it is actually another question. See the code below. You can directly compile and run it.
Thing is, if I enable ‘heap array’, then the code is slow, like 8-9 seconds.
in windows, flag is
/heap-arrays0
linux is,
-heap-arrays

However, If I do not heap array, it is only 1.5 seconds.
My questions, why there is such a difference? What caused the issue?

Besides, do you guys use ‘heap-arrays’ option? If so, how to you set the size.
I do feel heap-arrays is very useful. Enabling it just make the code free from stackoverflow.
I usally use allocatable arrays, so heap-arrays or not, does not seem to have big differences.

However, this time because my code is basically build from John’s test code, so I did not allocate arrays often, and I do notice there is a big performance difference between heap and not heap arrays.

I know it is very lazy question, but I would like to hear your insights and experiences and learn from you, thank you very much in advance!

module random2
   implicit none
   integer, private, parameter :: i4=selected_int_kind(9)
   integer, private, parameter :: i8=selected_int_kind(15)
   integer, private, parameter :: r8=selected_real_kind(15,9)

contains
   subroutine ran1(rn,irn)
! multiplicative congruential with additive constant
   integer(kind=i8),  parameter :: mask24 = ishft(1_8,24)-1
   integer(kind=i8),  parameter :: mask48 = ishft(1_8,48_8)-1_8
   real(kind=r8),  parameter :: twom48=2.0d0**(-48)
   integer(kind=i8),  parameter :: mult1 = 44485709377909_8
   integer(kind=i8),  parameter :: m11 = iand(mult1,mask24)
   integer(kind=i8),  parameter :: m12 = iand(ishft(mult1,-24),mask24)
   integer(kind=i8),  parameter :: iadd1 = 96309754297_8
   integer(kind=i8) :: irn
   real(kind=r8) :: rn
   integer(kind=i8) :: is1,is2
   is2 = iand(ishft(irn,-24),mask24)
   is1 = iand(irn,mask24)
   irn = iand(ishft(iand(is1*m12+is2*m11,mask24),24)+is1*m11+iadd1,mask48)
   rn = ior(irn,1_8)*twom48
   return
   end subroutine ran1

   subroutine ran2(rn,irn)
! multiplicative congruential with additive constant
   integer(kind=i8),  parameter :: mask24 = ishft(1_8,24)-1
   integer(kind=i8), parameter :: mask48 = ishft(1_8,48)-1
   real(kind=r8), parameter :: twom48=2.0d0**(-48)
   integer(kind=i8), parameter :: mult2 = 34522712143931_8
   integer(kind=i8), parameter :: m21 = iand(mult2,mask24)
   integer(kind=i8), parameter :: m22 = iand(ishft(mult2,-24),mask24)
   integer(kind=i8), parameter :: iadd2 = 55789347517_8
   integer(kind=i8) :: irn
   real(kind=r8) :: rn
   integer(kind=i8) :: is1,is2
   is2 = iand(ishft(irn,-24),mask24)
   is1 = iand(irn,mask24)
   irn = iand(ishft(iand(is1*m22+is2*m21,mask24),24)+is1*m21+iadd2,mask48)
   rn = ior(irn,1_8)*twom48
   return
   end subroutine ran2

end module random2

module random
   implicit none
   integer, private, parameter :: i4=selected_int_kind(9)
   integer, private, parameter :: i8=selected_int_kind(15)
   integer, private, parameter :: r8=selected_real_kind(15,9)
   integer(kind=i8), private, parameter :: mask48 = ishft(1_8,48)-1
   integer(kind=i8), private, save :: irn = 1_8,irnsave

contains   
   function randn(n)
! return an array of random variates (0,1)
   use random2
   integer(kind=i8) :: n
   real(kind=r8), dimension(n) :: randn
   integer(kind=i8) :: i
   do i=1,n
      call ran1(randn(i),irn)
   enddo
   return
   end function randn

   function gaussian(n)
! return an array of gaussian random variates with variance 1
   integer(kind=i8) :: n
   real(kind=r8), dimension(n) :: gaussian
   real(kind=r8), dimension(2*((n+1)/2)) :: rn
   real(kind=r8) :: rn1,rn2,arg,x1,x2,x3
   real(kind=r8), parameter :: pi=4.0d0*atan(1.0_r8)
   integer(kind=i8) :: i
   rn=randn(size(rn,kind=i8))
   do i=1,n/2
      rn1=rn(2*i-1)
      rn2=rn(2*i)
      arg=2.0_r8*pi*rn1
      x1=sin(arg)
      x2=cos(arg)
      x3=sqrt(-2.0_r8*log(rn2))
      gaussian(2*i-1)=x1*x3
      gaussian(2*i)=x2*x3
   enddo
   if (mod(n,2).ne.0) then
      rn1=rn(n)
      rn2=rn(n+1)
      arg=2.0_r8*pi*rn1
      x2=cos(arg)
      x3=sqrt(-2.0_r8*log(rn2))
      gaussian(n)=x2*x3
      endif
   return
   end function gaussian

   subroutine setrn(irnin)
! set the seed
   integer(kind=i8) :: irnin
   integer(kind=i8) :: n
   integer(kind=i8), allocatable :: iseed(:)
   irn=iand(irnin,mask48)
   call savern(irn)
   call RANDOM_SEED(SIZE=n)
   allocate(iseed(n))
   iseed=int(irnin)
   call RANDOM_SEED(PUT=iseed) ! set the internal ran function's seed.
   return
   end subroutine setrn

   subroutine savern(irnin)
! save the seed
   integer(kind=i8) :: irnin
   irnsave=irnin
   return
   end subroutine savern

   subroutine showirn(irnout)
! show the seed
   integer(kind=i8) :: irnout
   irnout=irnsave
   return
   end subroutine showirn   

   function randomn(n) ! the default ramdom_number subroutine.
! return an array of random variates (0,1)
   integer(kind=i8) :: n
   real(kind=r8), dimension(n) :: randomn
   call RANDOM_NUMBER(randomn)
   return
   end function randomn

   subroutine random1(x) ! generate random number x using the default ramdom_number subroutine.
   real(kind=r8) :: x
   call RANDOM_NUMBER(x)
   return
   end subroutine random1

end module random
module constants
    implicit none
    integer, public, parameter :: i4=selected_int_kind(9)
    integer, public, parameter :: i8=selected_int_kind(15)
    integer, public, parameter :: r8=selected_real_kind(15,9)
    real(kind=r8), public, parameter :: zero=0.0_r8,one=1.0_r8,two=2.0_r8,three=3.0_r8,four=4.0_r8 &
	    ,five=5.0_r8,six=6.0_r8,seven=7.0_r8,eight=8.0_r8,nine=9.0_r8 &
	    ,ten=10.0_r8,tenth=.1_r8,half=.5_r8,third=1.0_r8/3.0_r8,sixth=1.0_r8/6.0_r8 &
	    ,pi=4.0_r8*atan(1.0_r8) &
	    ,normpdf_factor=1.0_r8/sqrt(8.0_r8*atan(1.0_r8)) ! 1/sqrt(2 pi)
    complex(kind=r8), public, parameter :: czero=(0.0_r8,0.0_r8),ci=(0.0_r8,1.0_r8),cone=(1.0_r8,0.0_r8)  
end module constants
module stochastic_rk
use constants  
implicit none
contains    
subroutine rk4_ti_step_mod ( x, t, h, q, fi, gi, seed, xstar )
use random

!*****************************************************************************80
!
!! RK4_TI_STEP takes one step of a stochastic Runge Kutta scheme.
!
!  Discussion:
!
!    The Runge-Kutta scheme is fourth-order, and suitable for time-invariant
!    systems in which F and G do not depend explicitly on time.
!
!    d/dx X(t,xsi) = F ( X(t,xsi) ) + G ( X(t,xsi) ) * w(t,xsi)
!
!  Licensing:
!
!    This code is distributed under the GNU LGPL license.
!
!  Modified:
!
!    20 June 2010
!
!  Author:
!
!    John Burkardt
!
!  Reference:
!
!    Jeremy Kasdin,
!    Runge-Kutta algorithm for the numerical integration of
!    stochastic differential equations,
!    Journal of Guidance, Control, and Dynamics,
!    Volume 18, Number 1, January-February 1995, pages 114-120.
!
!    Jeremy Kasdin,
!    Discrete Simulation of Colored Noise and Stochastic Processes
!    and 1/f^a Power Law Noise Generation,
!    Proceedings of the IEEE,
!    Volume 83, Number 5, 1995, pages 802-827.
!
!  Parameters:
!
!    Input, real ( kind = 8 ) X, the value at the current time.
!
!    Input, real ( kind = 8 ) T, the current time.
!
!    Input, real ( kind = 8 ) H, the time step.
!
!    Input, real ( kind = 8 ) Q, the spectral density of the input white noise.
!
!    Input, external real ( kind = 8 ) FI, the name of the deterministic
!    right hand side function.
!
!    Input, external real ( kind = 8 ) GI, the name of the stochastic
!    right hand side function.
!
!    Input/output, integer ( kind = 4 ) SEED, a seed for the random 
!    number generator.
!
!    Output, real ( kind = 8 ) XSTAR, the value at time T+H.
!
  implicit none

  real ( kind = r8 ), external :: fi
  real ( kind = r8 ), external :: gi
  real ( kind = r8 ) h
  real ( kind = r8 ) k1
  real ( kind = r8 ) k2
  real ( kind = r8 ) k3
  real ( kind = r8 ) k4
  real ( kind = r8 ) q
  real ( kind = r8 ) r8_normal_01
  integer ( kind = i8 ) seed
  real ( kind = r8 ) t
  real ( kind = r8 ) t1
  real ( kind = r8 ) t2
  real ( kind = r8 ) t3
  real ( kind = r8 ) t4
  real ( kind = r8 ) w1
  real ( kind = r8 ) w2
  real ( kind = r8 ) w3
  real ( kind = r8 ) w4
  real ( kind = r8 ) x
  real ( kind = r8 ) x1
  real ( kind = r8 ) x2
  real ( kind = r8 ) x3
  real ( kind = r8 ) x4
  real ( kind = r8 ) xstar

  real ( kind = r8 ) :: qoh
  real ( kind = r8 ) :: normal(4)

  real ( kind = r8 ), parameter :: a21 = 2.71644396264860_r8 &
  ,a31 = - 6.95653259006152_r8 &
  ,a32 =   0.78313689457981_r8 &
  ,a41 =   0.0_r8 &
  ,a42 =   0.48257353309214_r8 &
  ,a43 =   0.26171080165848_r8 &
  ,a51 =   0.47012396888046_r8 &
  ,a52 =   0.36597075368373_r8 &
  ,a53 =   0.08906615686702_r8 &
  ,a54 =   0.07483912056879_r8

  real ( kind = r8 ), parameter, dimension(4) :: qarray = [ 2.12709852335625_r8 &
      ,2.73245878238737_r8 &
      ,11.22760917474960_r8 &
      ,13.36199560336697_r8 ]

  real ( kind = r8 ) :: warray(4)
  integer (kind = 4) :: i

  qoh = q / h
  normal = gaussian(4)

  do i =1,4
      warray(i) = normal(i)*sqrt(qarray(i)*qoh)
  enddo

  !t1 = t
  x1 = x
  k1 = h * ( fi ( x1 ) + gi ( x1 ) * warray(1) ) 

  !t2 = t1 + a21 * h
  x2 = x1 + a21 * k1
  k2 = h * ( fi ( x2 ) + gi ( x2 ) * warray(2) )

  !t3 = t1 + ( a31  + a32 )* h
  x3 = x1 + a31 * k1 + a32 * k2
  k3 = h * ( fi ( x3 ) + gi ( x3 ) * warray(3) )

  !t4 = t1 + ( a41  + a42 + a43 ) * h
  x4 = x1 + a41 * k1 + a42 * k2 + a43 * k3 ! the a43 * k3 seems need to be added.
  k4 = h * ( fi ( x4 ) + gi ( x4 ) * warray(4) )

  xstar = x1 + a51 * k1 + a52 * k2 + a53 * k3 + a54 * k4

return
end subroutine rk4_ti_step_mod

subroutine rk4_ti_step_vec_mod ( x, t, h, q, fi, gi, n, xstar )
    use random
    implicit none
    interface				   !! Declare function        
	    function fi ( x ) result (f)
	    use constants
	    real ( kind = r8 ) :: f(2)
	    real ( kind = r8 ) :: x(2)
	    end function
	    function gi ( x ) result (g)
	    use constants
	    real ( kind = r8 ) :: g(2)
	    real ( kind = r8 ) :: x(2)
	    end function      
    end interface
    integer ( kind = i8 ) n
    real ( kind = r8 ) h
    real ( kind = r8 ) k1(n)
    real ( kind = r8 ) k2(n)
    real ( kind = r8 ) k3(n)
    real ( kind = r8 ) k4(n)
    real ( kind = r8 ) q
    real ( kind = r8 ) r8_normal_01

    real ( kind = r8 ) t
    real ( kind = r8 ) t1
    real ( kind = r8 ) t2
    real ( kind = r8 ) t3
    real ( kind = r8 ) t4
    real ( kind = r8 ) w1
    real ( kind = r8 ) w2
    real ( kind = r8 ) w3
    real ( kind = r8 ) w4
    real ( kind = r8 ) x(n)
    real ( kind = r8 ) x1(n)
    real ( kind = r8 ) x2(n)
    real ( kind = r8 ) x3(n)
    real ( kind = r8 ) x4(n)
    real ( kind = r8 ) xstar(n)

    real ( kind = r8 ), parameter :: a21 = 2.71644396264860_r8 &
    ,a31 = - 6.95653259006152_r8 &
    ,a32 =   0.78313689457981_r8 &
    ,a41 =   0.0_r8 &
    ,a42 =   0.48257353309214_r8 &
    ,a43 =   0.26171080165848_r8 &
    ,a51 =   0.47012396888046_r8 &
    ,a52 =   0.36597075368373_r8 &
    ,a53 =   0.08906615686702_r8 &
    ,a54 =   0.07483912056879_r8

    real ( kind = r8 ), parameter, dimension(4) :: qs = [ 2.12709852335625_r8 &
	    ,2.73245878238737_r8 &
	    ,11.22760917474960_r8 &
	    ,13.36199560336697_r8 ]
    real ( kind = r8 ), parameter, dimension(2) :: aaa = [one,one]

    real ( kind = r8 ) :: qoh
    real ( kind = r8 ) :: warray(n,4)
    integer (kind = i8) :: i

    qoh = q / h

    do i =1,4
	    warray(:,i) = gaussian(n)*sqrt(qs(i)*qoh)
    enddo

    !t1 = t
    x1 = x
    k1 = h * ( fi ( x1 ) + gi ( x1 ) * warray(:,1) ) 

    !t2 = t1 + a21 * h
    x2 = x1 + a21 * k1
    k2 = h * ( fi ( x2 ) + gi ( x2 ) * warray(:,2) )

    !t3 = t1 + ( a31  + a32 )* h
    x3 = x1 + a31 * k1 + a32 * k2
    k3 = h * ( fi ( x3 ) + gi ( x3 ) * warray(:,3) )

    !t4 = t1 + ( a41  + a42 + a43 ) * h
    x4 = x1 + a41 * k1 + a42 * k2 + a43 * k3 ! the a43 * k3 seems need to be added.
    k4 = h * ( fi ( x4 ) + gi ( x4 ) * warray(:,4) )

    xstar = x1 + a51 * k1 + a52 * k2 + a53 * k3 + a54 * k4

return
end subroutine rk4_ti_step_vec_mod


    subroutine rk4_ti_step_vec_mod1 ( x, t, h, q, fi_vec, gi_vec, n, xstar )
	    use random
	    implicit none
	    interface				   !! Declare function        
		    function fi_vec ( x ) result (f)
		    use constants
		    real ( kind = r8 ) :: f(2)
		    real ( kind = r8 ) :: x(2)
		    end function
		    function gi_vec ( x ) result (g)
		    use constants
		    real ( kind = r8 ) :: g(2)
		    real ( kind = r8 ) :: x(2)
		    end function      
	    end interface       
	    integer(kind = i8), intent(in) :: n
	    real(kind=r8), intent(in) :: x(n)
	    real(kind = r8), intent(in) :: t  
	    real(kind = r8), intent(in) :: h
	    real(kind = r8), intent(in) :: q
	    !real(kind = r8), external :: fi_vec
	    !real(kind = r8), external :: gi_vec
	    real(kind = r8), intent(out) :: xstar(n)
	    real ( kind = r8 ), parameter :: alphas(4) = [ 0.47012396888046_r8 &
	    ,0.36597075368373_r8 &
	    ,0.08906615686702_r8 &
	    ,0.07483912056879_r8 ]
	    real(kind = r8), parameter :: as(4,4) = reshape([ &
		    &  0.0_r8,              0.0_r8,              0.0_r8,              0.0_r8, &
		    &  2.71644396264860_r8, 0.0_r8,              0.0_r8,              0.0_r8, &
		    & -6.95653259006152_r8, 0.78313689457981_r8, 0.0_r8,              0.0_r8, &
		    &  0.0_r8,              0.48257353309214_r8, 0.26171080165848_r8, 0.0_r8 ], [4,4])
	    real(kind = r8), parameter :: qs(4) = [  2.12709852335625_r8, &
		    &  2.73245878238737_r8, &
		    & 11.22760917474960_r8, &
		    & 13.36199560336697_r8 ]
	    real(kind = r8) :: ks(n,4)
	    real(kind = r8) :: ts(n,4)
	    real(kind = r8) :: ws(n,4)
	    real(kind = r8) :: xs(n,4)
	    real(kind = r8) :: warray(n,4)
	    real(kind = r8) :: ak(n)
	    integer(kind = i8) :: j,i
	    real(kind = r8) :: qoh
	    qoh = q / h
	    xstar = x    
	    do j=1,4
		    warray(:,j) = gaussian(n)*sqrt(qs(j)*qoh)
		    !ts(j) = t + sum(as(:j-1,j)) * h      
		    do concurrent (i=1:n)
			    ak(i)=dot_product(as(:j-1,j), ks(i,:j-1))  
		    enddo        
		    !ak=zero
		    !do i=1,j-1
		    !    ak = ak + as(i,j)*ks(:,i)
		    !enddo
		    xs(:,j) = x + ak  
		    ks(:,j) = h * (  fi_vec(xs(:,j)) + gi_vec(xs(:,j))*warray(:,j)  )
		    xstar = xstar + alphas(j)*ks(:,j)     
	    enddo
    return
    end subroutine rk4_ti_step_vec_mod1

subroutine rk4_ti_step_mod1 ( x, t, h, q, fi, gi, seed, xstar ) 
! mostly suggested by veryreverie from stackoverflow, 
! https://stackoverflow.com/questions/69147944/is-there-room-to-further-optimize-the-stochastic-rk-fortran-90-code?noredirect=1#comment122226757_69147944
    use random  
    implicit none
    real(kind=r8), intent(in) :: x
    real(kind = r8), intent(in) :: t  
    real(kind = r8), intent(in) :: h
    real(kind = r8), intent(in) :: q
    real(kind = r8), external :: fi
    real(kind = r8), external :: gi
    integer(kind = i8), intent(in) :: seed
    real(kind = r8), intent(out) :: xstar
    real ( kind = r8 ), parameter :: alphas(4) = [ 0.47012396888046_r8 &
    ,0.36597075368373_r8 &
    ,0.08906615686702_r8 &
    ,0.07483912056879_r8 ]
    real(kind = r8), parameter :: as(4,4) = reshape([ &
	    &  0.0_r8,              0.0_r8,              0.0_r8,              0.0_r8, &
	    &  2.71644396264860_r8, 0.0_r8,              0.0_r8,              0.0_r8, &
	    & -6.95653259006152_r8, 0.78313689457981_r8, 0.0_r8,              0.0_r8, &
	    &  0.0_r8,              0.48257353309214_r8, 0.26171080165848_r8, 0.0_r8 ], [4,4])
    real(kind = r8), parameter :: qs(4) = [ &
	    &  2.12709852335625_r8, &
	    &  2.73245878238737_r8, &
	    & 11.22760917474960_r8, &
	    & 13.36199560336697_r8 ]
    real(kind = r8) :: ks(4)
    real(kind = r8) :: r8_normal_01
    real(kind = r8) :: ts(4)
    real(kind = r8) :: ws(4)
    real(kind = r8) :: xs(4)
    real(kind = r8) :: normal(4)
    real(kind = r8) :: warray(4)
    integer(kind = i8) :: i
    real(kind = r8) :: qoh
    qoh = q/h
    normal = gaussian(4)  
    xstar = x
    do i=1,4
    !ts(i) = t + sum(as(:i-1,i)) * h
    xs(i) = x + dot_product(as(:i-1,i), ks(:i-1))
    warray(i) = normal(i)*sqrt(qs(i)*qoh)
    ks(i) = h * (fi(xs(i)) + gi(xs(i))*warray(i))
    xstar = xstar + alphas(i)*ks(i)
    enddo
return
end subroutine rk4_ti_step_mod1


subroutine rk4_tv_step ( x, t, h, q, fv, gv, seed, xstar )

!*****************************************************************************80
!
!! RK4_TV_STEP takes one step of a stochastic Runge Kutta scheme.
!
!  Discussion:
!
!    The Runge-Kutta scheme is fourth-order, and suitable for time-varying
!    systems.
!
!    d/dx X(t,xsi) = F ( X(t,xsi), t ) + G ( X(t,xsi), t ) * w(t,xsi)
!
!  Licensing:
!
!    This code is distributed under the GNU LGPL license.
!
!  Modified:
!
!    08 June 2010
!
!  Author:
!
!    John Burkardt
!
!  Reference:
!
!    Jeremy Kasdin,
!    Runge-Kutta algorithm for the numerical integration of
!    stochastic differential equations,
!    Journal of Guidance, Control, and Dynamics,
!    Volume 18, Number 1, January-February 1995, pages 114-120.
!
!    Jeremy Kasdin,
!    Discrete Simulation of Colored Noise and Stochastic Processes
!    and 1/f^a Power Law Noise Generation,
!    Proceedings of the IEEE,
!    Volume 83, Number 5, 1995, pages 802-827.
!
!  Parameters:
!
!    Input, real ( kind = 8 ) X, the value at the current time.
!
!    Input, real ( kind = 8 ) T, the current time.
!
!    Input, real ( kind = 8 ) H, the time step.
!
!    Input, real ( kind = 8 ) Q, the spectral density of the input white noise.
!
!    Input, external real ( kind = 8 ) FV, the name of the deterministic
!    right hand side function.
!
!    Input, external real ( kind = 8 ) GV, the name of the stochastic
!    right hand side function.
!
!    Input/output, integer ( kind = 4 ) SEED, a seed for the random 
!    number generator.
!
!    Output, real ( kind = 8 ) XSTAR, the value at time T+H.
!
  implicit none

  real ( kind = r8 ) a21
  real ( kind = r8 ) a31
  real ( kind = r8 ) a32
  real ( kind = r8 ) a41
  real ( kind = r8 ) a42
  real ( kind = r8 ) a43
  real ( kind = r8 ) a51
  real ( kind = r8 ) a52
  real ( kind = r8 ) a53
  real ( kind = r8 ) a54
  real ( kind = r8 ), external :: fv
  real ( kind = r8 ), external :: gv
  real ( kind = r8 ) h
  real ( kind = r8 ) k1
  real ( kind = r8 ) k2
  real ( kind = r8 ) k3
  real ( kind = r8 ) k4
  real ( kind = r8 ) q
  real ( kind = r8 ) q1
  real ( kind = r8 ) q2
  real ( kind = r8 ) q3
  real ( kind = r8 ) q4
  real ( kind = r8 ) r8_normal_01
  integer ( kind = i8 ) seed
  real ( kind = r8 ) t
  real ( kind = r8 ) t1
  real ( kind = r8 ) t2
  real ( kind = r8 ) t3
  real ( kind = r8 ) t4
  real ( kind = r8 ) w1
  real ( kind = r8 ) w2
  real ( kind = r8 ) w3
  real ( kind = r8 ) w4
  real ( kind = r8 ) x
  real ( kind = r8 ) x1
  real ( kind = r8 ) x2
  real ( kind = r8 ) x3
  real ( kind = r8 ) x4
  real ( kind = r8 ) xstar

  a21 =   0.66667754298442_r8
  a31 =   0.63493935027993_r8
  a32 =   0.00342761715422_r8
  a41 = - 2.32428921184321_r8
  a42 =   2.69723745129487_r8
  a43 =   0.29093673271592_r8
  a51 =   0.25001351164789_r8
  a52 =   0.67428574806272_r8
  a53 = - 0.00831795169360_r8
  a54 =   0.08401868181222_r8

  q1 = 3.99956364361748_r8
  q2 = 1.64524970733585_r8
  q3 = 1.59330355118722_r8
  q4 = 0.26330006501868_r8

  t1 = t
  x1 = x
  !w1 = r8_normal_01 ( seed ) * sqrt ( q1 * q / h )
  k1 = h * fv ( t1, x1 ) + h * gv ( t1, x1 ) * w1 

  t2 = t1 + a21 * h
  x2 = x1 + a21 * k1
  !w2 = r8_normal_01 ( seed ) * sqrt ( q2 * q / h )
  k2 = h * fv ( t2, x2 ) + h * gv ( t2, x2 ) * w2

  t3 = t1 + a31 * h  + a32 * h
  x3 = x1 + a31 * k1 + a32 * k2
  !w3 = r8_normal_01 ( seed ) * sqrt ( q3 * q / h )
  k3 = h * fv ( t3, x3 ) + h * gv ( t3, x3 ) * w3

  t4 = t1 + a41 * h  + a42 * h  + a43 * h
  x4 = x1 + a41 * k1 + a42 * k2 + a43 * k3
  !w4 = r8_normal_01 ( seed ) * sqrt ( q4 * q / h )
  k4 = h * fv ( t4, x4 ) + h * gv ( t4, x4 ) * w4

  xstar = x1 + a51 * k1 + a52 * k2 + a53 * k3 + a54 * k4

  return
end
subroutine timestamp ( )

!*****************************************************************************80
!
!! TIMESTAMP prints the current YMDHMS date as a time stamp.
!
!  Example:
!
!    31 May 2001   9:45:54.872 AM
!
!  Licensing:
!
!    This code is distributed under the GNU LGPL license.
!
!  Modified:
!
!    18 May 2013
!
!  Author:
!
!    John Burkardt
!
!  Parameters:
!
!    None
!
  implicit none

  character ( len = 8 ) ampm
  integer ( kind = 4 ) d
  integer ( kind = 4 ) h
  integer ( kind = 4 ) m
  integer ( kind = 4 ) mm
  character ( len = 9 ), parameter, dimension(12) :: month = (/ &
    'January  ', 'February ', 'March    ', 'April    ', &
    'May      ', 'June     ', 'July     ', 'August   ', &
    'September', 'October  ', 'November ', 'December ' /)
  integer ( kind = 4 ) n
  integer ( kind = 4 ) s
  integer ( kind = 4 ) values(8)
  integer ( kind = 4 ) y

  call date_and_time ( values = values )

  y = values(1)
  m = values(2)
  d = values(3)
  h = values(5)
  n = values(6)
  s = values(7)
  mm = values(8)

  if ( h < 12 ) then
    ampm = 'AM'
  else if ( h == 12 ) then
    if ( n == 0 .and. s == 0 ) then
      ampm = 'Noon'
    else
      ampm = 'PM'
    end if
  else
    h = h - 12
    if ( h < 12 ) then
      ampm = 'PM'
    else if ( h == 12 ) then
      if ( n == 0 .and. s == 0 ) then
	    ampm = 'Midnight'
      else
	    ampm = 'AM'
      end if
    end if
  end if

  write ( *, '(i2.2,1x,a,1x,i4,2x,i2,a1,i2.2,a1,i2.2,a1,i3.3,1x,a)' ) &
    d, trim ( month(m) ), y, h, ':', n, ':', s, '.', mm, trim ( ampm )

  return
end
end module stochastic_rk
module stats
use constants 
implicit none
contains
subroutine stat2(x,xave,xvar)
use constants
real (kind=r8) :: x(:,:),val(5),val2(5),xvar(5),xave(5)
integer (kind =i8) :: i,j,xsize,np
val=zero
val2=zero
np = size(x,2)
do i=1,np
    do j=1,5
	    val(j) = val(j) + x(j,i)
	    val2(j) = val2(j) + x(j,i)**2
    enddo
enddo
xave=val/np
xvar=abs( val2/np - xave**2 ) 
return
end subroutine stat2 

subroutine stat(x,xave,xvar)
use constants
real (kind=r8) :: x(:),val,val2,xvar,xave
integer (kind =i8) :: i,xsize,np
val=0.0
val2=0.0
np = size(x)
do i=1,np
    val = val + x(i)
    val2= val2 + x(i)**2
    !write (6,*) i, x(i)
enddo
xave=val/np
xvar=abs(  val2/np - xave**2  )   
end subroutine stat 

end module stats


    program main
    use random
    use random2
    use constants
    use stochastic_rk
    use stats

    !*****************************************************************************80
    !
    !! MAIN is the main program for STOCHASTIC_RK_TEST.
    !
    !  Discussion:
    !
    !    STOCHASTIC_RK_TEST tests the STOCHASTIC_RK library.
    !
    !  Licensing:
    !
    !    This code is distributed under the GNU LGPL license.
    !
    !  Modified:
    !
    !    20 June 2010
    !
    !  Author:
    !
    !    John Burkardt
    !
    implicit none

    integer(kind=i8) :: i,np,seed,k
    integer(kind=i8) :: day,hour,minute
    real (kind=r8) :: t0,t1,t2
    real (kind=r8), allocatable :: x(:,:)
    real (kind=r8) :: xvar(5),xave(5)

    call timestamp ( )
    write ( *, '(a)' ) ' '
    write ( *, '(a)' ) 'STOCHASTIC_RK_TEST'
    write ( *, '(a)' ) '  FORTRAN90 version'
    write ( *, '(a)' ) '  Test the STOCHASTIC_RK library.'


    seed = 123456789
    call setrn(int(seed,8))

    np = 10000
    allocate(x(5,np))

    call CPU_TIME(t1) 
    do i=1,np
    call test02 ( x(:,i) )   
    !write (6,*) i, x(i) 
    enddo
    call CPU_TIME(t2)  
    write(6,'(''Time cost = '',t20,g12.5,'' sec'')') t2-t1  
    ! statistics
    call stat2(x,xave,xvar)
    write(6,'(''step size : '',t20,g12.5)') '1/1000'
    write(6,'(''Np: '',t20,g12.5)') np
    write(6,'('' '')')
    write(6,'(''Simulated Mean at t=0.2:1.0:0.2 is: '',t50,5(g12.5,1x))') (xave(k),k=1,5)
    write(6,'(''Theoretical Mean at t=0.2:1.0:0.2 is: '',t50,5(g12.5,1x))') ( 20.0*exp(-k*0.2),k=1,5 )
    write(6,'('' '')')
    write(6,'(''Simulated Variance at t=0.2:1.0:0.2 is: '',t50,5(g12.5,1x))') (xvar(k),k=1,5)
    write(6,'(''Theoretical Variance at t=0.2:1.0:0.2 is: '',t50,5(g12.5,1x))') ( 0.5*(1.0-exp(-2*k*0.2)),k=1,5 )

    !
    !  Terminate.
    !
    call timestamp ( )
    stop ('Program end normally.')

    end program main 


    subroutine test02 ( xout )
    use constants
    use stochastic_rk
    implicit none
    interface				   !! Declare function        
    function fi_vec2 ( x ) result (f)
    use constants
    real ( kind = r8 ) :: f(2)
    real ( kind = r8 ) :: x(2)
    end function
    function gi_vec2 ( x ) result (g)
    use constants
    real ( kind = r8 ) :: g(2)
    real ( kind = r8 ) :: x(2)
    end function      
    end interface 
    integer ( kind = i8 ), parameter :: n = 1000
    real ( kind = r8 ) h
    integer ( kind = i8 ) :: i, itot, istart, istep
    real ( kind = r8 ) q
    integer ( kind = i8 ) seed
    real ( kind = r8 ) t
    real ( kind = r8 ), parameter :: t0 = 0.0
    real ( kind = r8 ), parameter :: tn = 1.0
    real ( kind = r8 ) x(2,0:n)
    real (kind = r8) :: xout(5)
    integer ( kind = i8 ) :: nd ! dimension, =2

    h = ( tn - t0 ) / real(n)
    q = 1.0
    seed = 123456789

    i = 0
    t = t0
    x(1,i) = 20.0_r8
    x(2,i) = one
    nd = 2

    !write ( *, '(a)' ) ' '
    !write ( *, '(a)' ) '         I           T             X'
    !write ( *, '(a)' ) ' '
    !write ( *, '(2x,i8,2x,f14.6,2x,g14.6)' ) i, t, x(i)

    do i = 1, n
    t = t0 + (real(i)*(tn-t0))/real(n)
    call rk4_ti_step_vec_mod1 ( x(:,i-1), t, h, q, fi_vec2, gi_vec2, nd, x(:,i) )
    !write ( *, '(2x,i8,2x,f14.6,2x,g14.6)' ) i, t, x(i)
    end do

    itot = i-1
    istart = itot/5
    istep = istart
    xout(1:5:1) = x(1,istart:itot:istep)
    !write ( *, '(2x,i8,2x,f14.6,2x,g14.6)' ) i-1, t, xout
    return
    end subroutine test02   

    function fi_vec2 ( x ) result (f)
    use constants
    implicit none
    real ( kind = r8 ) :: f(2)
    real ( kind = r8 ) :: x(2)
      f(1) = -x(2)*x(1) 
      f(2) = -x(2) + one
    return
    end function fi_vec2
    function gi_vec2 ( x ) result(g)
    use constants
    implicit none
    real ( kind = r8 ), parameter :: gp(2) = [one,zero] 
    real ( kind = r8 ) :: x(2),g(2)
    g=gp
    return
    end function gi_vec2    

    subroutine test01 ( xout )
    use constants
    use stochastic_rk
    implicit none
    integer ( kind = i8 ), parameter :: n = 1000
    real ( kind = r8 ), external :: fi
    real ( kind = r8 ), external :: gi
    real ( kind = r8 ) h
    integer ( kind = i8 ) :: i, itot, istart, istep
    real ( kind = r8 ) q
    integer ( kind = i8 ) seed
    real ( kind = r8 ) t
    real ( kind = r8 ), parameter :: t0 = 0.0
    real ( kind = r8 ), parameter :: tn = 1.0
    real ( kind = r8 ) x(0:n)
    real (kind = r8) :: xout(5)

    h = ( tn - t0 ) / real(n)
    q = 1.0
    seed = 123456789

    i = 0
    t = t0
    x(i) = 20.0

    !write ( *, '(a)' ) ' '
    !write ( *, '(a)' ) '         I           T             X'
    !write ( *, '(a)' ) ' '
    !write ( *, '(2x,i8,2x,f14.6,2x,g14.6)' ) i, t, x(i)

    do i = 1, n
    t = t0 + (real(i)*(tn-t0))/real(n)
    call rk4_ti_step_mod ( x(i-1), t, h, q, fi, gi, seed, x(i) )
    !write ( *, '(2x,i8,2x,f14.6,2x,g14.6)' ) i, t, x(i)
    end do

    itot = i-1
    istart = itot/5
    istep = istart
    xout(1:5:1) = x(istart:itot:istep)
    !write ( *, '(2x,i8,2x,f14.6,2x,g14.6)' ) i-1, t, xout
    return
    end subroutine test01


    function fi ( x )
    use constants

    !*****************************************************************************80
    !
    !! FI is a time invariant deterministic right hand side.
    !
    !  Licensing:
    !
    !    This code is distributed under the GNU LGPL license.
    !
    !  Modified:
    !
    !    20 June 2010
    !
    !  Author:
    !
    !    John Burkardt
    !
    !  Parameters:
    !
    !    Input, real ( kind = 8 ) X, the argument.
    !
    !    Output, real ( kind = 8 ) FI, the value.
    !
      implicit none

      real ( kind = r8 ) fi
      real ( kind = r8 ) x

      fi = -x  !1.0_r8

      return
    end function fi
    function gi ( x )
    use constants

    !*****************************************************************************80
    !
    !! GI is a time invariant stochastic right hand side.
    !
    !  Licensing:
    !
    !    This code is distributed under the GNU LGPL license.
    !
    !  Modified:
    !
    !    20 June 2010
    !
    !  Author:
    !
    !    John Burkardt
    !
    !  Parameters:
    !
    !    Input, real ( kind = 8 ) X, the argument.
    !
    !    Output, real ( kind = 8 ) GI, the value.
    !
      implicit none

      real ( kind = r8 ) gi
      real ( kind = r8 ) x

      gi = 1.0

      return
    end function gi

A similar topic is post on Intel Forum too,
https://community.intel.com/t5/Intel-Fortran-Compiler/Why-heap-array-make-my-code-6-times-slower/m-p/1315276#M157550

But my intension is not to duplicate the topic.
On intel’s forum I just try to get some intel complier specific answers/suggestions, because after all, they made the compiler.

This slow down seems to me like, for this small toy code, when not heap arrays, the data mostly really stored in the CPU cache. However if heap arrays, the data really all distributed on the RAM. It is likely the performance difference between CPU cache and RAM.