I tried the original code on an Intel Broadwell processor (one core) with this main program:
program test
  implicit none
  integer, parameter :: n = 1000000
  double precision :: x(n), mu, sigma, nll
  integer(8) :: t1, t2
  real(8) :: rate, time
  call random_number(x)
  x = 12.0d0*x - 6.0d0
  mu = sum(x)/n
  ! As run: sum(x-mu)**2 is the square of the sum, not the sum of squares.
  sigma = sqrt(sum(x-mu)**2/n)
  ! Time a single call to negloglik
  call system_clock(t1, count_rate = rate)
  call negloglik(x, n, mu, sigma, nll)
  call system_clock(t2)
  time = (t2-t1)/rate
  print *, "Time = ", time, " sec."
  print *, "mu = ", mu
  print *, "sigma = ", sigma
  print *, "nll = ", nll
end program test
and got the output
Time = 5.90489042675893883E-4 sec.
mu = 1.23783794657927361E-3
sigma = 7.89020759839331797E-12
nll = 9.64039973146251418E+28
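The tiny sigma and the huge nll in this output appear to come from the main program rather than from negloglik: `sum(x-mu)**2` squares the sum of the deviations (which is zero up to rounding) instead of summing the squared deviations. Below is a minimal sketch of the two variants side by side, assuming the population standard deviation was what was intended (the program name `sigma_check` and the two variable names are only illustrative). For samples uniform on [-6, 6) the intended value should come out near sqrt(12), about 3.46. The timing comparison should not be affected much, since the subroutine does the same arithmetic whatever the value of sigma.

```fortran
program sigma_check
  implicit none
  integer, parameter :: n = 1000000
  double precision :: x(n), mu, sigma_as_posted, sigma_intended

  call random_number(x)
  x = 12.0d0*x - 6.0d0          ! uniform on [-6, 6)
  mu = sum(x)/n

  ! As posted: square of the summed deviations (about zero up to rounding)
  sigma_as_posted = sqrt(sum(x-mu)**2/n)
  ! Presumably intended (an assumption): sum of the squared deviations
  sigma_intended = sqrt(sum((x-mu)**2)/n)

  print *, "sigma (as posted) = ", sigma_as_posted
  print *, "sigma (intended)  = ", sigma_intended
end program sigma_check
```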
Changing the subroutine to
subroutine negloglik(x, n, mu, sigma, nll)
  implicit none
  integer, intent(in) :: n
  double precision, intent(in) :: x(n), mu, sigma
  double precision, intent(out) :: nll
  nll = sum(0.5d0*((x-mu)/sigma)**2) + n*log(sigma*sqrt(2*acos(-1.0d0)))
end subroutine negloglik
and keeping the same main program cuts the time roughly in half:
Time = 2.69306805074971179E-4 sec.
mu = 1.23783794657927361E-3
sigma = 7.89020759839331797E-12
nll = 9.6403997314624702E+28
(default compiler options, Cray/HPE compiler ftn)
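A single call that takes a few hundred microseconds is short enough for timer resolution and system noise to matter, so the factor of about two is indicative rather than precise. The following is a sketch of a somewhat more robust measurement, assuming it is acceptable to call the routine repeatedly on the same data and report the best of several runs. The names `time_negloglik`, `nrep`, and `tbest` are illustrative, and sigma is computed here as the sum of squared deviations, which differs from the main program as posted.

```fortran
program time_negloglik
  implicit none
  integer, parameter :: n = 1000000, nrep = 20
  double precision, allocatable :: x(:)
  double precision :: mu, sigma, nll, t, tbest
  integer(8) :: t1, t2, rate
  integer :: i

  allocate(x(n))
  call random_number(x)
  x = 12.0d0*x - 6.0d0
  mu = sum(x)/n
  ! Assumption: standard deviation from the sum of squared deviations
  sigma = sqrt(sum((x-mu)**2)/n)

  ! Query the clock rate once, then keep the fastest of nrep timed calls
  call system_clock(count_rate = rate)
  tbest = huge(tbest)
  do i = 1, nrep
    call system_clock(t1)
    call negloglik(x, n, mu, sigma, nll)
    call system_clock(t2)
    t = dble(t2 - t1)/dble(rate)
    tbest = min(tbest, t)
  end do

  print *, "Best of ", nrep, " runs = ", tbest, " sec."
  print *, "nll = ", nll
end program time_negloglik

! Same array-expression version of the subroutine as in the post
subroutine negloglik(x, n, mu, sigma, nll)
  implicit none
  integer, intent(in) :: n
  double precision, intent(in) :: x(n), mu, sigma
  double precision, intent(out) :: nll
  nll = sum(0.5d0*((x-mu)/sigma)**2) + n*log(sigma*sqrt(2*acos(-1.0d0)))
end subroutine negloglik
```

Taking the minimum rather than the mean is a design choice: it discards runs disturbed by other activity on the node, at the cost of reporting a warm-cache figure.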
[Apparently the text system here mutilates input, for example by eliminating the “*” between sigma and sqrt (and by removing the indentation).]
Edited: Format code blocks for readability. LK