Tips to make this toy program faster?

billlong · March 11, 2021, 4:32am

For the following loop:

 ll = 0.0
  do i=1,n
    z = (x(i) - mu)/sigma
    ll = ll - 0.5*z**2 - log(sigma*sqrt(2*pi))
  end do

the scalar temp z would almost certainly be eliminated by the compiler, so there is no harm in writing it this way. The loop-invariant scalar log(sigma(sqrt(2*pi) will most likely be computed only once before the loop starts and the resulting internal temp used in the loop. My only real objection is the variable name ll - I tend to avoid lowercase L (or an uppercase O) for variables since they look too much like numbers in some fonts. There is no ambiguity in this case - just a style issue.

billlong · March 11, 2021, 5:18am

I tried the original code on an Intel Broadwell processor (one core) with this main program:

program test
  implicit none
  integer,parameter :: n = 1000000
  double precision :: x(n),mu,sigma,nll
  integer(8) :: t1,t2
  real(8)    :: rate, time

  call random_number(x)
  x = 12.0d0*x - 6.0d0
  mu = sum(x)/n
  sigma = sqrt(sum(x-mu)**2/n)

  call system_clock(t1, count_rate = rate)
  call negloglik(x,n,mu,sigma,nll)
  call system_clock(t2)
  time = (t2-t1)/rate
  print *, "Time = ",time," sec."
  print *, "mu    = ",mu
  print *, "sigma = ",sigma
  print *, "nll   = ",nll
end program test

and got the output

 Time =  5.90489042675893883E-4  sec.
 mu    =  1.23783794657927361E-3
 sigma =  7.89020759839331797E-12
 nll   =  9.64039973146251418E+28

Changing the subroutine to

subroutine negloglik(x,n,mu,sigma,nll)
  implicit none
  integer,intent(in) :: n
  double precision, intent(in) :: x(n), mu, sigma
  double precision, intent(out) :: nll

  nll = sum(0.5d0*((x-mu)/sigma)**2) + n*log(sigma*sqrt(2*acos(-1.0d0)))

end subroutine negloglik

and the same main program results in a smaller time:

 Time =  2.69306805074971179E-4  sec.
 mu    =  1.23783794657927361E-3
 sigma =  7.89020759839331797E-12
 nll   =  9.6403997314624702E+28

(default compiler options, Cray/HPE compiler ftn)

[Apparently the text system here mutilates input, for example to eliminate the “*” between sigma and sqrt. (and to remove the indentation). ]

Edited: Format code blocks for readability. LK

vmagnin · March 11, 2021, 8:09am

Happily, both seem the same in gfortran:

program pi
    use iso_fortran_env, only: wp=>real128
    implicit none
    real(wp), parameter :: pi1 = 4.0_wp * atan(1.0_wp)
    real(wp), parameter :: pi2 = acos(-1.0_wp)

    print *, 1.0_wp/(pi1-pi2), 4.0_wp * atan(1.0_wp)-acos(-1.0_wp)
end program pi

$ gfortran -Wall -Wextra -std=f2018 -pedantic pi.f90 && ./a.out
pi.f90:11:19:

   11 |     print *, 1.0_wp/(pi1-pi2), 4.0_wp * atan(1.0_wp)-acos(-1.0_wp)
      |                   1
Error: Division by zero at (1)

It brings an interesting question: why \pi is not defined in the Fortran standard? I guess there is a profound (transcendental?) numerical reason if a scientific language refuses to define \pi for 65 years… The situation is the same in C language: some standard libraries define a MATH_PI constant in math.h, but it is not in the standard.

lkedward · March 11, 2021, 8:30am

Hi @billlong, the Discourse uses markdown formatting which is what has caused your asterisk to disappear. Of most use is the code block, denoted by triple backticks ( ` ) which will print your literal text as monospace, which aids in readability. I have updated your post to use code blocks.

Beliavsky · March 11, 2021, 12:07pm

A module defining pi and some physical constants, used in a large program so hopefully correct, is here.

gardhor · March 11, 2021, 3:23pm

About Physical constants, It is not so easy. They are not so constant! See NIST web page:
NIST constants.

vmagnin · March 11, 2021, 3:30pm

We could imagine a script which would load the latest values:
https://physics.nist.gov/cuu/Constants/Table/allascii.txt
and transform automatically that .txt into a Fortran module.

gardhor · March 11, 2021, 3:41pm

Yes you are right.
Also, we need to have the possibility to recover the previous version.

vmagnin · March 11, 2021, 4:08pm

Yes, previous versions are still online, for example:
https://physics.nist.gov/cuu/Constants/Table/allascii_2014.txt

The difficulty would be to validate the result. A solution:

Generate the Fortran module using a Python or Perl script, where all would be stored (names, values, uncertainties, units),
Write a Fortran program which generates another allascii.txt file,
Make a diff to verify they are identical.

One problem is that some exact values, computed from another exact value, are given with a limited number of digits:
Boltzmann constant in eV/K 8.617 333 262... e-5
Those ... are annoying…
And probably, managing the number of digits to print for each constant. Not so easy.

gardhor · March 11, 2021, 5:19pm

I know about the versions, I’ve got 3 in my code.
About the number of digits:
While reading the constants, one can read also the number of digits + the exponent
Then, it will be more easy to write the constants in exactly the same format.

About, the Boltzman constant, R.
molar gas constant 8.314 462 618… (exact) J mol^-1 K
One can calculate its value, instead of using the reading one. It is the reason, they give R with “…”

vmagnin · March 11, 2021, 5:25pm

So simple…

gardhor · March 11, 2021, 5:31pm

Then, the units (with metric prefix) have to be read and probably converted.

themos · March 11, 2021, 7:19pm

I would put the timing code around the function, otherwise the differences can get swamped by the time to create the random numbers (which you want to compute only once, anyway).

themos · March 11, 2021, 7:28pm

I think you need

vmagnin · March 11, 2021, 7:58pm

Yes, I struggled to try to parallelize the loop with OpenMP, until I realize my parallelization was just working: on a slow PC, I saw the effect. On my fast i7, I thought I was unable to use correctly OpenMP, although I already use it in a model.
I guess the main program is eating an important part of the CPU.

vmagnin · March 12, 2021, 1:09pm

@gardhor
I now have a short dirty Python script doing the basic part of the job, just downloading the latest values, extracting the values of the 354 constants and generating a simple module:

module NIST_physical_constants
  use, intrinsic :: iso_fortran_env, only: wp=>real64
  implicit none

  real(wp), parameter :: alpha_particle_electron_mass_ratio = 7294.29954142_wp   !
  real(wp), parameter :: alpha_particle_mass = 6.6446573357e-27_wp   !kg
...
  real(wp), parameter :: Wien_wavelength_displacement_law_constant = 2.897771955e-3_wp   !m K
  real(wp), parameter :: W_to_Z_mass_ratio = 0.88153_wp   !
end module NIST_physical_constants

A little test program:

$ ./build/nist.out
 Do you know this constant with so many zeros?
0100000110110001110111100111100001001010000000000000000000000000
 Oh sorry, you're that kind of biped with 10 digits?
   299792458.00000000     
 Some other constants:
   6.0221407599999999E+023
   1.3806490000000001E-023
   1.6021766339999999E-019

I will improve it before pushing on GitHub and posting a new post in the Discourse for further collaboration.

awvwgk · March 12, 2021, 1:16pm

@vmagnin Have a look at QCElemental/context.py at master · MolSSI/QCElemental · GitHub for accurately parsing different versions of the NIST codata files.

vmagnin · March 12, 2021, 1:41pm

Much less fun if everything has already been done…
They extract data here:
https://github.com/MolSSI/QCElemental/blob/master/devtools/scripts/build_physical_constants_2018.py
using the pandas library: Getting started — pandas 1.2.3 documentation
They generate a JSON file.
They manipulate the data here: QCElemental/context.py at master · MolSSI/QCElemental · GitHub
and generate a C header file.

awvwgk · March 12, 2021, 1:56pm

Maybe we can contribute a Fortran export to qcel?

alozada · March 12, 2021, 3:21pm

Nice.

I find it curious that the periodic table data are obtained from NIST and not directly from IUPAC data bank

Topic		Replies	Views
Code slower than C++ version Help	25	1251	October 14, 2021
A modernized SPLPAK Announcements	10	1023	June 5, 2025
Fast Math library for modern fortran Announcements	4	1003	November 11, 2022
Fortran Best Practice Minibook Announcements	84	4487	September 20, 2021
Optimizing vectorized array operations Help	79	1503	May 5, 2025

Tips to make this toy program faster?

Related topics