Dot product in single precision emulating double precision

There is a new project

implementing the algorithm in the paper Accurate Sum and Dot Product by Ogita et al., (2005).

Testing the eftdot module with the main program below, the error free transformation does enable a more accurate computation of the dot product in single precision, at the cost of higher CPU time compared to computing the dot product in single or double precision using the intrinsic function. I don’t know if this is of practical importance.

use eftdot, only: sp, dp, dot2_s
implicit none
integer, parameter :: n = 10**7, niter=10, ntimes=4
integer :: iter
real(kind=sp) :: x(n), y(n), result, xy_sp, xy_dp, t(ntimes), dt(ntimes-1)
print*,"n, sp, dp =", n, sp, dp
print "(/,*(a12))", "dot_product","err_exact","err_sp"
dt = 0.0
do iter=1,niter
   call random_number(x)
   call random_number(y)
   x = x - 0.5
   y = y - 0.5
   call cpu_time(t(1))
   call dot2_s(x, y, n, result) ! single precision emulating double
   call cpu_time(t(2))
   xy_sp = dot_product(x,y) ! single precision
   call cpu_time(t(3))
   xy_dp = dot_product(real(x, kind=dp),real(y, kind=dp)) ! double precision
   call cpu_time(t(4))
   print "(*(f12.6))",xy_dp, result-xy_dp, xy_sp - xy_dp
   dt = dt + t(2:ntimes) - t(:ntimes-1)
end do
print "(/,'times',/,*(a12))", "exact_sp", "sp", "dp"
print "(*(f12.6))",dt
end

Sample output is below, compiling with gfortran -O3 -march=native -flto

 n, sp, dp =    10000000           4           8

 dot_product   err_exact      err_sp
 -202.377380    0.000000   -0.007553
   26.850029    0.000006    0.007391
  -44.441853    0.000004    0.009468
 -591.380188    0.000000    0.015259
 -235.354996   -0.000015    0.024811
   50.949928   -0.000004   -0.006489
 -185.757919    0.000000   -0.004684
  -78.702240   -0.000008    0.000664
   45.996658   -0.000008   -0.006851
  366.751495    0.000000    0.013306

times
    exact_sp          sp          dp
    0.203125    0.125000    0.062500
1 Like

I compared it with the implementations in fast_math (fprod and fprod_kahan) on Compiler Explorer (you’ll see n=1e6, otherwise there was a Segfault because the memory is limited under CE)

gfortran 14.2 -O3 -march=native

n, sp, dp =     1000000           4           8

 dot_product   err_exact      err_sp   err_fprod    err_fsvd     err_fpk   err_fksvd
  -18.873257   -0.000002    0.000046   -0.000002   -0.000179   -0.000002   -0.000004
   -3.847342   -0.000000    0.000829   -0.000000    0.000063   -0.000000   -0.000001
  -43.166283    0.000000    0.000137    0.000000    0.000217    0.000000    0.000011
  -13.259282   -0.000002   -0.001365   -0.000002   -0.000026   -0.000002   -0.000001
  -95.034035    0.000000   -0.002991    0.000000    0.000015    0.000000   -0.000015
   98.777199    0.000000   -0.003441    0.000000   -0.000145    0.000000    0.000023
  -37.271225    0.000004    0.000050    0.000004    0.000118    0.000004    0.000004
  -11.045083   -0.000002    0.000486   -0.000002    0.000035   -0.000002   -0.000014
  -45.170433   -0.000008    0.000225   -0.000008    0.000347   -0.000008   -0.000008
    5.349956    0.000002   -0.000184    0.000002   -0.000067    0.000002   -0.000001

times
    exact_sp          sp          dp    fprod sp    fprod dp   fprodk sp   fprodk dp
    0.054966    0.012406    0.013277    0.005627    0.014181    0.006836    0.017551

More exhaustive testing would be needed, but already from here I see that the precision is not necessarily better and timing-wise there is a x10.

3 Likes