There is a new project
implementing the algorithm in the paper Accurate Sum and Dot Product by Ogita et al., (2005).
Testing the eftdot module with the main program below, the error free transformation does enable a more accurate computation of the dot product in single precision, at the cost of higher CPU time compared to computing the dot product in single or double precision using the intrinsic function. I don’t know if this is of practical importance.
use eftdot, only: sp, dp, dot2_s
implicit none
integer, parameter :: n = 10**7, niter=10, ntimes=4
integer :: iter
real(kind=sp) :: x(n), y(n), result, xy_sp, xy_dp, t(ntimes), dt(ntimes-1)
print*,"n, sp, dp =", n, sp, dp
print "(/,*(a12))", "dot_product","err_exact","err_sp"
dt = 0.0
do iter=1,niter
call random_number(x)
call random_number(y)
x = x - 0.5
y = y - 0.5
call cpu_time(t(1))
call dot2_s(x, y, n, result) ! single precision emulating double
call cpu_time(t(2))
xy_sp = dot_product(x,y) ! single precision
call cpu_time(t(3))
xy_dp = dot_product(real(x, kind=dp),real(y, kind=dp)) ! double precision
call cpu_time(t(4))
print "(*(f12.6))",xy_dp, result-xy_dp, xy_sp - xy_dp
dt = dt + t(2:ntimes) - t(:ntimes-1)
end do
print "(/,'times',/,*(a12))", "exact_sp", "sp", "dp"
print "(*(f12.6))",dt
end
Sample output is below, compiling with gfortran -O3 -march=native -flto
n, sp, dp = 10000000 4 8
dot_product err_exact err_sp
-202.377380 0.000000 -0.007553
26.850029 0.000006 0.007391
-44.441853 0.000004 0.009468
-591.380188 0.000000 0.015259
-235.354996 -0.000015 0.024811
50.949928 -0.000004 -0.006489
-185.757919 0.000000 -0.004684
-78.702240 -0.000008 0.000664
45.996658 -0.000008 -0.006851
366.751495 0.000000 0.013306
times
exact_sp sp dp
0.203125 0.125000 0.062500