Indeed, there is no guarantee. But in practice it seems to work fine.
Apart from higher precision, one can also use a different summation algorithm. E.g. Python offers both math.sum
(naive) and math.fsum
which tracks intermediate partial sums. In Julia one can use KahanSummation.jl.
In Fortran, using Kahan summation requires some care with respect to the compiler flags. I’ve tried to roll my own implementation (thanks to an anonymous reader for the pseudo-code):
!----------------
! kahan_sum.F90
!----------------
module kahan_sum
use, intrinsic :: iso_fortran_env, only: int64
implicit none
private
public :: sum
integer, parameter :: sp = kind(1.0e0)
integer, parameter :: ip = int64
contains
!> Kahan sum algorithm
!> https://en.wikipedia.org/wiki/Kahan_summation_algorithm#The_algorithm
#ifdef __GFORTRAN__
pure function sum(a) result(accum)
real(sp), intent(in) :: a(:)
#else
impure function sum(a) result(y)
real(sp), intent(in), target :: a(..)
real(sp):: y
real(sp), pointer :: aflat(:) => null()
integer(ip) :: nelem, dims(rank(a))
dims = shape(a,kind=ip)
nelem = product(dims)
select rank(a)
rank(0)
y = a
rank(1)
aflat(1:nelem) => a(1:nelem)
y = sum_kernel(aflat)
rank(2)
aflat(1:nelem) => a(1:dims(1),1:dims(2))
y = sum_kernel(aflat)
rank(3)
aflat(1:nelem) => a(1:dims(1),1:dims(2),1:dims(3))
y = sum_kernel(aflat)
rank default
error stop '[sum] Ranks > 3 are not supported.'
end select
contains
#endif
#ifndef __GFORTRAN__
pure function sum_kernel(a) result(accum)
real(sp), intent(in) :: a(:)
#endif
real(sp) :: corr, accum, t, y
integer(ip) :: i
corr = 0.0_sp
accum = 0.0_sp
do i = 1, size(a,kind=ip)
y = a(i) - corr
t = accum + y
corr = (t - accum) - y
accum = t
end do
#ifndef __GFORTRAN__
end function
#endif
end function
end module
!----------------
! kahan_main.f90
!----------------
program kahan_main
use kahan_sum, only: ksum => sum
implicit none
integer :: nmax = 1000000000
real, allocatable :: t(:)
allocate(t(nmax))
call random_number(t)
print *, sum(t), sum(real(t,kind(1.0d0))), ksum(t)
deallocate(t)
end program
I was only able to test the assumed-rank version with Intel compilers, but it’s not difficult to call the sum kernel directly in gfortran. Here’s the output for different compiler flags:
$ ifort -warn all -O0 kahan_sum.F90 kahan_main.f90
$ ./a.out
1.6777216E+07 499994194.168867 4.9999421E+08
$ ifort -warn all -O3 kahan_sum.F90 kahan_main.f90
$ ./a.out
1.3421773E+08 499994194.168732 4.9999421E+08
$ ifx -warn all -O0 kahan_sum.F90 kahan_main.f90
$ ./a.out
1.6777216E+07 499994194.168867 4.9999421E+08
$ ifx -warn all -O3 kahan_sum.F90 kahan_main.f90
$ ./a.out
1.6777216E+07 499994194.169164 1.6777216E+07
$ gfortran -Wall -Wno-intrinsic-shadow -Os kahan_sum.F90 kahan_main.f90
$ ./a.out
16777216.0 500007079.09539169 500007072.
$ gfortran -Wall -Wno-intrinsic-shadow -O2 kahan_sum.F90 kahan_main.f90
$ ./a.out
16777216.0 500004235.12119830 500004224.
$ gfortran -Wall -Wno-intrinsic-shadow -Ofast kahan_sum.F90 kahan_main.f90
$ ./a.out
67108864.0 500003566.39887440 67108864.0
Here you can see how an aggresive option such as -Ofast
“ruins” the algorithm.