I recently watched a nice video explaining the Quake III fast inverse square root algorithm. Since I don’t do much bit fiddlery in my real work, I had some fun spinning up a Fortran translation:
module quick_recip_sqrt
use, intrinsic :: iso_fortran_env, only: real32, int32
implicit none
contains
pure elemental function qrsqrt(x) result(y)
real(real32), intent(in) :: x
real(real32) :: y
integer(int32) :: i
real(real32) :: x2
real(real32), parameter :: threehalfs = 1.5_real32
x2 = x*0.5_real32
y = x
i = transfer(y, i)
i = int(z'5f3759df', int32) - ishft(i, -1)
y = transfer(i, y)
y = y * (threehalfs - x2*y*y)
end function
end module
But is it actually fast?
use, intrinsic :: iso_fortran_env, only: real32
use quick_recip_sqrt, only: qrsqrt
implicit none
integer, parameter :: n = 100000, k = 100
integer :: i
real(real32), allocatable :: x(:), y(:)
real(real32) :: tick, tock, t
allocate (x(n))
allocate (y, mold=x)
print "(a,i0,a,i0,a)", "Benchmarking elapsed time in seconds for ", n, " evaluations, averaged over ", k, " repetitions"
t = 0.
do i = 1, k
call random_number(x)
y = 0.
call cpu_time(tick)
y = 1/sqrt(x)
call cpu_time(tock)
t = t + tock-tick
end do
t = t/k
print *, "1/sqrt: ", t
t = 0.
do i = 1, k
call random_number(x)
y = 0.
call cpu_time(tick)
y = qrsqrt(x)
call cpu_time(tock)
t = t + tock-tick
end do
t = t/k
print *, "qrsqrt: ", t
end
Here’s what I find:
(base) [nshaffer:~/tmp/sqrt]$ gfortran --version
GNU Fortran (Homebrew GCC 10.2.0) 10.2.0
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
(base) [nshaffer:~/tmp/sqrt]$ for i in {0..3}
> do
> gfortran -std=f2018 -O$i quick_recip_sqrt.f90 test.f90 && ./a.out
> done
Benchmarking elapsed time in seconds for 100000 evaluations, averaged over 100 repetitions
1/sqrt: 1.48980020E-04
qrsqrt: 2.23302911E-03
Benchmarking elapsed time in seconds for 100000 evaluations, averaged over 100 repetitions
1/sqrt: 1.45219863E-04
qrsqrt: 1.42900055E-04
Benchmarking elapsed time in seconds for 100000 evaluations, averaged over 100 repetitions
1/sqrt: 4.25589446E-04
qrsqrt: 7.65660079E-04
Benchmarking elapsed time in seconds for 100000 evaluations, averaged over 100 repetitions
1/sqrt: 1.09429995E-04
qrsqrt: 7.45430880E-04
Nope! If you dig into the disassembly, it turns out that gfortran on my machine makes use of the sqrtss
SSE instruction for square-root. Evidently the additional division instruction to evaluate 1/sqrt(x)
still blows the hand-spun (and less accurate!) version out of the water at most optimization levels.
Remember, kids: just because it was fast in the 90’s doesn’t mean it still is! That said, it’s still a fun bit of code to pick apart and understand how it works.