I think this is a different issue from the previous thread, and exploring it a little might be instructive. The previous issue was how reciprocals were approximated when vector hardware was used, compared to how divisions were done in the cpu hardware itself. In this case a separate issue arises in that same process.
I don’t know exactly how reciprocals are computed in the Intel vector hardware, but it is almost certainly some variation of a Newton iteration. A Newton iteration for a reciprocal is the same as locating a root of the function
f(r) = d - 1/r
where d is the denominator and r is the desired reciprocal. The Newton update is
r_new = r_old - f(r_old) / f'(r_old)
where f'(r_old) is the derivative evaluated at r_old. Since f'(r) = 1/r**2, substituting and simplifying gives
r_new = r_old - (d - 1/r_old) * r_old**2 = r_old - d*r_old**2 + r_old
which collects to
r_new = r_old * (2 - d * r_old)
Notice that this expression involves only one subtraction and two multiplications per step, and those operations are easier to implement than the original division, whether by hand with pencil and paper, in mechanical computers, or in electronic computers. That is why this algorithm is popular.
You can now see how this works. If the current r_old is too small, then d*r_old is less than one, (2 - d*r_old) is greater than one, and r_new will be larger than r_old. Conversely, if r_old is too large, then the product d*r_old is greater than one, (2 - d*r_old) is less than one, and r_new will be smaller than r_old.
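For example, take d = 5 (so the true reciprocal is 0.2) and r_old = 0.1: then d*r_old = 0.5, (2 - d*r_old) = 1.5, and r_new = 0.15, which has moved up toward 0.2. Starting instead from r_old = 0.3, the product d*r_old = 1.5, (2 - d*r_old) = 0.5, and r_new = 0.15 again, this time moving down toward 0.2.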
Here is a program that uses this algorithm to approximate a reciprocal, followed by the multiplication to produce the final division result.
implicit none
integer :: i
real :: num, denom, r
write(*,*) 'tiny=', tiny(r), ' huge=', huge(r), ' 1/huge=', 1.0/huge(r)
write(*,*) 'input: num, denom, r'
read(*,*) num, denom, r
do i = 1, 5
   r = r * (2.0 - r * denom)   ! Newton update for the reciprocal of denom
   write(*,*) 'r=', r
enddo
write(*,*) 'division:', num/denom, ' multiplication:', num * r
end
Here is what happens with a “normal” set of inputs:
$ gfortran fp.f90 && a.out
tiny= 1.17549435E-38 huge= 3.40282347E+38 1/huge= 2.93873588E-39
input: num, denom, r
3 5 .25
r= 0.187500000
r= 0.199218750
r= 0.199996948
r= 0.200000003
r= 0.200000003
division: 0.600000024 multiplication: 0.600000024
So you can see how the initial guess for r was too small, and the Newton steps increase that guess at each iteration. If you play around with the inputs a little, you can see that if the initial guess is too large, then the sequence converges from above rather than from below. The other feature is that the number of correct digits in the reciprocal doubles each iteration. So not only are the Newton steps self-correcting, they self-correct really, really well. With a good initial guess, you might only need one or two steps. However, with a poor initial guess for r, the sequence might not converge in five steps, or it might immediately diverge. A good implementation of the algorithm must test for these cases and adapt appropriately.
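Here is one illustrative way to add such a guard to the toy program above. This is only a sketch, assuming positive inputs; real implementations instead construct a guaranteed-good first guess, as described next.

program guarded
implicit none
integer :: i
real :: num, denom, r, rnew
write(*,*) 'input: num, denom, r (all positive)'
read(*,*) num, denom, r
! guard: if denom*r >= 2, the residual 1 - denom*r has magnitude >= 1
! and the iteration diverges, so halve r until the product is in range
do while (denom * r >= 2.0)
   r = 0.5 * r
enddo
do i = 1, 30
   rnew = r * (2.0 - denom * r)
   if (rnew == r) exit          ! converged to working precision
   r = rnew
enddo
write(*,*) 'division:', num/denom, ' multiplication:', num * r
end program guarded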
Note also that the extra info that is printed shows that 1/huge is smaller than tiny. That will be an important point below.
When a computer uses this algorithm to compute reciprocals, it looks at the floating point exponent and essentially negates it to get an initial guess that is within a factor of two of the correct result. That operation involves only extracting the (biased) exponent field and subtracting it from a constant, which requires just integer arithmetic, so it is fast and cheap. Some computers might also look at the leading bits in the mantissa to get a closer initial guess. If they use the leading 4 bits, then a 16-element table lookup would give 4 correct mantissa bits in the initial reciprocal approximation. The first iteration would give 8 correct bits, the next would give 16 bits, and the next would give 32 bits. There are only 23 stored mantissa bits (24 counting the implicit leading bit) in a 32-bit real value, so at most three iterations would ever be required. Some cpus have that lookup table stored in read-only registers, so it is always available without any runtime initialization. A larger table might be used in order to reduce the number of iterations.
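Here is a minimal sketch of that exponent trick in standard Fortran, using the exponent() and set_exponent() intrinsics to stand in for the hardware's integer manipulation (the fixed five-step loop and the test value are just for illustration):

program exp_guess
implicit none
integer :: i
real :: d, r
d = 5.0
! the model is d = fraction(d) * 2**exponent(d) with 0.5 <= fraction(d) < 1,
! so 2**(-exponent(d)) is within a factor of two of 1/d
r = set_exponent(1.0, 1 - exponent(d))
write(*,*) 'guess r=', r, ' exact 1/d=', 1.0/d
do i = 1, 5
   r = r * (2.0 - d * r)   ! each step roughly doubles the number of correct bits
   write(*,*) 'r=', r
enddo
end program exp_guess

With d = 5.0 this prints a guess of 0.125, within the promised factor of two of the exact 0.2, and the loop then converges just as in the first run above.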
Ok, so what does this algorithm do with really large numbers? Here is one example where the initial guess for r should be pretty good, and where the value is still a normal fp number.
$ ifort fp.f90 && a.out
tiny= 1.1754944E-38 huge= 3.4028235E+38 1/huge= 0.0000000E+00
input: num, denom, r
1.e37 1.e38 9e-37
r= -7.9199988E-35
r= -6.2742218E-31
r= -3.9365857E-23
r= -1.5496707E-07
r= -2.4014791E+24
division: 0.1000000 multiplication: -Infinity
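Working the first step by hand shows what happened. The residual e = 1 - d*r_old gets squared every iteration, so the iteration converges only when 0 < d*r_old < 2. Here d*r_old = 1e38 * 9e-37 = 90, and the first step is

r_new = 9e-37 * (2 - 90) = 9e-37 * (-88) = -7.92e-35

which matches the first printed value; from there the squared residual grows without bound.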
The “pretty good” initial guess isn’t good enough, and it diverges. Here is another case where the initial guess is a denormal number.
$ ifort fp.f90 && a.out
tiny= 1.1754944E-38 huge= 3.4028235E+38 1/huge= 0.0000000E+00
input: num, denom, r
1e37 1e38 1e-38
r= 0.0000000E+00
r= 0.0000000E+00
r= 0.0000000E+00
r= 0.0000000E+00
r= 0.0000000E+00
division: 0.1000000 multiplication: 0.0000000E+00
Notice here that the input r is actually the correct value, but since it is a denormal number, it gets flushed to zero by default by ifort. In this case, the denominator is a normal number, but 1/denom is a denormal number. If I do the same thing with gfortran,
$ gfortran fp.f90 && a.out
tiny= 1.17549435E-38 huge= 3.40282347E+38 1/huge= 2.93873588E-39
input: num, denom, r
1e37 1e38 1e-38
r= 1.00000008E-38
r= 1.00000008E-38
r= 1.00000008E-38
r= 1.00000008E-38
r= 1.00000008E-38
division: 0.100000001 multiplication: 0.100000009
So the new thing here is what happens with denormal intermediate results: are they flushed to zero, or are they carried through the (relatively expensive) floating point operations on denormal values? As seen above, the computed result can range from zero to Infinity, and it can even be the correct result.
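If you need to detect these cases in portable Fortran, tiny() and the ieee_arithmetic intrinsic module are enough. Here is a small sketch along those lines; the comparison against tiny() is the key point, and the specific denominator is just the example from above:

program classify
use, intrinsic :: ieee_arithmetic
implicit none
real :: denom, r
denom = 1.0e38
r = 1.0 / denom                 ! exact value is subnormal; may be flushed to zero
if (r == 0.0) then
   write(*,*) 'reciprocal underflowed (or was flushed) to zero'
else if (abs(r) < tiny(r)) then
   write(*,*) 'reciprocal is subnormal:', r
else if (.not. ieee_is_finite(r)) then
   write(*,*) 'reciprocal overflowed to Infinity'
else
   write(*,*) 'reciprocal is a normal number:', r
end if
end program classify

Compiled with gfortran this typically takes the subnormal branch; with ifort's default flush-to-zero behavior it takes the zero branch, matching the runs above.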