Some compilers print Inf or Infinity with this program; others won’t compile it, reporting a floating-point overflow. Which is right?
program nearesthuge
   print *, nearest(huge(1.0), 1.0)
end program nearesthuge
Without digging into the standard, I would say this is invalid code, so the compiler can do whatever it wants.
In general, I’d agree with @PierU that it is implementation-dependent. huge() returns the largest model value, whereas nearest() returns the next machine-representable value larger than the argument. Even though the real model is not the same as the typical machine model (we’re adding a note to F202Y to say so), in most cases there will be no such representable value.
This means that it is up to the programmer to avoid that argument pair to the nearest() function, right?
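One way a programmer might guard against it at runtime, as a minimal sketch (the program name is illustrative, and it assumes the compiler returns Inf rather than refusing to compile, as most of those discussed here do):

program guard_nearest
   use, intrinsic :: ieee_arithmetic, only : ieee_is_finite
   implicit none
   real :: y
   y = nearest(huge(1.0), 1.0)   ! may yield Inf at the top of the range
   if (ieee_is_finite(y)) then
      print *, 'got a representable value:', y
   else
      print *, 'nearest() stepped past huge():', y
   end if
end program guard_nearest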
It seems like the situation is a little different for the other extreme, nearest(tiny(1.0),-1.0). Most machines have denormal numbers, so there are machine-representable numbers smaller than tiny(1.0), and a programmer might intentionally want some of them, while knowing that he is not guaranteed to generate them that way.
@RonShepard is right: the other extreme is different. This program to test it compiled and ran with gfortran, g95, ifort, ifx and AMD flang, with various outputs. LFortran gave an internal compiler error and said
'Nearest` intrinsic is not yet implemented for runtime values
The program:
program nearestsmall
   implicit none
   real :: x
   integer :: i
   x = tiny(x)
   do i = 1, digits(x)+1
      if (x == 0) exit
      x = nearest(x, -x)
      print "(A,I3,E14.7)", 'In loop i,x =', i, x
      x = x/2.0
   end do
   print "(A,2E14.7)", 'After loop x, tiny*epsilon =', x, tiny(x)*epsilon(x)
end program nearestsmall
I think printing the values with a B32.32 format makes these kinds of edge conditions really informative. Seeing the bit pattern of huge(1.0) and understanding the integer nature of the exponent and how the fraction is normalized into a decimal value makes it all make sense, except perhaps how exceptional numbers like Inf/-Inf, the different NaN values, and negative zero are usually implemented. It can lead to a better understanding of why overflow often results in a negative number, why so many 32-bit functions like COS and SIN are so smooth with smaller values but can look like random number generators when used with large values, why adding one to a big number often produces no change in the value, and so on.
On most machines nowadays, when you look at the bit pattern of huge(0.0) and then add one to the fraction as a binary number, you might expect the carry to eventually flip the sign bit and otherwise start you over, so that the value would become a negative zero. But on most machines you find you get Infinity instead, which is educational. Keep incrementing the bits (in a modulo fashion, as you just discovered) and, if your platform supports it, you find out how negative zero IS represented, and so on.
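Here is a minimal sketch of that bit-bumping experiment, assuming 32-bit IEEE reals and using transfer() to reinterpret the bit pattern (the program name and output labels are just illustrative):

program bump_bits
   use, intrinsic :: iso_fortran_env, only : int32, real32
   implicit none
   real(real32)   :: x
   integer(int32) :: bits
   x = huge(1.0_real32)
   bits = transfer(x, bits)              ! reinterpret the float as its raw bits
   print "(A,B32.32,A,G0)", 'huge(1.0)    = ', bits, '  value = ', x
   bits = bits + 1_int32                 ! bump the stored pattern by one
   x = transfer(bits, x)
   print "(A,B32.32,A,G0)", 'huge + 1 bit = ', bits, '  value = ', x   ! Inf on IEEE machines
end program bump_bits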
In the implementations I have seen, it is pretty clear that NEAREST is just adding or subtracting 1 to or from the binary bit pattern of the float value. But working near the edge of the range, as with HUGE(), usually makes you want to switch everything to 64-bit :>.
These kinds of examples can be great tools for explaining floating-point math versus conventional math, but I find it easier to discuss and understand when looking at the bit patterns. That is where ideas like representable numbers versus arbitrary constants suddenly become clear, and the proper fear of floating-point math can be instilled in those previously exposed only to conventional math or arbitrary-precision computation.
So then take a small float and start dividing it by two. When you realize the result stops changing, you look at the bit pattern and suddenly EPSILON makes sense.
Then you start thinking how maybe Planck’s constant is so similar to that, and you remember those guys talking about how the world is really a simulation, and you start thinking – wow! quantum behavior really does sort of look like what happens when you use floating point math on a computer …
This works almost identically with gfortran and nagfor on arm64 hardware and macOS. Both step down through the range of denormal numbers and finally end up at 0.0, which is both a model number and representable. The only difference was that nagfor reported an underflow condition at the end of the program.
program demo_epsilon
   use, intrinsic :: iso_fortran_env, only : dp=>real64, sp=>real32
   implicit none
   real(kind=dp) :: t
   real(kind=dp) :: my_dp_eps
   ! starting with a value of 1, keep dividing the value
   ! by 2 until no change is detected. Note that with
   ! infinite precision this would be an infinite loop.
   ! 1/2 of a non-zero number is always smaller than the number, right?
   my_dp_eps = 1.0d0
   INFINITE: do
      my_dp_eps = my_dp_eps/2.0d0
      t = 1.0d0 + my_dp_eps
      if (t <= 1.0d0) exit
   enddo INFINITE
   my_dp_eps = 2.0d0*my_dp_eps
   print *, my_dp_eps
   print *, 'weird coincidence, that is the same as ', epsilon(0.0d0)
   print *, 'congratulations! You calculated the epsilon value of a machine the hard way!'
   print '(b64.64)', my_dp_eps
end program demo_epsilon
So I think understanding this one is fun too!
Such loops were a common sight in the days before the epsilon(...) and other query intrinsics were introduced. I have one in a code I’m refactoring:
C
C     CALCULATE ROUGH ESTIMATE OF UNIT ROUND-OFF ERROR FOR CHECKING
C
      TWOU = 0.1D0
   40 TEMP = 1.0D0 + TWOU
      IF (1.0D0.EQ.TEMP) THEN
         TWOU = TWOU*2.0D0
      ELSE
         TWOU = TWOU*0.5D0
         GO TO 40
      END IF
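For comparison, a minimal modern sketch: the epsilon() query intrinsic returns essentially what this loop estimates, TWOU being roughly twice the unit round-off u:

program roundoff
   implicit none
   double precision :: twou
   twou = epsilon(1.0d0)   ! what the legacy GO TO loop computes the hard way
   print *, 'rough unit round-off estimate (2u):', twou
end program roundoff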
I noticed a report about this among some old Bell Labs material (Selected Bell Labs Tech Reports; some pretty famous names in there):
- N. L. Schryer, CSTR #89, A Test of Computer’s Floating-Point Arithmetic Unit, Bell Labs, February 1981. Also in Sources and Development of Mathematical Software, ed. W. Cowell, Prentice-Hall.
Unfortunately the link is broken. I was going through the works of N. L. Schryer in search of something else. He wrote a very nice piece about software for solving PDEs in one dimension (his tone of writing is very clean and comprehensible, with a pinch of humor).
To me, nearest( tiny(1.0), -1.0 ) shall return 0.0, otherwise it’s not standard conforming.
Using B32.32 (thanks @urbanjost) I changed the program to this:
program nearestsmall
   implicit none
   real :: x
   integer :: i
   x = tiny(x)
   do i = 1, digits(x)+1
      if (x == 0) exit
      x = nearest(x, -x)
      print "(2X,A,I3,E14.6,A,B32.32)", 'In loop i,x =', i, x, ' = ', x
      x = x/2.0
   end do
   print "(12X,A,B32.32)", 'tiny(x)*epsilon(x) = ', tiny(x)*epsilon(x)
end program nearestsmall
The Intel compilers were interesting: ifort 2021.13.1 20240703 gave
In loop i,x = 1 0.117549E-37 = 00000000011111111111111111111111
tiny(x)*epsilon(x) = 00000000000000000000000000000001
and ifx 2025.0.0 20241008 gave
In loop i,x = 1 0.117549E-37 = 00000000011111111111111111111111
In loop i,x = 2 0.587747E-38 = 00000000001111111111111111111111
In loop i,x = 3 0.293873E-38 = 00000000000111111111111111111111
In loop i,x = 4 0.146937E-38 = 00000000000011111111111111111111
In loop i,x = 5 0.734683E-39 = 00000000000001111111111111111111
In loop i,x = 6 0.367341E-39 = 00000000000000111111111111111111
In loop i,x = 7 0.183670E-39 = 00000000000000011111111111111111
In loop i,x = 8 0.918341E-40 = 00000000000000001111111111111111
In loop i,x = 9 0.459163E-40 = 00000000000000000111111111111111
In loop i,x = 10 0.229575E-40 = 00000000000000000011111111111111
In loop i,x = 11 0.114780E-40 = 00000000000000000001111111111111
In loop i,x = 12 0.573832E-41 = 00000000000000000000111111111111
In loop i,x = 13 0.286846E-41 = 00000000000000000000011111111111
In loop i,x = 14 0.143353E-41 = 00000000000000000000001111111111
In loop i,x = 15 0.716064E-42 = 00000000000000000000000111111111
In loop i,x = 16 0.357331E-42 = 00000000000000000000000011111111
In loop i,x = 17 0.177965E-42 = 00000000000000000000000001111111
In loop i,x = 18 0.882818E-43 = 00000000000000000000000000111111
In loop i,x = 19 0.434403E-43 = 00000000000000000000000000011111
In loop i,x = 20 0.210195E-43 = 00000000000000000000000000001111
In loop i,x = 21 0.980909E-44 = 00000000000000000000000000000111
In loop i,x = 22 0.420390E-44 = 00000000000000000000000000000011
In loop i,x = 23 0.140130E-44 = 00000000000000000000000000000001
tiny(x)*epsilon(x) = 00000000000000000000000000000001
I found a copy of that report at https://telecomarchive.s3.us-east-2.amazonaws.com/docs/bsp-archive/Letters%20and%20Memos/CSTR/CSTR%2089.pdf
Some compilers have the option to flush denormal numbers to zero. That is apparently what the ifort compiler did here, while ifx did not. Also, it may be possible to set this behavior at runtime with the IEEE library routine IEEE_SET_UNDERFLOW_MODE().
I think this is the difference between model numbers and hardware numbers.
Yep, and I was wrong. Although tiny() refers to the model numbers, nearest() refers to the hardware numbers. Consequently, nearest( tiny(1.0), -1.0 ) won’t be 0.0 on machines that handle denormalized numbers.
Regarding this one, all compilers tested on godbolt.org (that is, gfortran, ifx, flang, nvfortran, lfortran) compile the code successfully and print Inf/Infinity. Compilers usually do not check/trap floating-point overflows by default. Using ifx -fpe0 traps the fp overflow here, but gfortran -ffpe-trap=overflow doesn’t, which I believe is a bug.
I modified the nearestsmall program a little to see how the IEEE functionality works.
program ieee_nearestsmall
   use ieee_arithmetic, only: IEEE_SET_UNDERFLOW_MODE
   implicit none
   call IEEE_SET_UNDERFLOW_MODE (GRADUAL=.false.)
   call loop()
   call IEEE_SET_UNDERFLOW_MODE (GRADUAL=.true.)
   call loop()
contains
   subroutine loop()
      real :: x, xn
      integer :: i
      x = tiny(x)
      do i = 0, digits(x)+1
         xn = nearest(x, -1.0)
         print "(2X,A,I3,E14.6,2(A,B32.32))", 'In loop i,x =', i, x, ' = ', x, ' xn = ', xn
         if (x == 0) exit
         x = x * 0.5
      enddo
      print "(12X,A,B32.32)", 'tiny(x)*epsilon(x) = ', tiny(x)*epsilon(x)
      return
   end subroutine loop
end program ieee_nearestsmall
$ gfortran nearestsmall.f90 && a.out
In loop i,x = 0 0.117549E-37 = 00000000100000000000000000000000 xn = 00000000011111111111111111111111
In loop i,x = 1 0.000000E+00 = 00000000000000000000000000000000 xn = 10000000000000000000000000000001
tiny(x)*epsilon(x) = 00000000000000000000000000000001
In loop i,x = 0 0.117549E-37 = 00000000100000000000000000000000 xn = 00000000011111111111111111111111
In loop i,x = 1 0.587747E-38 = 00000000010000000000000000000000 xn = 00000000001111111111111111111111
In loop i,x = 2 0.293874E-38 = 00000000001000000000000000000000 xn = 00000000000111111111111111111111
In loop i,x = 3 0.146937E-38 = 00000000000100000000000000000000 xn = 00000000000011111111111111111111
In loop i,x = 4 0.734684E-39 = 00000000000010000000000000000000 xn = 00000000000001111111111111111111
In loop i,x = 5 0.367342E-39 = 00000000000001000000000000000000 xn = 00000000000000111111111111111111
In loop i,x = 6 0.183671E-39 = 00000000000000100000000000000000 xn = 00000000000000011111111111111111
In loop i,x = 7 0.918355E-40 = 00000000000000010000000000000000 xn = 00000000000000001111111111111111
In loop i,x = 8 0.459177E-40 = 00000000000000001000000000000000 xn = 00000000000000000111111111111111
In loop i,x = 9 0.229589E-40 = 00000000000000000100000000000000 xn = 00000000000000000011111111111111
In loop i,x = 10 0.114794E-40 = 00000000000000000010000000000000 xn = 00000000000000000001111111111111
In loop i,x = 11 0.573972E-41 = 00000000000000000001000000000000 xn = 00000000000000000000111111111111
In loop i,x = 12 0.286986E-41 = 00000000000000000000100000000000 xn = 00000000000000000000011111111111
In loop i,x = 13 0.143493E-41 = 00000000000000000000010000000000 xn = 00000000000000000000001111111111
In loop i,x = 14 0.717465E-42 = 00000000000000000000001000000000 xn = 00000000000000000000000111111111
In loop i,x = 15 0.358732E-42 = 00000000000000000000000100000000 xn = 00000000000000000000000011111111
In loop i,x = 16 0.179366E-42 = 00000000000000000000000010000000 xn = 00000000000000000000000001111111
In loop i,x = 17 0.896831E-43 = 00000000000000000000000001000000 xn = 00000000000000000000000000111111
In loop i,x = 18 0.448416E-43 = 00000000000000000000000000100000 xn = 00000000000000000000000000011111
In loop i,x = 19 0.224208E-43 = 00000000000000000000000000010000 xn = 00000000000000000000000000001111
In loop i,x = 20 0.112104E-43 = 00000000000000000000000000001000 xn = 00000000000000000000000000000111
In loop i,x = 21 0.560519E-44 = 00000000000000000000000000000100 xn = 00000000000000000000000000000011
In loop i,x = 22 0.280260E-44 = 00000000000000000000000000000010 xn = 00000000000000000000000000000001
In loop i,x = 23 0.140130E-44 = 00000000000000000000000000000001 xn = 00000000000000000000000000000000
In loop i,x = 24 0.000000E+00 = 00000000000000000000000000000000 xn = 10000000000000000000000000000001
tiny(x)*epsilon(x) = 00000000000000000000000000000001
This is on arm64 hardware with gfortran on macOS. When gradual underflow is turned off (the first call), nearest() still returns a denormal number, but the subsequent multiplication by 0.5 flushes to zero as expected. Then notice that nearest(0.0,-1.0) returns the negative of the smallest denormal number. That all seems reasonable, but I think nearest(0.0,-1.0) could also return -tiny(1.0) here and still be conforming. That is, I think nearest() is allowed to look at the current underflow mode if it wanted to, and return the value consistent with that. The final tiny(x)*epsilon(x) value also prints the denormal value rather than flushing to 0.0. That might be an incorrect result, or it might be just the difference between compile-time and run-time evaluation of that expression. Maybe someone with more experience with these intrinsics could comment on this feature. Certainly, on a machine that does not support gradual underflow at all, I would expect these results all to flush to 0.0.
With gradual underflow turned on, the full range of single-bit denormal values is printed. When x has the smallest denormal value, nearest(x,-1.0) and the multiplication both return 0.0. As before, nearest(0.0,-1.0) returns the negative of the smallest denormal number, which I think is the expected result here.
edit: The nagfor compiler on arm64 hardware gives the same results as gfortran, but with a warning at the end of the run that a floating-point underflow occurred. In particular, when gradual underflow is turned off, nagfor also returns denormal numbers for both the nearest(0.0,-1.0) expression and for the final product expression. The nagfor compiler also requires transfer() to write the B32.32 fields.
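For reference, a minimal sketch of that portable form; since B editing of a real item is an extension, the real is reinterpreted as an integer of the same size first (the program name is illustrative):

program b32_portable
   use, intrinsic :: iso_fortran_env, only : int32
   implicit none
   real :: x
   x = tiny(x)
   ! transfer() yields an integer with the same bit pattern, which B32.32 accepts
   print "(B32.32)", transfer(x, 0_int32)
end program b32_portable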
I think historically a programmer might expect floating point overflows to be trapped and to result in a program abort, while floating point underflows would be expected to silently flush to zero. Of course, that behavior might be modified with compile time options or with run time modifications of the environment.
Here is another program I found instructive, testing what compilers do with nearest(0.0,1.0) under each ieee_set_underflow_mode(gradual) possibility.
program ieee_nearestzero
   use ieee_arithmetic, only: ieee_set_underflow_mode
   implicit none
   real :: x
   integer :: i
   do i = 1, 2
      call ieee_set_underflow_mode(i==1)
      x = nearest(0.0, 1.0)
      print "(A)", 'Underflow '//merge('gradual','sudden ',i==1)
      print "(A,B32.32,A,L2)", 'nearest(0.0,1.0) = ', x, &
         ' = tiny*epsilon?', x==tiny(0.0)*epsilon(0.0)
   end do
end program ieee_nearestzero
I’m curious if any compilers return tiny(1.0) for this nearest() value. I’ve looked at three compilers now, and they all return a denormal value regardless of the underflow setting. I wonder if that is the intended result?
Same result as @RonShepard got, for the 4 compilers that would compile it. LFortran hasn’t implemented ieee_set_underflow_mode yet.