IEEE 754 binary interchange floating-point formats versus `iso_fortran_env` real kinds

IEEE Std 754-2019 specifies the “Binary interchange floating-point formats” binary32, binary64, and binary128, while iso_fortran_env includes real32, real64, and real128.

It is not surprising that Fortran compilers do not necessarily implement realK according to binaryK (K = 32, 64, 128). Anyway, the Fortran standard does not talk about IEEE 754 at all.

For example,

  • real128 of nagfor 7.2 has the “minimum exponent” minexponent(x) = -968 and the “maximum exponent” maxexponent(x) = 1023;
  • binary128 of IEEE has the “minimum exponent” emin = -16382 and the “maximum exponent” emax = 16383.

Note that they use the same terminology (minimum / maximum exponent) despite different notations. Nonetheless, there is no surprise. Nobody says real128 must be binary128.

However, I did not find out until today that the mathematical definition of [minexponent(x), maxexponent(x)] in the Fortran standard indeed differs from that of [emin, emax] in IEEE 754 even for the same numeric model. See, e.g., Sec. 16.4 of J3/24-007 for the former and Sec. 3.3 of IEEE Std 754-2019 for the latter.

Given the same floating-point numeric model, it turns out that

[\texttt{minexponent(x)}, \texttt{maxexponent(x)}] \text{~in Fortran} ~=~ [emin + 1, emax + 1] \text{~in IEEE}.

For example, the implementations of real32 in gfortran, ifx, nagfor, flang, and nvfortran all seem to align with binary32, but

[\texttt{minexponent(x)}, \texttt{maxexponent(x)}] \text{~in Fortran} ~=~ [-125, 128],

whereas

[emin, emax] \text{~in IEEE} ~=~ [-126, 127].

Of course, this is not a mathematical difference, but only a notational one. No big deal, nothing nontrivial, just different notations adopted by different documents, but it might be better to know.
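For anyone who wants to check this on their own compiler, here is a minimal sketch (the program and variable names are just illustrative). It prints the inquiry values for real32 and real64 and uses the intrinsics fraction and exponent to show how the Fortran model splits a value into fraction and exponent.

```fortran
program exponent_ranges
   use, intrinsic :: iso_fortran_env, only: real32, real64
   implicit none
   real(real32) :: x32
   real(real64) :: x64

   x32 = 1.0_real32
   x64 = 1.0_real64

   ! Inquiry functions: they depend only on the kind of the argument, not its value.
   print '(a, 2(1x, i0))', 'real32 minexponent, maxexponent:', minexponent(x32), maxexponent(x32)
   print '(a, 2(1x, i0))', 'real64 minexponent, maxexponent:', minexponent(x64), maxexponent(x64)

   ! In the Fortran model the fraction lies in [1/2, 1), so 1.0 is 0.5 * 2**1,
   ! whereas IEEE 754 writes it as 1.0 * 2**0.
   print '(a, f4.1, a, i0)', 'fraction(1.0) = ', fraction(x32), ', exponent(1.0) = ', exponent(x32)

   ! Model exponents of the largest finite and smallest normal real32 values.
   print '(a, i0)', 'exponent(huge(x32)) = ', exponent(huge(x32))
   print '(a, i0)', 'exponent(tiny(x32)) = ', exponent(tiny(x32))
end program exponent_ranges
```

On a processor whose real32 and real64 align with binary32 and binary64, I would expect -125 and 128 for real32, and -1021 and 1024 for real64; other processors may report different values.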


That is because it refers to ISO/IEC 60559:2020, the current international standard that originated in IEEE 754.


I’ll also note that the “real model” in the standard is not intended to match any particular implementation, which can sometimes lead to surprising results when looking at the return values of certain intrinsic functions. The next revision will add a note about this - see j3-fortran.org/doc/year/25/25-193r1.txt


Thank you, Dr. Fortran, for pointing this out.

Quoting https://j3-fortran.org/doc/year/25/25-193r1.txt:

The integer and real models do not necessarily reflect any processor’s implementation.

Does this mean that the processors do not need to be standard conforming? If yes, then what should be the relation between the processors and the standard?

Isn’t the standard supposed to be a specification of how the processors should behave? Otherwise, how should we regard the standard?

First of all, the Fortran standard describes a standard-conforming program. If a processor wants to claim it conforms, it should process a conforming source in the manner prescribed by the standard.

A given processor supports one or more models of real (float) values, but they are not necessarily IEEE floats. While IEEE float is nearly universal now, it wasn’t always so in the past. DEC VAX and IBM System/360 floats, for example, were different.

The reason we are adding the note is that some users have complained that the standard’s example results for certain inquiry intrinsics don’t match IEEE float. For example:

Example. MAXEXPONENT (X) has the value 127 for real X whose model is as in 16.4,

If you try this on a processor where X is an IEEE single type, you’ll get 128 instead of 127. Why? Because the real model in the standard specifies:

x = s \times b^{e} \times \sum_{k=1}^{p} f_k \times b^{-k}

but the implicit leading 1 bit in IEEE float throws this off. MINEXPONENT is similar: the example shows -126 as a result, but on a processor where the argument is an IEEE single, you get -125.
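To make the off-by-one concrete, the two extreme normal values of IEEE single can be written under both conventions. This is only a worked restatement, assuming binary32 (p = 24), with E denoting the IEEE exponent and e the exponent in the Fortran model:

```latex
\begin{aligned}
\text{largest finite:} \quad & (2 - 2^{-23}) \times 2^{127} = (1 - 2^{-24}) \times 2^{128},
  && E = emax = 127, \quad e = 128, \\
\text{smallest normal:} \quad & 1 \times 2^{-126} = \tfrac{1}{2} \times 2^{-125},
  && E = emin = -126, \quad e = -125.
\end{aligned}
```

The IEEE significand lies in [1, 2) while the Fortran model fraction lies in [1/2, 1), so e = E + 1 for every normal number; the values themselves are identical.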
