Convert double to quad precision without junk in high digits

The following code produces a result that I did not expect:

  program mytest
  use, intrinsic :: iso_fortran_env, only : dp => real64
  use, intrinsic :: iso_fortran_env, only : qp => real128
  implicit none
  real(dp) :: z
  real(qp) :: u
  z=0.123456789_dp
  u=1._qp*z
  write(6,*)z,u
  end

When I compile and execute this I get the following:
gfortran -o mytest.x mytest.f90
./mytest.x
0.12345678900000000 0.123456788999999997336054491370305186

I’ve tried this with three different compilers. I thought that " u=1._qp*z " would assign u a quad precision representation of z, but I did not expect it to fill the high digits with garbage. I also tried " u=real(z,qp) " but it does not help.

Is there some way to avoid this? I want to have u=1.23456789_qp . I don’t want to do this assignment manually (as in the previous sentence) because more generally it will have some arbitrary double precision value that I want to convert to quadruple precision without having junk in the high digits.

1 Like
use, intrinsic :: iso_fortran_env, only : dp => real64
use, intrinsic :: iso_fortran_env, only : qp => real128
implicit none
real(qp) :: z
real(qp) :: u
z=0.123456789_qp
u=1._qp * z
print *, z,u
end

with output:

ifort dp_quad.f90 && ./a.out 
  0.123456789000000000000000000000000      
  0.123456789000000000000000000000000      

You have to specify the same precision to both z and u. Otherwise, it’s not possible, AFAIK.

Your mistake is to think in decimal arithmetic. Then a number like 0.123456789 is exactly representable with a finite number of decimals. But just like 1/3 is not exactly representable with a finite number of decimals, so is 0.2 not exactly representable in the binary system used in computer arithmetic. The extra decimals you see are due to the conversion of a truncated binary approximation with N “decimals” to one with roughly 2*N “decimals”. The trailing zeroes you see are a consequence of the conversion to a decimal representation, they are not exact.

1 Like

@arjen OK I think I understand. It happens when I convert a single to double, or when I convert a double to quad. For example, if x is single precision and I write x=0.123456789e0 it is stored as a 32 bit floating point number. But 0.123456789e0 is not exactly representable as a 32 bit float, the thing that gets stored internally in binary is something else, something with nonzero trailing digits (like 1/3 in your reply). Now if I convert it to a 64 bit representation, that 64 bit object will match as closely as possible to the thing that got stored as a 32 bit float, it won’t match as closely as possible the thing I started with which was 0.123456789e0. Right?

Spot on! With the enlarged precision in quadruple precision numbers you merely push the “garbage” further away.

You may be interested in the classical article by David Goldberg, What Every Computer Scientist Should Know About Floating-Point Arithmetic It presents all the gory details :slight_smile:

2 Likes

Try the following example, which shows the difference between 0.1 and 0.25.
It is not junk but trailing binary zeros.
It doesn’t look like junk if the 128bit representation of the value has trailing zeros, such as multiples of 0.5, 0.25, 0.125 etc, even 3 * 2^-8 has no trailing junk!

program mytest
use, intrinsic :: iso_fortran_env, only : rp => real32
use, intrinsic :: iso_fortran_env, only : dp => real64
use, intrinsic :: iso_fortran_env, only : qp => real128
implicit none
real(rp) :: r
real(dp) :: z
real(qp) :: u

z = 0.1_dp
u = z
r = z
write(*,*) r,z,u
r = u
z = r
u = z
write(*,*) r,z,u
r = 0.25
z = r
u = z
write(*,*) r,z,u
r = 3.*2.**-8
z = r
u = z
write(*,*) r,z,u
end

It is useful to recognise real values that have lots of trailing zeros, especially when using constants in fortran code, such as “real*16 : u = 2” does not need any extra precision syntax in Fortran code.

real*8 :: time_step = 0.1 is a common source of unexpected inaccuracy in modern Fortran.

3 Likes

Thanks for the Goldberg article. At 80 pages in length I can see that this will take some time!

Thanks, John, for the code. These examples are very informative.

On a more philosophical level you may also be interested by Nicolas Gisin’s work, for example:
“Indeterminism in Physics, Classical Chaos and Bohmian Mechanics: Are Real Numbers Really Real?

Abstract
It is usual to identify initial conditions of classical dynamical systems with mathematical real numbers. However, almost all real numbers contain an infinite amount of information. I argue that a finite volume of space can’t contain more than a finite amount of information, hence that the mathematical real numbers are not physically relevant. Moreover, a better terminology for the so-called real numbers is “random numbers”, as their series of bits are truly random. I propose an alternative classical mechanics, which is empirically equivalent to classical mechanics, but uses only finite-information numbers. This alternative classical mechanics is non-deterministic, despite the use of deterministic equations, in a way similar to quantum theory. Interestingly, both alternative classical mechanics and quantum theories can be supplemented by additional variables in such a way that the supplemented theory is deterministic. Most physicists straightforwardly supplement classical theory with real numbers to which they attribute physical existence, while most physicists reject Bohmian mechanics as supplemented quantum theory, arguing that Bohmian positions have no physical reality.

One point I find helpful on such occasions but nobody mentioned is that a Fortran assignment, say x = y, is done by finding y in whatever precision it was declared with and then putting the answer into x.

Thanks for that reference :slight_smile:

1 Like

Instead of writing the values with list-directed i/o, use a Z format to see all of the bits in the floating point numbers. What is happening with the printed values will then make sense.

Note that, since you did not use the program text button, the statement " u=1._qp*z*" in your program lost the asterisk and came out as " u=1._qpz". This can be quite confusing to readers who find your verbal conclusions mismatched to the apparent version of the program text.

Regarding “junk in the high digits”, there is an adjustment that you have to make in your outlook. As others have remarked, many short decimal fractions such as 0.1 do not have an exact or even a short representation in base 2. Similarly, 1.0/3 has a single digit representation in base 3, namely, “0.1”, but is an endless stream of digits, 0.333…, in base 10. Real numbers in most common computers are represented in base 2, not base 10.

I’ve edited my original post to fix the problem with the missing asterisk.

1 Like

On a related note, it seems unnecessary for me to have written

u=1._qp*z

The presence of 1._qp on the right hand side causes z to be converted to kind(qp), then it gets multiplied by 1._qp, then it’s stored in u which is also of kind(qp).

To achieve the same result, I could have simply written

u=z

Correct?

yes

   u=z

is equivalent.

@vmagnin The article by Gisin on Indeterminism in Physics is thought provoking. But I wasn’t convinced by his argument that there’s a problem with our (deterministic) classical laws of Physics, namely, that they involve real numbers, and “one should not attribute to real numbers, i.e. to numbers that contain an infinite amount of information, any physical significance.”

A minor point: In footnote 1 he alludes to a system he calls Norton’s dome, and he points out that this is a particular example where the equations of motion don’t have a unique solution. But the dynamical system in question involves a differential equation whose right hand side is not Lipschitz continuous. So, the system of this example is not guaranteed to have a unique solution.

@rdyne, for such needs you can consider a utility subprogram using I/O, more specifically internal I/O, say like so:

   use, intrinsic :: iso_fortran_env, only : dp => real64, qp => real128
   real(dp) :: z
   real(qp) :: u
   z=0.123456789_dp
   call dp_to_qp( u, z )
   print *, u
contains
   elemental subroutine dp_to_qp( lhs, rhs )
      real(kind=qp), intent(out) :: lhs
      real(kind=dp), intent(in)  :: rhs
      character(len=*), parameter :: fmtdp = "(1pg22.15)"
      character(len=22) :: s 
      write( s, fmtdp ) rhs
      read( s, fmt=*) lhs
   end subroutine 
end

Edit: unnecessary impure prefix removed as pointed out by @Harper.

FortranFan’s program does not need its keyword impure because the only I/O in subroutine dp_to_qp is to and from an internal file s which is local to the subroutine, by the f2018 standard C1598 in
section 15.7. I tried it successfully with both gfortran and ifort both after changing that keyword to pure and after removing it altogether. Of course the keyword pure was superfluous, by the f2018 standard 15.6.2.1 paragraph 4.

Indeed, thanks for pointing it out. Glad the 2 processors support this Fortran 95 feature, I was under the vague impression the Intel one had some outstanding issue with WRITE statement to an internal file in a PURE subprogram.