Convert double to quad precision without junk in high digits

On a more philosophical level you may also be interested by Nicolas Gisin’s work, for example:
“Indeterminism in Physics, Classical Chaos and Bohmian Mechanics: Are Real Numbers Really Real?

Abstract
It is usual to identify initial conditions of classical dynamical systems with mathematical real numbers. However, almost all real numbers contain an infinite amount of information. I argue that a finite volume of space can’t contain more than a finite amount of information, hence that the mathematical real numbers are not physically relevant. Moreover, a better terminology for the so-called real numbers is “random numbers”, as their series of bits are truly random. I propose an alternative classical mechanics, which is empirically equivalent to classical mechanics, but uses only finite-information numbers. This alternative classical mechanics is non-deterministic, despite the use of deterministic equations, in a way similar to quantum theory. Interestingly, both alternative classical mechanics and quantum theories can be supplemented by additional variables in such a way that the supplemented theory is deterministic. Most physicists straightforwardly supplement classical theory with real numbers to which they attribute physical existence, while most physicists reject Bohmian mechanics as supplemented quantum theory, arguing that Bohmian positions have no physical reality.

One point I find helpful on such occasions but nobody mentioned is that a Fortran assignment, say x = y, is done by finding y in whatever precision it was declared with and then putting the answer into x.

Thanks for that reference :slight_smile:

1 Like

Instead of writing the values with list-directed i/o, use a Z format to see all of the bits in the floating point numbers. What is happening with the printed values will then make sense.

Note that, since you did not use the program text button, the statement " u=1._qp*z*" in your program lost the asterisk and came out as " u=1._qpz". This can be quite confusing to readers who find your verbal conclusions mismatched to the apparent version of the program text.

Regarding “junk in the high digits”, there is an adjustment that you have to make in your outlook. As others have remarked, many short decimal fractions such as 0.1 do not have an exact or even a short representation in base 2. Similarly, 1.0/3 has a single digit representation in base 3, namely, “0.1”, but is an endless stream of digits, 0.333…, in base 10. Real numbers in most common computers are represented in base 2, not base 10.

I’ve edited my original post to fix the problem with the missing asterisk.

1 Like

On a related note, it seems unnecessary for me to have written

u=1._qp*z

The presence of 1._qp on the right hand side causes z to be converted to kind(qp), then it gets multiplied by 1._qp, then it’s stored in u which is also of kind(qp).

To achieve the same result, I could have simply written

u=z

Correct?

yes

   u=z

is equivalent.

@vmagnin The article by Gisin on Indeterminism in Physics is thought provoking. But I wasn’t convinced by his argument that there’s a problem with our (deterministic) classical laws of Physics, namely, that they involve real numbers, and “one should not attribute to real numbers, i.e. to numbers that contain an infinite amount of information, any physical significance.”

A minor point: In footnote 1 he alludes to a system he calls Norton’s dome, and he points out that this is a particular example where the equations of motion don’t have a unique solution. But the dynamical system in question involves a differential equation whose right hand side is not Lipschitz continuous. So, the system of this example is not guaranteed to have a unique solution.

@rdyne, for such needs you can consider a utility subprogram using I/O, more specifically internal I/O, say like so:

   use, intrinsic :: iso_fortran_env, only : dp => real64, qp => real128
   real(dp) :: z
   real(qp) :: u
   z=0.123456789_dp
   call dp_to_qp( u, z )
   print *, u
contains
   elemental subroutine dp_to_qp( lhs, rhs )
      real(kind=qp), intent(out) :: lhs
      real(kind=dp), intent(in)  :: rhs
      character(len=*), parameter :: fmtdp = "(1pg22.15)"
      character(len=22) :: s 
      write( s, fmtdp ) rhs
      read( s, fmt=*) lhs
   end subroutine 
end

Edit: unnecessary impure prefix removed as pointed out by @Harper.

FortranFan’s program does not need its keyword impure because the only I/O in subroutine dp_to_qp is to and from an internal file s which is local to the subroutine, by the f2018 standard C1598 in
section 15.7. I tried it successfully with both gfortran and ifort both after changing that keyword to pure and after removing it altogether. Of course the keyword pure was superfluous, by the f2018 standard 15.6.2.1 paragraph 4.

Indeed, thanks for pointing it out. Glad the 2 processors support this Fortran 95 feature, I was under the vague impression the Intel one had some outstanding issue with WRITE statement to an internal file in a PURE subprogram.

@FortranFan
What has this example achieved ?
You are left with a result where u /= z

You could have equally just had
u = z
write (*,fmt=‘(es22.15)’) u

It you wanted an answer to 15 figures, don’t bother with u at all !
write (*,fmt=‘(es22.15)’) z

More unnecessary bloat !

1 Like

@rdryne , in this instance the simple assignment is the same as u = 1.0_qp * z
and they are equivalent to u = real(z, kind(u) ).

So you can try out the following:

   use, intrinsic :: iso_fortran_env, only : dp => real64, qp => real128
   character(len=*), parameter :: fmtg = "(*(g0))"
   real(dp) :: z
   real(qp) :: u
   z = 0.123456789_dp
   u = z
   print fmtg, "1) u = z: ", u
   u = real( z, kind(u) )
   print fmtg, "2) u = real( z, .. : ", u
   u = 1.0_qp * z
   print fmtg, "3) u = 1.0_qp*z : ", u
   call dp_to_qp( u, z )
   print fmtg, "4) dp_to_qp( u, z ): ", u
contains
   elemental subroutine dp_to_qp( lhs, rhs )
      real(kind=qp), intent(out) :: lhs
      real(kind=dp), intent(in)  :: rhs
      character(len=*), parameter :: fmtdp = "(1pg22.15)"
      character(len=22) :: s 
      write( s, fmtdp ) rhs
      read( s, fmt=*) lhs
   end subroutine 
end

which can lead to the following output:

C:\temp>ifort /standard-semantics p.f90
Intel(R) Fortran Intel(R) 64 Compiler Classic for applications running on Intel(R) 64, Version 2021.6.0 Build 20220226_000000
Copyright (C) 1985-2022 Intel Corporation.  All rights reserved.

Microsoft (R) Incremental Linker Version 14.30.30706.0
Copyright (C) Microsoft Corporation.  All rights reserved.

-out:p.exe
-subsystem:console
p.obj

C:\temp>p.exe
1) u = z: .123456788999999997336054491370305
2) u = real( z, .. : .123456788999999997336054491370305
3) u = 1.0_qp*z : .123456788999999997336054491370305
4) dp_to_qp( u, z ): .123456789000000000000000000000000

Based on your original post, it appears you want the behavior with 4) and that’s why I suggested you to consider such a helper subprogram to help you achieve your goal.

@FortranFan Thanks for this solution. It had not occurred to me that the problem could be solved with internal I/O. Very clever!

I didn’t provided any context for my question, so I’ll do that briefly: Some colleagues noticed that two computations of a complicated function (call it f(z), where z can take on 1000’s of values) produced results that were not identical, in some places the relative error was around 1.e-5. We suspected a loss of precision due to how the calculation was performed. So I wrote a code to calculate the function in double precision and in quad precision. But it occurred to me that the values of z (computed internally by the code) should be numerically the same in both versions, and I was having trouble with that, hence my question. Based on this discussion, I now realize that a better approach is to arrange things so that the values of z are exactly representable in binary.

@rdyne, as I mentioned earlier, the option with the internal IO was just something I meant for you to consider given what you indicated in the original post. Which is that you somehow arrive at a floating-point value programmatically - 0.123456789 as you put it - and you want to represent it as-is in a different (higher) precision for whatever reason.

I did not mean it to be “clever” and please know compared to approaches following via libraries in other languages (e.g., C strtod) the use of internal IO with Fortran can be slower - see this thread.

And I was not intending with my suggestion involving internal IO to get into any messaging to you as to whether it is prudent or not from a numerical analysis involving floating-point calculations, rather leave that up to you, particularly considering you will have best knowledge about your program.

@FortranFan I simply meant that what you proposed was an easy way (just a few lines of code, no external libraries,…) to accomplish what I asked about. But I don’t mean anything more than that. I’m not advocating that other people should do this. People should adopt whatever approaches they think are appropriate to the problem at hand.

The truth is that I went off on a tangent when I started this thread. But it has been informative. Now I can return to my original problem of dealing with a loss of precision in a certain calculation.

1 Like

@kargl My problem does involve large cancellation errors. So I am using Mathematica to rearrange the underlying formulas to avoid small denominators that need not be present.

@kargl I have used David Bailey’s programs in the past. But I don’t need to in this case. I succeeded in using Mathematica to produce Fortran code for the function that I am evaluating, and to do it in a way that the small denominators in separate terms (that led to the cancellation of large terms) are not longer present. The plot shows the relative difference between double and quad precision for the function computed at 1025 points in the range (0,1). The original formula has huge errors. But it’s better than 1.e-14 with the rearranged formula.