Loss of data: help needed

Hi All

What is the best way to write the following code in Fortran without loss of data?
v = 0x530e * std::pow(2.0, 64.0) + 0xda74000000000000

The expected output is 392230413701810053185536.0

In Fortran, I tried this:
v = int(z"530e", int64) * 2.0_real64**64 + int(z"da74000000000000", int64)

The output I got is 392211966957736343633920.0

Any help will be appreciated.

The problem is that z"da74000000000000" is larger than the largest representable signed 64-bit integer. Unlike C, Fortran doesn’t have unsigned integers, which are what this literal constant needs in order to be represented properly.

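program foo
use iso_fortran_env, only: int64, real64
implicit none
real(real64) :: v1, v2, v3

 ! 0x530e * 2**64, exact in real64
 v1 = int(z"530e", int64) * 2.0_real64**64
 ! top bit set: does not fit in a signed int64, and how the bit pattern
 ! is interpreted is processor dependent (here it comes out negative)
 v2 = int(z"da74000000000000",int64)
 ! same constant with one trailing hex zero dropped (value / 16), which fits
 v3 = int(z"da7400000000000",int64)
 ! restore the factor 16 in real64 arithmetic
 v3 = v3*16

 print*, v1
 print*, v2
 print*, v3
 print*, v1+v3
end program foo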
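
As a side note on where the wrong value comes from: the two outputs quoted in the question differ by exactly 2**64, which is consistent with the constant being read as a two's-complement signed value. Here is a minimal sketch, assuming a two's-complement int64 and an IEEE real64; the program and variable names are made up for illustration.

```fortran
program wrap_demo
   use iso_fortran_env, only: int64, real64
   implicit none
   real(real64) :: as_unsigned, as_signed

   ! 0xda74000000000000 = 0xda74 * 2**48; 0xda74 itself fits in any integer kind
   as_unsigned = scale(real(int(z'da74', int64), real64), 48)
   ! Assumption: reading a 64-bit pattern with the top bit set as a
   ! two's-complement signed integer subtracts 2**64 from the unsigned value
   as_signed = as_unsigned - 2.0_real64**64

   write(*,'(a,i0)')   'huge(1_int64)  = ', huge(1_int64)
   write(*,'(a,f0.1)') 'unsigned value = ', as_unsigned
   write(*,'(a,f0.1)') 'signed reading = ', as_signed
end program wrap_demo
```

Both real values are small integers times 2**48, so they are exactly representable in real64 and the subtraction is exact. The unsigned value, 15741206597566726144, is larger than huge(1_int64) = 9223372036854775807, and the signed reading, -2705537476142825472, is what ends up being added in the original attempt.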




And, to add to @PierU's reply, the result you quote is well beyond the precision of real64 values (which is 15-16 decimal digits), so even if you fix the problem with the unsigned hex constant, you cannot expect the output to be correct down to the last (24th) digit.
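
To put numbers on the precision limit, the inquiry intrinsics report what a real64 can hold. This is a small sketch, assuming real64 is IEEE binary64; the program name is illustrative.

```fortran
program precision_demo
   use iso_fortran_env, only: real64
   implicit none
   real(real64) :: v

   ! The value quoted in the question
   v = 392230413701810053185536.0_real64

   write(*,'(a,i0)')   'decimal precision: ', precision(v)
   write(*,'(a,i0)')   'significand bits:  ', digits(v)
   ! Distance between v and the next representable real64 value
   write(*,'(a,f0.1)') 'spacing near v:    ', spacing(v)
end program precision_demo
```

For IEEE binary64, precision() reports 15 and digits() reports 53, and the spacing between neighbouring values near 3.9e23 is 2**26 = 67108864. An arbitrary 24-digit integer of this size is therefore generally not representable.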

In general, this would be correct. However, in this particular case, the floating point number is an integer value with trailing zeros in the exact binary representation, so it actually is the correct number that is being evaluated and printed. Here is a simplified version of the code that demonstrates this.

program xxx
   implicit none
   integer, parameter :: real64 = selected_real_kind(14), real128 = selected_real_kind(30)
   real(real64) :: r64
   real(real128) :: r128
   write(*,'(*(g0,1x))') 'real kinds are:', real64, real128
   r64 = scale( real( int(z'530e0000') + int(z'da74'), real64), 48)
   write(*,'(f0.1,z17.16)') r64, r64
   r128 = scale( real( int(z'530e0000') + int(z'da74'), real128), 48)
   write(*,'(f0.1,z33.32)') r128, r128
end program xxx

$ gfortran xxx.f90 && a.out
real kinds are: 8 16
392230413701810053185536.0 44D4C3B69D000000
392230413701810053185536.0 404D4C3B69D000000000000000000000

With the appropriate scaling, the integer sum can be done with default (int32) integer arithmetic. The conversion to real is exact, because the integer sum does not overflow, and then the final scaling is exact because the mantissa is unchanged and only the exponent bits are changed. I used the scale() intrinsic instead of multiplication, but the same bits are computed either way.
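
The claim that scale() and multiplication give the same bits can be checked directly. The sketch below (program and variable names are illustrative, and an IEEE real64 is assumed) also includes a two-term int64 form that follows the layout of the original expression more closely.

```fortran
program scale_check
   use iso_fortran_env, only: int64, real64
   implicit none
   real(real64) :: a, b, c

   ! 0x530e0000 + 0xda74 = 0x530eda74 fits in a default 32-bit integer
   a = scale(real(int(z'530e0000') + int(z'da74'), real64), 48)
   ! Same value, scaled by multiplying with an exact power of two
   b = real(int(z'530e0000') + int(z'da74'), real64) * 2.0_real64**48
   ! Two-term form: 0x530e * 2**64 + 0xda74 * 2**48, each term exact in real64
   c = real(int(z'530e', int64), real64) * 2.0_real64**64 &
     + real(int(z'da74', int64), real64) * 2.0_real64**48

   write(*,'(f0.1,1x,z16.16)') a, a
   write(*,'(f0.1,1x,z16.16)') b, b
   write(*,'(f0.1,1x,z16.16)') c, c
   ! All three are expected to compare equal
   print *, a == b, b == c
end program scale_check
```

Each term in the third form is a small integer times a power of two, and their sum needs only 31 significand bits, so no rounding should occur in any of the three expressions.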

Thank you @RonShepard, @PierU and @msz59 for your insight and explanation.

Happy 2024.