Newbie question about mixing types in floating point arithmetic

Hi all,

Say I have two numbers, a and b, of type real(kind(1.0)) (single precision) and real(kind(1.d0)) (double precision), respectively.

If a is a dyadic rational that can be exactly represented in single precision, is the result of a*b the same as real(a,kind(1.d0))*b?

I am trying to understand whether there is any reason for me to be cautious and upcast constants, as in 0.125_rp*b with b in double precision (and rp=kind(1.d0)), given that 0.125 is a number that can be exactly represented in single precision.

Thanks!

I have wondered about how to write code in these cases too.

First, a*b is always equivalent to real(a,kind(1.d0))*b. You don’t need any special conditions about exact representations or anything else, the language simply defines those two expressions to be the same. As a practical matter, a compiler will produce the same machine instructions for those two cases. The only difference is that in the first case you are relying on the compiler to do the required conversion automatically, while in the second case you are writing it out explicitly.
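
To make that concrete, here is a minimal sketch that compares the two forms. The kind names sp and dp and the test values are my own choices for illustration. I picked 0.1 for a on purpose, since it is not exactly representable in binary, to show that the equivalence does not depend on exact representability.

```fortran
program mixed_kind_product
  implicit none
  integer, parameter :: sp = kind(1.0)    ! default (single precision) real kind
  integer, parameter :: dp = kind(1.d0)   ! double precision real kind
  real(sp) :: a
  real(dp) :: b, implicit_result, explicit_result

  a = 0.1_sp   ! illustrative value, deliberately not exactly representable
  b = 3.0_dp   ! illustrative value

  ! Mixed-kind product: a is converted to kind dp before the multiplication.
  implicit_result = a*b

  ! Same product with the conversion written out explicitly.
  explicit_result = real(a, dp)*b

  print *, 'implicit conversion: ', implicit_result
  print *, 'explicit conversion: ', explicit_result
  print *, 'results identical:   ', implicit_result == explicit_result
end program mixed_kind_product
```

Both products use the same converted value of a, so I would expect the comparison to come out true with any compiler.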

However, the compiler is also allowed a lot of flexibility in rearranging expressions. There are some restrictions, like respecting parentheses, but it can still do a lot behind the curtain. Also, all floating point operations, even simple additions and multiplications, are allowed to be approximate. I think IEEE arithmetic tightens up some of those restrictions, but the Fortran standard itself never requires exact floating point arithmetic, even in those cases where the exact result is exactly representable. In your example, 0.125 can be represented exactly in both single and double precision, but the compiler is not required by the language standard to do that conversion exactly. It should, and programmers can rightly expect it, but it is a quality of implementation issue, not a requirement of the language standard.
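
As a rough sketch of why the kind suffix on a literal usually matters even though it happens not to for 0.125, the program below compares default-kind and rp-kind literals. The value of b and the choice of 0.1 as a counterexample are mine, not from the question. The comments describe what I would expect on an ordinary IEEE 754 machine with a reasonable compiler, which, as said above, the standard itself does not guarantee.

```fortran
program literal_kinds
  implicit none
  integer, parameter :: rp = kind(1.d0)   ! double precision, as in the question
  real(rp) :: b

  b = 3.0_rp   ! illustrative value

  ! 0.125 = 2**(-3) is exactly representable in both kinds, so the
  ! default-kind literal should convert to rp without any change.
  print *, '0.125*b == 0.125_rp*b : ', 0.125*b == 0.125_rp*b

  ! 0.1 is not exactly representable in binary. The default-kind literal
  ! is rounded to single precision first and only then converted to rp,
  ! so it differs from the literal written directly in kind rp.
  print *, '0.1*b   == 0.1_rp*b   : ', 0.1*b == 0.1_rp*b

  ! Printing both versions of the constant shows the difference.
  print *, 'real(0.1, rp) = ', real(0.1, rp)
  print *, '0.1_rp        = ', 0.1_rp
end program literal_kinds
```

So for constants like 0.125 the suffix is harmless either way, but writing it consistently avoids having to think about which constants are exact in single precision.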

One thing I sometimes do in source code is to write the expression as (a)*b. This kind of draws attention to the fact that an implicit conversion is being done, while adding only two extra characters to the expression. I sometimes do this when a is an integer too, for pretty much the same reason.


First, a*b is always equivalent to real(a,kind(1.d0))*b. You don’t need any special conditions about exact representations or anything else, the language simply defines those two expressions to be the same. As a practical matter, a compiler will produce the same machine instructions for those two cases. The only difference is that in the first case you are relying on the compiler to do the required conversion automatically, while in the second case you are writing it out explicitly.

Thank you! I realized this after posting, but I was indeed interested in the second part of my question, which you also answered :).