Can floating point literals be adapted by the compiler to double precision variable?

This is another common gotcha:

and I wonder if the compiler could not simply take a code like this:

integer, parameter :: dp = kind(0.d0)
real :: x
real(dp) :: y
x = 5.35
y = 5.35

and it would internally treat it as:

integer, parameter :: dp = kind(0.d0)
real :: x
real(dp) :: y
x = 5.35
y = 5.35_dp

In other words, it would look where the floating point number is assigned to and adapt it accordingly.

One way to describe it could be that a floating point literal like 5.35 would be “infinite” precision, and then when used in operations, it would be “downcasted” to whatever precision is needed for the operation.

For example, 5.35 / y would implicitly become real(5.35, dp) / y because y is double precision. If y was quadruple precision qp then it would become real(5.35, qp) / y.

Two questions:

  • Could this break existing codes in any way?
  • Would this completely fix the current problem that one can assign a single precision number 5.35 to a double precision variable and lose accuracy?

I believe that a general rule of Fortran is that what is on the LHS of an assignment never affects how the RHS is evaluated, and someone new to Fortran should learn that. I suggest that LFortran not make exceptions to that rule. You don’t want people to write bad code that LFortran silently fixes, only for them to encounter problems when using other compilers. For your first code LFortran could emit a warning about

y = 5.35

such as “double precision variable assigned to a real value”, and the user could turn such a warning into an error. This could be made the default, although in strict mode LFortran must compile the legal code. For the code

integer, parameter :: dp = kind(0.0d0)
real :: x
real(dp) :: y,z
x = 5.35
y = 5.35
z = 5.35_dp

gfortran -Wconversion-extra says


    5 | y = 5.35
      |    1
Warning: Conversion from 'REAL(4)' to 'REAL(8)' at (1) [-Wconversion-extra]

and gives output

5.34999990 5.3499999046325684 5.3499999999999996

I see that I suggested your idea in 2014 in comp.lang.fortran. Richard Maine pointed out some difficulties. Now I’m against it.


Good points.

Let’s assume the rule:

  • The RHS evaluation must not depend on the LHS.

Currently a floating point literal 5.35 is assumed to be single precision. I think to be more precise, I think it is assumed to be the same precision as the default real type.

So one way forward is to make floating point literals like 5.35 either the widest precision supported by a given compiler (say double or quadruple) or even “infinite” precision.

I don’t know if there can be issues with double / quadruple precision, especially with regards to the default real type.

However, the infinite precision would work as follows: 5.35 would simply mean 5.35 exactly. It is not equal to any other real type. Then when you assign it, the = operator casts it appropriately. The same with binary operators like + or *. They already insert implicit casts such as in 1 + 1.5, the integer gets cast to a float first. Would this not work?


I don’t think so. We still want it to be legal to write f(5.35), where f is a function with a single precision argument. I don’t think the meaning of 5.35 should depend on the context. The previously cited comp.lang.fortran discussion has a similar comment:

Richard Maine

May 30, 2014, 8:52:57 PM

Quadibloc <jsa…> wrote:

Could a simple and predictable rule be established that would allow a
language like FORTRAN, where different real precisions have (near-)equal
stature, to still be very simple?

I would say that when you have 1.1 instead of 1.1E0, 1.1D0, or 1.1Q0 (for
EXTENDED PRECISION), the thing to do is to say that it’s initially
converted to binary at the maximum available precision (i.e. 1.1Q0,
128-bit real or 80-bit real) and then possibly demoted the first time it
comes into contact with something of a specified real type.

And if it never “comes in contact” with anything else, then it stays at
the maximum precision? So

call sub(1.1)

doesn’t work if the dummy argument of sub is default real? And assume
that sub has an implicit interface, so in the calling scope you don’t
know anything about the dummy. Or perhaps that sub is generic with
multiple possible kinds.

I guess I find the existing rule a lot simpler than any alternative
proposal I’ve seen. The existing rule is that 1.1 is a real of default
kind - always. Sure, one can make mistakes ignoring that rule just like
one can make mistakes by ignoring other rules. But I don’t think you can
beat it for simplicity.

P.S. If anyone is in touch with Richard Maine, could they invite him to join this forum?

I find the suffix d in Fortran super annoying. Most scientific codes nowadays use double precision real numbers. That means people need to write a lot of suffix d in Fortran, which greatly hurts readability.


This is a bad idea if Fortran ever wants to get the ability to write generic code. In such cases it will be far less clear what type to make things.

@certik and anyone interested in the default kinds of constants (also literals), please see this proposal:

I personally think that is the Fortrannesque way to introduce the use cases into the language, as discussed thus far in this thread, without affecting existing semantics.

I truly wish the above proposal (or some further refinement thereof) had made it into the language in Fortran 2018 itself.


I see two options for a single sub subroutine it could:

  • cast into single if sub accepts single, or double if it accepts a double precision
  • or not work and you have to specify the kind explicitly

And if sub is a general procedure, it might:

  • refuse to work until a kind is specified
  • or select the widest precision available

For implicit interface:

  • implicit interface should not be used anyway, but it can default to single

I am not sure if I like this or not.

Another option might be to simply make the default real a double precision. In Julia:

julia> 4.5 

julia> typeof(4.5)
1 Like

I’m afraid that this “infinite precision” will require an infinite number of bits. Your number, 5.35, is not representable exactly in IEEE 32-bit format, which uses a number base of 2, not 10. This number has a bit representation of Z’40AB3333’, and there follow an infinite number of Z’33333…’ that got chopped off.

Similarly, the “simple” single-digit decimal number 0.1 does not have a finite representation in the internal binary representation. The IEEE 32-bit representation is Z’3DCCCCCD’, and the last nybble is D rather than C because of rounding. If you want infinite precision, you would need an infinite number of C-s at the right end.

Yes, it promotes the single precision 5.35 to double, but it will only be accurate to about 1e-7:

integer, parameter :: dp = kind(0.d0)
real :: x
real(dp) :: y
x = 5.35
y = 1
y = 5.35 / y
print *, x
print *, y

This prints with GFortran 9.3.0:


The way I would like it to promote it is like in y = 5.35_dp / y and then it prints:


What I have in mind is to make 5.35 be something like Decimal("5.35"), so an exact value. And then in any operation it just gets “downcasted” to whatever accuracy is needed. So real(5.35, dp) becomes 5.3499999999999996 and not 5.3499999046325684 as it currently happens.

Indeed, we do not want to have any kind of slowdowns and we definitely do not want to introduce quadruple precision (which is indeed slow) unless needed.

At the very least I want to just warn by default, i.e. the -Wconversion-extra in gfortran:

$ gfortran -Wconversion-extra  a.f90

    6 | y = 5.35 / y
      |        1
Warning: Conversion from ‘REAL(4)’ to ‘REAL(8)’ at (1) [-Wconversion-extra]

Although I would prefer it if the warning was on by default and something along the lines of:

$ gfortran a.f90

    6 | y = 5.35 / y
      |        1
Warning: Conversion from ‘REAL(4)’ to ‘REAL(8)’ at (1) will make 'y' only accurate to single precision
Suggestion: change 5.35 to 5.35_dp to achieve full double precision accuracy
Note: you can turn off this warning with [-Wno-conversion-extra]

Are there implementations of Fortran (or packages) that provide a good DoubleDouble type? DoubleDoubles (using the sum of 2 Float64) tends to be a very high performance solution when you need more accuracy than a Float64. You get 106 bits of precision and the same exponent range as Float64, but the basic math operations only take 10 or so cycles

I think the doubledouble library QD from Bailey is actually implemented in C++, with a Fortran wrapper.

Looking through his bibliography I think there was an older package which was apparently in Fortran, but I recall not being able to find the source code.

1 Like

The NAG Fortran Compiler implements REAL128 with a “double double”.

1 Like

I agree the current state of affairs would be difficult to improve upon without breaking backward compatibility, but I was thinking how long this has been a point of confusion. To help explain it in the past I used to use an example somewhat like the following, especially to highlight how E and D suffixes do not impact NAMELIST and most input, but are effective on output, and to warn about truncation of long constants without an explicit type. It went something like this, except I know “g0” was not available:

program pidigits
character(len=80) :: cpi='3.141592653589793238462643383279502884197169399375105820974944592307'
character(len=80) :: cpi1='3.141592653589793238462643383279502884197169399375105820974944592307e0'
character(len=80) :: cpi2='3.141592653589793238462643383279502884197169399375105820974944592307d0'
doubleprecision :: pi,pi1,pi2,pi3,pi4,pi5     
namelist /args/ pi
character(len=256) :: input(3)=[character(len=256) :: &
&'&args', &
& ' pi=3.141592653589793238462643383279502884197169399375105820974944592307,', &
& '/']

data pi2/3.141592653589793238462643383279502884197169399375105820974944592307e0/
data pi3/3.141592653589793238462643383279502884197169399375105820974944592307d0/





   write(*,'(g0)') 3.141592653589793238462643383279502884197169399375105820974944592307
   write(*,'(g0)') 3.141592653589793238462643383279502884197169399375105820974944592307e0
   write(*,'(g0)') 3.141592653589793238462643383279502884197169399375105820974944592307d0

end program pidigits

and how the value of a constant may change with compiler options such as

ifort pi.f90 -r8
nvfortran pi.f90 -r8
gfortran -fdefault-real-8 pi.f90 -fdefault-double-8

was always an issue too.

A common comment was that most thought that a constant should be promoted to a type large enough for all the explicitly given digits or a warning be required to be given.

I think all modern compilers have a switch that will produce a warning about constants being truncated and about precision possibly being lost when the LHS and RHS types do not match,
which people should be encouraged to use, especially as more users now use languages with features such as arbitrary precision that therefore might be even more surprised than in the past about values they explicitly entered being significantly truncated.

Even if automatic promotion to a kind sufficient to hold the explicit values was the rule, people would probably still be surprised that “A=1.0” might not produce the same as “A=1.00000000000”, so I am not certain that change would be better, and there is still the issue that compilers can have different default types; but I do think the most inadvertent error is caused by “double_value = single_precision_constant”, which that would not address anyway.

Maybe a new syntax like “value <== constant” where the constant would be treated the same as if it were read with list-directed input would address constants, but that would not address where the constants are used in functions.

And it looks like the rule about LHS being ignored when determining the value on the RHS
is too sacrosanct to be broken, which seems like the only logical path to improving the current state. SIgh. I really have had a number of complaints about this though. It is very non-intuitive .

So overall, it looks like the status-quo is a reasonable compromise, but quite a mess.

1 Like

The basic issue here is that the type and kind of an expression is determined by the expression (3.51 is an expression with only one primary), and not by how the expression is used later. So changing to “auto-promote” constants would result in a pretty basic (and backwards incompatible) change in the standard.

I agree that a lot of scientific code is written using 64-bit precision floats. This is why Cray, before migrating to x86_64 processors, made 64-bit floats the default REAL kind. Thus 3.51 defaulted to be 64 bits as an expression and “problem solved”. Other compilers, wanting to be compatible, added options named, often, -r8 to convert to the Cray default mode.

1 Like

Is 64-bit the default in Cray even today? I didn’t know that. I think that would fix most problems.

If you want to select a single precision in Cray, I assume you can’t do just integer, parameter :: sp = kind(0.0) because that will be double precision. So have to use either selected_real_kind or iso_fortran_env?

The classic vector machines were 64-bit default. The default on recent Cray systems is 32-bit for both REAL and INTEGER. You can use the compiler option -sdefault64 to change both defaults to 64-bits. The default is driven by the hardware instruction set, which today is usually either x86_64 (Intel, AMD) or ARM. Both use 32-bit as default. The most portable scheme is to use selected_real_kind.

1 Like

Thanks Bill. Last question – I thought that x86_64 as well as 64-bit ARM would be considered to use 64-bit by default, wouldn’t they?

The interpretation of 5.35 in Fortran code is now as a single precision constant 5.35E0 or 5.35_real32.
However, the interpretation of “5.35” read from a text file or character string into a higher precision real variable will be interpreted as 5.35_real128 if read into a quad precision real variable.

Prior to Fortran 90 most 32 bit compilers interpreted y=5.35 as 5.35_real64 or 5.35_real80, but Fortran 90 defined that 5.35 used in Fortran code is a default real constant.

For earlier compilers, the outcome was not defined, however most 32-bit compilers tried to support the precision required when converting 60-bit Fortran code to 32-bit Fortran, where 64-bit reals were the typical real precision in use.
Most programmers at the time learnt from experience, where converting to a F90+ compiler that resulted in a reduced precision outcome from conversion testing. This was further exacerbated on intel like processors when converting to 64-bit Fortran or SIMD, where 80-bit 80387 registers were not used, again removing default higher precision values.

The interpretation of 5.35 as a default real now has a strong historical understanding and so returning to higher precision interpretation, as with a read, is very unlikely.

The other confusion aspect that can hide this from a new user to Fortran is that many common real constants are the same at any precision, eg 1.0 2.0 10.0 or 0.5, while 0.1 or 1./3. are not and can be the source of unexpected rounding error when selecting real64 or real128 precision.

This problem is not limited to reals, as in the 64-bit environment where 64-bit integers are also required, using integer constants can overflow when 64-bit addresses or array sizes are used. The default is not always what you want, eg size (array).

This is a real differentiator between the Fortran standard and Fortran users.


I can not identify where this could occur in Fortran, as any real precision value of 1.0 should be implicitly converted to the same value in any precision. “A = 0.1” would be a very different case.