What would it take to get the legacy output format `sn.nnnnsnnn` (`s` is a sign, `n` is a digit) fast-tracked for removal from the language?
Overflowing an `E` output edit descriptor can produce malformed, and in some cases dangerously misinterpreted, output, specifically when the exponent has more digits than the implied default of 2. The `E` separator between significand and exponent is dropped, leaving only the sign of the exponent between the two. This behavior is allowed by the standard and documented, but it violates user expectations of what constitutes scientific notation. There is likely a clear historical explanation for this behavior, similar to that of the legacy ASA carriage control codes.
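To make the failure mode concrete, here is a rough Python mimic of the degradation (an illustration only, not the Fortran runtime; the function name is mine, and it uses an ES-style significand for readability):

```python
def e_descriptor_like(value: float, d: int = 4, e: int = 2) -> str:
    """Illustrative mimic of a Fortran E/ES edit descriptor with an e-digit
    exponent field: when the exponent needs more than e digits, the 'E'
    marker is dropped and only the exponent sign remains as a separator."""
    s = f"{value:.{d}E}"                 # e.g. '1.0000E+100'
    mantissa, exp = s.split("E")
    sign, digits = exp[0], exp[1:].lstrip("0") or "0"
    if len(digits) > e:
        # Exponent overflows the field: the 'E' is silently dropped.
        return f"{mantissa}{sign}{digits}"
    return f"{mantissa}E{sign}{digits.rjust(e, '0')}"

print(e_descriptor_like(10.0))     # well-formed
print(e_descriptor_like(1e100))    # 'E' dropped
```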
The difficulty goes beyond the human-factors problem of using an archaic, malformed numeric display. Few if any other languages interpret this format correctly, which breaks data portability via text file, the principal means of data interchange.
The following demonstrator compiles cleanly on Intel Fortran (the specific compiler shouldn't matter) and generates CSV output that reliably breaks Python's `float()` conversion and triggers a particularly insidious, quiet, and dangerous failure mode when read as CSV by Excel.
```fortran
program trivial_loss
    use, intrinsic :: iso_fortran_env, only: REAL64, OUTPUT_UNIT
    implicit none
    real(kind=REAL64), dimension(6), parameter :: baddata = &
        [  1.0E+100_REAL64,  10.0_REAL64,  1.0E-100_REAL64, &
          -1.0E-100_REAL64, -10.0_REAL64, -1.0E+100_REAL64 ]
    integer :: i
    integer :: j

1   format('"Full","Naive","Nominal","Better","Malformed"')
2   format(G0, ',', E11.4, ',', 1PE11.4, ',', ES13.4E4, ',', ES0.4E0)

    write(unit=OUTPUT_UNIT, fmt=1)
    do i = 1, size(baddata, 1)
        write(unit=OUTPUT_UNIT, fmt=2) (baddata(i), j=1,5)
    end do
    stop
end program trivial_loss
```
Example output:
"Full","Naive","Nominal","Better","Malformed"
.1000000000000000E+101, 0.1000+101, 1.0000+100, 1.0000E+0100,1.0000+100
10.00000000000000, 0.1000E+02, 1.0000E+01, 1.0000E+0001,1.0000+1
.1000000000000000E-99, 0.1000E-99, 1.0000-100, 1.0000E-0100,1.0000-100
-.1000000000000000E-99,-0.1000E-99,-1.0000-100,-1.0000E-0100,-1.0000-100
-10.00000000000000,-0.1000E+02,-1.0000E+01,-1.0000E+0001,-1.0000+1
-.1000000000000000E+101,-0.1000+101,-1.0000+100,-1.0000E+0100,-1.0000+100
Python does the right thing by raising an exception when asked to convert malformed numeric data. Excel is half-working: it treats positive malformed values as strings but evaluates negative malformed values as formulas when they are imported as values.
Importing CSV-formatted numerical data is a common use case, and while it's not Fortran's job to work around every possible misinterpretation of its output, it likewise should not generate malformed output. The user expectation is that values will either be formatted in scientific notation or be clearly marked as unrepresentable in the specified format (all `*`s).
I've seen this behavior before, but it most recently appeared when I was trying to parse radiation dose results generated by actively developed code. Investigating the issue, I found that using a field width of `0` actually made matters worse, despite guidance that a zero-width field avoids clearly invalid output.
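When the producing code can't be changed, the practical workaround on the consuming side is to re-insert the missing `E` before conversion. A minimal sketch in Python (the helper name and regex are mine, and it assumes each token is a bare number rather than part of a larger expression):

```python
import re

# Matches a trailing signed exponent like '+100' or '-100' that directly
# follows a digit or decimal point, i.e. the dropped-'E' pattern.
_FORTRAN_EXP = re.compile(r"(?<=[0-9.])([+-]\d+)$")

def repair_fortran_float(token: str) -> float:
    """Re-insert the 'E' that an overflowed Fortran edit descriptor dropped,
    then convert with float(). Leaves well-formed tokens untouched."""
    token = token.strip()
    if "e" not in token.lower():
        token = _FORTRAN_EXP.sub(r"E\1", token)
    return float(token)

print(repair_fortran_float("1.0000+100"))   # parses as 1e100
print(repair_fortran_float("-1.0000-100"))  # leading sign is preserved
```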
This is a confluence of small seemingly-inconsequential problems across several systems:
- the default behavior of the `E` edit descriptor does not account for the wider exponent range of 64-bit and larger floating-point types
- the historical `sn.nnnnsnnn` format is incompatible with most systems where Fortran codes are currently used
- an `E` format combined with zero field width can reliably generate malformed output for all values, not just those with exponents of more than two digits
- developers are given no warning that this behavior is possible; contrast with the `w < d + 7` warning that most compilers produce when a risky format such as `E10.5` is detected
- scientific software developers are often unaware of Fortran's standard behavior or potential mitigations
- evaluating whether very large or very small magnitude values are reasonable is a verification task which is often omitted even in mission- or safety-significant applications; regardless, the language and common tooling do not make this verification task any easier
- Excel exhibits unexpected and dangerous behavior when converting text values to numbers. This is a well-known flaw in the application which the vendor is aware of and has no intention of addressing
- culturally, workers in safety- and mission-critical fields routinely use Excel where it is not appropriate, and no amount of training, education, policy, or disciplinary measures will change that
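The verification point above, checking whether very large or very small magnitudes are plausible, can at least be sketched as a simple guard (a hypothetical helper; the bounds are placeholders that would be application-specific):

```python
def within_expected_range(x: float, lo: float = 1e-30, hi: float = 1e30) -> bool:
    """Flag values whose magnitude falls outside the physically plausible
    range for the application. The default bounds are illustrative only."""
    return x == 0.0 or lo <= abs(x) <= hi
```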
I don’t see user culture or Excel as being fixable. Both tools and developer practice can change with enough time and incentive but I also don’t see that happening in a reasonable or predictable timeframe. Knowing that Fortran has been and is currently used in roles where it can affect safety, this issue seems more effectively addressed at a language standard level.