Test for valid format strings

Beliavsky · April 14, 2025, 3:00pm

Sometimes my programs crash because of a malformed format string. For the code

print "(a i0)", "hi", 3
end

where the format should be "(a,i0)" with a comma, ifx gives a compile-time error, but gfortran compiles it and gives a run-time error. Sometimes format strings are built at run time. It would be nice if a Fortran subroutine could be created to check whether a format string is valid and perhaps fix it. If it is invalid, one can print or write using list-directed output with * or skip the print or write.

If a program runs quickly it is not a problem to fix the format string after a crash (or to fix the code that generated the invalid format string), but you do not want a long program run to crash for this reason.

urbanjost · April 14, 2025, 4:59pm

You describe a partial solution. If you check IOSTAT and it is nonzero, you can switch to a format with all G0 fields, skip the I/O, and/or just print a warning. That at least stops the crash for intrinsic types and for types that list-directed I/O can handle.

I find that defaulting to NAMELIST group output is pretty robust and can print user types in most cases. I generally only do that for interactive programs where the user can take appropriate action, but just checking the IOSTAT value lets you easily skip the print or write, which is one case you describe. Adding error messages and reprinting values with NAMELIST or g0 can be useful. In production batch codes, though, I often do want to stop if bad I/O is generated, perhaps after calling a teardown routine to shut down gracefully, which checking the IOSTAT flag also lets you do.
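A minimal sketch of the IOSTAT-then-G0 fallback described above. The module and routine names are hypothetical, and the routine is fixed to one character item and one integer item for brevity. It writes to an internal file so the caller decides what to do with the result.

```fortran
module safe_write_demo
  implicit none
contains
  ! Hypothetical helper: format (label, n) with the caller's format;
  ! if the write fails for any reason, redo it with generic g0 fields.
  subroutine format_or_fallback(fmt, label, n, line, used_fallback)
    character(len=*), intent(in)  :: fmt, label
    integer,          intent(in)  :: n
    character(len=*), intent(out) :: line
    logical,          intent(out) :: used_fallback
    integer :: ios

    write (line, fmt, iostat=ios) label, n
    used_fallback = (ios /= 0)
    if (used_fallback) then
      ! The contents of line are unreliable after a failed write: reset it.
      line = ''
      write (line, '(*(g0,1x))', iostat=ios) label, n
    end if
  end subroutine format_or_fallback
end module safe_write_demo
```

Calling `format_or_fallback('(a,?)', 'hi', 3, line, fb)` should set `fb` to true and leave `hi 3` in `line`, since `?` is not an edit descriptor. Whether a borderline format such as `(a i0)` takes the fallback path depends on the compiler and its options, as discussed in this thread.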

RonShepard · April 14, 2025, 5:00pm

I remember back in the f77 days that compilers would typically check format statements at compile time, but not check character strings used as formats until run time, even if they were constants or parameters that were obviously used unchanged. This had the arguably undesired side effect of encouraging the continued use of format statements. Over time, that changed, and now compilers will more likely test the obvious cases at compile time. I’m not sure this can be imposed by the standard because there are still many cases that cannot be tested at compile time, but only at run time.

As for code quality, the programmer must consequently ensure that each format statement is used at least once during the testing and verification stages. This is the same principle as for other conditionally executed blocks of code. I’ve seen blocks of code still with errors years after the code was written simply because that block was rarely, or never, executed with the typical input data. Programmers can sometimes get compilers to profile code at runtime, showing the number of times each line has been executed. The danger sign is when you have statements that are never executed, either with test input data designed to test all possible code routes, or with actual data during production.
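One way to make sure every format string is exercised at least once is a start-up self-check that does a trial write with each of them. The sketch below uses hypothetical names and assumes every format in the table is meant for one character item followed by one integer item; a real code would need one such check per item list.

```fortran
module format_selftest
  implicit none
contains
  ! Hypothetical helper: trial-write ('x', 1) with each format into a
  ! scratch string and count the formats the run-time library rejects.
  subroutine check_formats(formats, nbad)
    character(len=*), intent(in)  :: formats(:)
    integer,          intent(out) :: nbad
    character(len=256) :: scratch
    integer :: i, ios

    nbad = 0
    do i = 1, size(formats)
      write (scratch, trim(formats(i)), iostat=ios) 'x', 1
      if (ios /= 0) then
        nbad = nbad + 1
        print '(a,i0,2a)', 'format #', i, ' rejected: ', trim(formats(i))
      end if
    end do
  end subroutine check_formats
end module format_selftest
```

Running this once at the start of a long job reports bad formats in the first second rather than after hours of computation. It only catches what the run-time library treats as an error, so it says nothing about fields that are merely too narrow.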

urbanjost · April 14, 2025, 5:07pm

Yes. Hit that on the head. Coverage tools like gcov can help with that. You describe one common condition I see over and over again, where error processing code has a bug in it, so the very code meant to handle or describe an error instead causes a crash or traceback at best. I see that in well-known commercial codes relatively frequently. Do something like try to run some codes in a directory with no read or write access and/or with $TMPDIR set to such a directory and you will probably find one quickly: error processing that causes a bigger error than what it is meant to diagnose.

Beliavsky · April 14, 2025, 5:33pm

I see that

implicit none
integer :: ierr
write (*,"(a i0)", iostat=ierr) "hi", 3
if (ierr /= 0) print*,"could not write"
end

does not crash with gfortran. So one approach is to always use iostat when a format string is generated dynamically, because it could be wrong.

urbanjost · April 14, 2025, 6:12pm

Even a static format can be wrong if an allocatable item changes size or is unallocated. A pet peeve is that getting asterisks from overflow is not an error as far as IOSTAT is concerned, because that is what is supposed to happen. Formats like i0 and g0 eliminate that issue to some extent, but if you want a particular field width, the way I used to detect overflow was to write to an internal file, check it for asterisks, and then print it, which is an expensive amount of resources to use. Maybe something has come along to make that easier to detect. This program

write(*,'(i2)',iostat=iostat) huge(0)
write(*,*)iostat
end

gets an IOSTAT of 0 per the Fortran standard if I remember correctly. Of course it does not create a crash on output either; but reading that back in does cause errors.
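The internal-file check described above can be sketched as follows. The names are hypothetical and the routine handles a single integer only. It assumes the format contains no literal asterisks of its own, since any `*` in the result is treated as overflow.

```fortran
module overflow_check
  implicit none
contains
  ! Hypothetical helper: format n into field and flag asterisk overflow,
  ! which IOSTAT alone does not report.
  subroutine format_int_checked(fmt, n, field, overflowed, ios)
    character(len=*), intent(in)  :: fmt
    integer,          intent(in)  :: n
    character(len=*), intent(out) :: field
    logical,          intent(out) :: overflowed
    integer,          intent(out) :: ios

    write (field, fmt, iostat=ios) n
    overflowed = .false.
    ! Only inspect the field when the write itself succeeded.
    if (ios == 0) overflowed = index(field, '*') > 0
  end subroutine format_int_checked
end module overflow_check
```

With `'(i2)'` and `huge(0)` this returns `ios == 0` and `overflowed` true; with `'(i2)'` and `42` it returns `overflowed` false and `42` in `field`.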

urbanjost · April 14, 2025, 6:20pm

On production code I think ALWAYS checking iostat and iomsg is a good idea, as not just a bad format can be caught, but also full file systems, insufficient file permissions, read-only file systems where you try to write, … Unfortunately the messages and the codes returned differ between compilers, or that would be even more useful. I used to have a code that intentionally did some of those things and captured the output, and I wrote a routine that standardized the return code for the most common errors, but I have not used it in a long time. But just knowing an error occurred can be very useful.

RonShepard · April 14, 2025, 7:26pm

It used to be that using iostat was sometimes a good thing to do and sometimes not so good. The i/o library error message was sometimes more useful than an internally generated error message based on the program state. But now that iomsg has been added, the programmer has access to the system message and he can also add his own information based on the program state at that time. The combination of iostat and iomsg is a really useful feature.
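A small sketch of combining iostat and iomsg with program state, here the file name being opened. The module and routine names are hypothetical. The report is assembled by concatenation and assignment, which truncates silently if it is too long, so the error path cannot itself raise an I/O error.

```fortran
module io_report
  implicit none
contains
  ! Hypothetical helper: open an existing file for reading and, on failure,
  ! build a report from the path, the IOSTAT code and the IOMSG text.
  subroutine open_for_read(path, unit, ok, report)
    character(len=*), intent(in)  :: path
    integer,          intent(out) :: unit
    logical,          intent(out) :: ok
    character(len=*), intent(out) :: report
    character(len=200) :: msg
    character(len=12)  :: code
    integer :: ios

    msg = ''
    open (newunit=unit, file=path, status='old', action='read', &
          iostat=ios, iomsg=msg)
    ok = (ios == 0)
    report = ''
    if (.not. ok) then
      write (code, '(i0)') ios   ! a default integer fits in 12 characters
      report = 'cannot open "' // trim(path) // '": iostat=' // &
               trim(code) // ', ' // trim(msg)
    end if
  end subroutine open_for_read
end module io_report
```

The numeric code and the message text are compiler-specific, so the report is meant for a human reader rather than for branching on particular values.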

wspector · April 14, 2025, 9:29pm

I think the reason gfortran is not catching this at compile time is that it supports an extension allowing eliding commas between edit descriptors when possible. If you specify the -std=f95 (or later) option, it will diagnose the problem.

$ gfortran -c -std=f95 -Wall badfmt.f90
badfmt.f90:3:12:

    3 | write (*,"(a i0)", iostat=ierr) "hi", 3
      |            1
Error: GNU Extension: Missing comma at (1)
$

urbanjost · April 14, 2025, 9:58pm

Enforcing a modern standard is important because, for historical reasons, you might get an error depending on whether fixed or free format is used. Some Fortran versions will see that as “A10” and print both values as strings, allowing ADE values to be printed as characters, and you will get no errors with or without IOSTAT; others will produce an error. I have seen spaces ignored in format strings even in free-format code. So even if you do not compile with a standard being enforced, running a compile dry run (where supported) can give you things to think about. If you compile code files that mix free and fixed format and place formats in CHARACTER variables, the default behavior can be very generous in how it treats white space and “missing” comma separators, which makes some sense but can allow for some surprises when the strings are in a global area.

Here is one simple but perhaps surprising aspect of the A field. We are probably used to being able to print anything with typeless format descriptors like B,O,Z but A prints anything as well. I think that is standard-conforming but could not find it defined off the cuff, but I think the A descriptor printing the following text is not an error. Somewhere in the past I remember that being valuable for writing a file where I wanted to mix ASCII and binary data but the exact use case escapes me at the moment.

program main
  ! Demonstrates the A, B, O and Z edit descriptors applied to
  ! integer and real output items.
  implicit none
  integer :: i
  ! A descriptor with integer and real items
  write(*,'(*(a))')32,(i,i=30,127)
  write(*,'(*(a))')123.456
  ! binary
  write(*,'(*(b0))')32,(i,i=30,127)
  write(*,'(*(b0))')123.456
  ! octal
  write(*,'(*(o0))')32,(i,i=30,127)
  write(*,'(*(o0))')123.456
  ! hexadecimal
  write(*,'(*(z0))')32,(i,i=30,127)
  write(*,'(*(z0))')123.456
end program main

RonShepard · April 15, 2025, 6:58am

Are there any differences in format strings in fixed-format and free-format code? That never occurred to me.

Regarding the default A10 format, I’ve never seen that. Is that a legacy feature of CDC codes, which packed 10 six-bit characters into a 60-bit word?

The reason that many compilers allow reading/writing integers (and other data types) with An format descriptors is a legacy feature. For two decades before f77 there was no character type in the fortran language. Instead, characters were packed into integer variables and those integers were read and written with An field descriptors. Then with f77’s character type, just an A descriptor was allowed because the character type itself contained the length information. This was also related to the Hollerith data issue, a feature which allowed integers to be initialized with data statements with characters in pre-f77 codes, but was just a legacy extension afterwards.

urbanjost · April 15, 2025, 1:32pm

Spaces are not significant in fixed-format code, and historically commas were not required, so is the format ‘A 10’ just ‘A10’, a legitimate descriptor, or a typo? With -std=f95 and free format the comma is missing, so it says it is an error. But in fixed format the space would be ignored and it would see the descriptor A10, I believe.

Printing integers with the A descriptor is handy for ANSI escape sequences, so something like this works well

write(*,'(*(a))',advance='no') 27,91,51,59,74,27,91,72,27,91,50,74
end

to clear the screen and home the cursor in an ANSI terminal but would probably be found to be cryptic by newer Fortran users in particular.

The history does explain why that is apparently standard-conforming but it does not hold up with strict typing, which seems to be strongly encouraged in the Fortran community.
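For comparison, a strictly typed sketch that builds the same twelve character codes as a character string with achar, so no integer is passed to an A descriptor. The module and function names are hypothetical.

```fortran
module ansi_demo
  implicit none
  character(len=1), parameter :: esc = achar(27)   ! ASCII escape
contains
  ! Hypothetical helper: the codes 27,91,51,59,74,27,91,72,27,91,50,74
  ! from the example above, written as a typed character string.
  function clear_and_home() result(seq)
    character(len=:), allocatable :: seq
    seq = esc // '[3;J' // esc // '[H' // esc // '[2J'
  end function clear_and_home
end module ansi_demo
```

It would be used as `write (*, '(a)', advance='no') clear_and_home()`. What the terminal does with the sequence still depends on the terminal.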

wspector · April 15, 2025, 3:23pm

The test case here is not A10 (“A-ten”). It is AI0 (A-eye-zero). Commas are generally required between edit descriptors, and always have been. But historically some compilers have often allowed eliding them. (I remember playing with this back in F66 days to see how many I could get rid of before the compilers would complain.) The blank space between the “a” and the “i0” is irrelevant.

RonShepard · April 15, 2025, 6:13pm

Ah, that was the confusion. With my font, those are distinct shapes, so no confusion, but I can see how it might be different with other display fonts.

As far as I know, there is no difference in how format strings are parsed between fixed-format fortran code and free-form fortran code.

I always tried to avoid the extensions provided by some compilers for format strings because I used a lot of different compilers and I wanted my code to be portable. But of course, I would often get code from others that did use them. I remember $ in vax fortran. Another one that I remember was X was treated as 1X, a single space in some compilers. However, there are also places where the commas are optional within the standard, such as /,/ being the same as //, or /10X being the same as /,10X, so it was sometimes difficult to keep it all straight.
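The optional commas around a slash can be checked with an internal file that has one array element per record. This is a sketch with hypothetical names; the two output strings are fixed for brevity.

```fortran
module slash_demo
  implicit none
contains
  ! Hypothetical helper: write 'one' and 'two' with fmt into a
  ! two-record internal file (one array element per record).
  subroutine two_records(fmt, rec)
    character(len=*), intent(in)  :: fmt
    character(len=*), intent(out) :: rec(2)
    write (rec, fmt) 'one', 'two'
  end subroutine two_records
end module slash_demo
```

Calling it with `'(a/a)'` and with `'(a,/,a)'` should fill both arrays identically, with `one` in the first element and `two` in the second. Likewise `'(a/2x,a)'` and `'(a,/,2x,a)'` should agree.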

wspector · April 15, 2025, 11:45pm

I don’t know of any differences either.

Indeed the slash for new-lines can also be used to separate edit descriptors. No commas needed. Parentheses are another example. The DEC ‘$’ extension was used at the end of a line to avoid a newline. These days one should use advance='no'.
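A minimal sketch of the standard replacement for the ‘$’ extension, with hypothetical names. The unit is passed in so that the same routine works for the terminal or for a file.

```fortran
module prompt_demo
  implicit none
contains
  ! Hypothetical helper: write text without ending the record, so the
  ! next output (or the user's typed input) continues on the same line.
  subroutine prompt(unit, text)
    integer,          intent(in) :: unit
    character(len=*), intent(in) :: text
    write (unit, '(a)', advance='no') text
  end subroutine prompt
end module prompt_demo
```

If `prompt(u, 'Enter n: ')` is followed by an ordinary `write (u, '(a)') '42'` on the same unit, the record should contain `Enter n: 42`.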

wspector · April 17, 2025, 3:27pm

Turns out the gfortran folks have had a PR on missing commas in formats since 2018. They recently modified the compiler to require -std=legacy in order to compile the non-conforming code. See:

(My copy of gfortran 15 is from last October, so doesn’t reflect this change. I need to rebuild it…)

(Update: Just rebuilt the current trunk. It now shows Version 16.0.0 20250417 (experimental). The fix is only in the run-time library, not in the compiler proper. So it doesn’t catch the problem at compile time by default - just at run-time…)

Update #2: Some progress over the past few days in gfortran. The compiler will now report most cases of missing commas by default, and require -std=legacy to ignore them:

wspector · April 17, 2025, 3:49pm

LFortran has had the same issue reported, but it is not fixed yet:

github.com/lfortran/lfortran

Nonstandard format allowed

opened 12:43AM - 17 Feb 25 UTC

harperjf

error not reported

F2023 constraint C1302 (R1303) says

C1302 (R1303) The optional comma [in a form…at] shall not be omitted except
• between a P edit descriptor and an immediately following F, E, EN, ES, EX, D, or G edit descriptor, possibly preceded by a repeat specification,
• before a slash edit descriptor when the optional repeat specification does not appear (13.8.2),
• after a slash edit descriptor, or
• before or after a colon edit descriptor (13.8.3)

This program violates that constraint by allowing the format ```(AF9.6)``` but Lfortran 0.45.0 --std=f23 compiles and runs it. So do gfortran, g95 and AMD flang. But ifx and ifort are standard-conforming by refusing to compile it.

```
! Fortran requires comma between A and F in a format
program badfmt
  implicit none
  character(40):: fmt = "(AF9.6)"
  print *,fmt
  print fmt, 'pi =',4*atan(1.0)
end program badfmt
```
