Why are parens required around format strings?

While reading a thread about the G format descriptor, I found this comment:

Can someone explain why this would be hard to change? Usually the concern is backward compatibility. But if the surrounding parentheses were made optional, wouldn’t all existing valid format statements remain valid?

Incidentally, what tag should I apply to a post like this, a general question about the language? It’s not a request for Help, not a Language Enhancement, etc.

It seems like a language enhancement to me. That’s a good idea; the extra parentheses indeed seem superfluous to me as well. I think about it every time I type them.

AFAIK the outermost parentheses represent the looping behaviour. When you run out of formatters, e.g., print "(i0)", 1, 2, it will print 1 on one line and then, because there are no more formatters, the last pair of parentheses is repeated as long as needed, adding a newline and 2.
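
A minimal sketch of that looping behaviour, for anyone who wants to try it. The program name is made up for illustration; the second print statement is an extra case not taken from the post above.

```fortran
program reversion_demo
   implicit none

   ! One i0 descriptor but two list items: once 1 has been written the
   ! format is exhausted, so a new record is started and the format is
   ! reused for 2. Each value ends up on its own line.
   print "(i0)", 1, 2

   ! Two descriptors and three items: 1 and 2 share the first record,
   ! then the format is reused from the start for 3.
   print "(i0, 1x, i0)", 1, 2, 3
end program reversion_demo
```

The first statement should print 1 and 2 on separate lines; the second should put 1 and 2 on one line and 3 on the next.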

If the outer parentheses were optional and left out, would this mean that, if there are no other parentheses, this loop doesn’t happen?

Besides, you can concatenate format strings, since additional parentheses inside don’t interfere, e.g.,

character(*), parameter :: fmt_i = "(i0)"
character(*), parameter :: fmt_g = "(g0)"
print fmt_i, 42
print fmt_g, 3.14
print "(" // fmt_i // ", 1x, " // fmt_g // ")", 42, 3.14

Admittedly, you have to manually add the outer parentheses, but IMHO this isn’t that much of a problem. Most of the time you have to concatenate something in between anyway.

Adding the outer parentheses to allow concatenation did occur to me, but concatenation wasn’t really my interest in this question. I was more wondering why any parentheses are necessary in the first place. What would fundamentally prevent a compiler from performing the looping behavior if there were no parentheses, say, by adding implied parentheses to the whole string? To me the outer parens just seem like visual clutter, and one more opportunity for errors. And since it’s a format string, it’s often annoyingly a runtime error instead of compile time.

IMHO Fortran already has too much “implied/implicit behaviour”. I would therefore be in favour of looping format strings only if the parentheses are present.
A decision must be made whether to truncate the output or raise an error (compiler or runtime) if there are more output values than formatters.

@certik how costly would it be to prototype this in LFortran?

Didn’t the IBM G and H Fortran 66 compilers have format strings programmable at runtime? The string was in an integer array, as I recall. Then the final paren was the only indication the runtime code had for the end of the format string.

Yes. In Fortran 66, the format in a read/write statement could be either a statement label (which with some compilers could also be an ASSIGNed statement label) or an array. With an array you would use Hollerith constants to do something like:

  INTEGER IFMT(5)
  DATA IFMT /4H(12H,4HHELL, 4HO WO, 4HRLD!, 1H)/
  ...
  WRITE (6, IFMT)

So yes, the parentheses at the beginning and end helped the run-time library identify the bounds in memory of the format spec.

This will only work if there is no looping over the format statement. If there is looping, then it will be the last of those inner format strings that will get looped over rather than the whole format string being looped over. To give a simple example, the write statement

write(*,'(a,(2i2))') 'a=', a(1:5)

would result in

a= 1 2
 3 4
 5

because it is the inner (2i2) that is looped over rather than the outer (). So sometimes the parentheses do matter.

One could also read that array from a file, and if so, then the outer parentheses were required then too. Another way to specify the array on many compilers was with ENCODE/DECODE statements, which were replaced in f77 with standard internal read/write statements.

Although there were a number of things Fortran 77 should have had but didn’t, the character data type alone, and the associated deletion of all the Hollerith madness, made Fortran 77 a massive improvement over Fortran 66.

I guess it was the same with the Unix guys and C, after they found B (which, like BCPL, only supported integers and addresses) inadequate.

You’re right, those parentheses do matter!
But it doesn’t make a big difference to wrap the format string in additional outer parentheses:

print "((" // fmt_str // "))", arr
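
Here is a small sketch of the difference the extra pair makes, assuming the formats being joined already carry their own parentheses. The names joined and wrap_demo are made up for illustration, and integers are used throughout so the output does not depend on how a compiler renders reals with g0.

```fortran
program wrap_demo
   implicit none
   character(*), parameter :: fmt_i = "(i0)"
   character(*), parameter :: fmt_g = "(g0)"
   ! joined is "((i0), 1x, (g0))"
   character(*), parameter :: joined = "(" // fmt_i // ", 1x, " // fmt_g // ")"

   ! Reversion goes back to the last inner group, (g0), so after "1 2"
   ! the values 3 and 4 should each land on a line of their own.
   print joined, 1, 2, 3, 4

   ! One more pair of parentheses makes the whole format the last group,
   ! so reversion restarts at (i0) and 3 and 4 should share a line.
   print "(" // joined // ")", 1, 2, 3, 4
end program wrap_demo
```

I would expect three lines from the first statement (1 2, then 3, then 4) and two lines from the second (1 2, then 3 4).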

The parentheses are needed, of course, in the FORMAT statement. As has been discussed, they were there to prevent running off the end of a format contained in an array. While I could imagine adding the option to not provide the parens in an I/O control list, that would need some tweaking of the rules about format reversion (the “looping” mentioned here - see Doctor Fortran in “Revert! Revert! The End (of the format) is Nigh!” - Doctor Fortran (stevelionel.com)).

My view is that this change would be more trouble than it is worth. It does not add any functionality and is therefore “syntactic sugar”, something we generally try to avoid, unless the benefit seems clearer than it does here.

I had forgotten about this, and it does seem to be a complication to simply removing the enclosing parens as a requirement.

Thanks to everyone for the replies.

Whenever any sane proposal to make the language more logical and consistent appears, and does not break backwards compatibility, it is called “syntactic sugar” for lack of a better argument. Fortran is syntactic sugar over assembly if you think about it.

If parens are always there they mean nothing, very simple.

Indeed. Kind of like “implicit none”, it’s always there (in modern codes and any code I ever wrote), so it becomes something you get used to just typing all the time. Until I realized that it means nothing, because it is always there (exactly as you said), so I stopped writing it and made it the default. The “()” format is just like that too, and many other syntactic sugar quality of life improvements.

We can experiment with all these things later in a compiler. There is always a cost to adding a new feature, and the benefit must be weighed against it: Cost of adding (any) new feature to the Fortran language.

Another one to consider is print *, "Hello", where I always have to type *,, and possibly using the Python 3 style print("Hello") would be awesome, if it is possible. Update: I created an issue for this at Implement simpler print syntax: `print("Hello World!")` · Issue #137 · j3-fortran/generics · GitHub.

It does not change the issue with parens discussed here, but the article gives short shrift to non-advancing I/O. It also does not mention that formatted line length limits can cause unexpected line breaks, although that is a less common problem nowadays, when line limits are often huge.

Using a loop and non-advancing I/O is perhaps verbose, but it is a very flexible alternative.

program testit
implicit none
real :: arr(30)
integer :: i
integer :: items
arr=[(sqrt(real(i)),i=1,size(arr))]

write(*,*)"using advance='no' and a loop"
do i=1,size(arr)
   write(*,'(2x,f8.2)',advance='no')arr(i)
enddo
write(*,*)

! Agree star is better now; but big values used to be used. It was not
! clear what the max could be but generally could still get
! multiple lines when current line length was hit. Still true,
! as non-advancing I/O is not true stream I/O.
write(*,*)"using a big count"
write(*,'(2147483647(2x,f8.2))')arr

write(*,*)"sort of like writing format on the fly or VFE"
items=10   ! just for fun, allow changing number of items on line
do i=1,size(arr)
   write(*,'(2x,f8.2)',advance=merge('yes','no ',modulo(i,items).eq.0))arr(i)
enddo

!write(*,*)"implied loop not the same"
! subtle, but note this is not the equivalent of the above loop
!write(*,'(2x,f8.2)',advance='no')(arr(i),i=1,size(arr))

! sometimes list-directed is close enough
write(*,*)arr

end program testit

Since Fortran lacks a standard way of declaring stdout to be stream I/O, stream access only applies to named output files. It is of course a more robust way to write an unlimited number of values without line breaks. It is a bit overkill for most cases, but it does avoid the issues with formatted line length limits.
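
For completeness, a rough sketch of the stream variant. The file name stream_demo.txt and the program name are made up for illustration, and it rests on my understanding that no RECL applies to a unit connected for stream access, so the single write should not be split into several lines.

```fortran
program stream_demo
   implicit none
   real :: arr(30)
   integer :: i, u

   arr = [(sqrt(real(i)), i = 1, size(arr))]

   ! Formatted stream access to a named file (hypothetical file name).
   open(newunit=u, file='stream_demo.txt', access='stream', &
        form='formatted', status='replace', action='write')

   ! Unlimited repeat count: all values go out in one write statement.
   write(u, '(*(2x,f8.2))') arr

   close(u)
end program stream_demo
```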

More along the lines of the original question: I use character variables to hold formats very frequently, and it bothered me enough that I made a function to add the parentheses. It basically just returned '('//string//')'. It seemed like a nice idea at first, but it was not very satisfying and was really about as much work to use as adding the parentheses in the first place, so I have pretty much quit using it.
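
Something along these lines, as a minimal sketch. The names fmt_helpers and paren are made up for illustration and are not the original function.

```fortran
module fmt_helpers
   implicit none
contains
   ! Wrap a bare list of edit descriptors in the required parentheses.
   pure function paren(string) result(fmt)
      character(len=*), intent(in)  :: string
      character(len=:), allocatable :: fmt
      fmt = '(' // string // ')'
   end function paren
end module fmt_helpers

program paren_demo
   use fmt_helpers, only: paren
   implicit none
   ! The format actually used is '(a, 1x, i0)'.
   print paren('a, 1x, i0'), 'answer:', 42
end program paren_demo
```

Typing paren('...') is no shorter than typing '(...)', which is pretty much the problem described above.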

Again, the language didn’t originally have a character data type. So there otherwise wasn’t a way for the formatted I/O run-time library to know where the end of the format was in memory. The easiest and most compatible thing when Fortran 77 came out was to keep the format strings the way they had always been: with parentheses. Though now that character strings have a length, it does seem plausible that modifications could be made to allow the parentheses to be elided.

Another related quirk in pre-Fortran 77 was that Hollerith constants were allowed as actual arguments in procedure calls, as in:

CALL OUTPUT (12HHELLO WORLD!)

If you were writing the procedure OUTPUT, how would you know the number of characters to process? No way of telling. So there would typically be an additional argument where you’d pass the number 12. (Though a few compilers would secretly add a word of binary zeros after the Hollerith string in memory, so that the procedure could look for a zero word. It was kinda like the C convention where a null byte terminates a string in an array of chars.)

The funniest situation with Fortran is when an unsuspecting practitioner tries to post a reproducer and has the USELESS implicit none all over the place, as if a badge of honor!!

Or, you can try the even more verbose, write( *, '("Hello")' )!!

This is actually a great example of why it would be nice to find a way to avoid the enclosing parens. Because they’re required, any format string containing literal text must always be specified using at least three levels of nested enclosing characters, including () and both kinds of quotes: ' ', " ". And if it’s explicit in the write statement as in your example, it necessarily involves two layers of (), for a fun total of four nested groupings.

My wish-list for PRINT is simply that if no format specifier is present, which is currently not accepted, it use '(*(g0,1x))'. That should be upward-compatible and allow “print,'Hello World!'” to work relatively intuitively.

In more detail …

Given that the compiler has a lot of options for how it outputs list-directed output (i.e. fmt=*), I would like the format specifier to be optional and, when not present, to act as if '(*(g0,1x))' was specified and DELIM='NONE' was in effect. It would start in column one and always have a single space between arguments, even if they are strings:

program testit
use iso_fortran_env, only : OUTPUT_UNIT
character(len=*),parameter :: all='(*(g0,1x))'

   !print,"Hello World!" <== wish this worked like next line
   ! start in column 1, consistent single space separator
   print all,'Hello','World!',10,20,30

   ! REMEMBER:
   ! list-directed I/O starts in column 2 and it is up to
   ! the compiler how numeric values are separated -- one space
   ! or a field width big enough for biggest value
   ! of that type so nice columns are easy to print ... up to
   ! compiler

   call samestatements()

   ! like a box full of cherries ...
   open(unit=output_unit,decimal='comma') ! comma!point
   open(unit=output_unit,delim='quote')   ! APOSTROPHE!QUOTE!NONE

   ! rinse and repeat ...
   call samestatements()

contains
subroutine samestatements()
   print *,'Hello','World!',10,20,3000
   print *,'Hello','World!',1000,300,50
end subroutine samestatements

end program testit


Hello World! 10 20 30
 HelloWorld!          10          20        3000
 HelloWorld!        1000         300          50
 "Hello" "World!"          10          20        3000
 "Hello" "World!"        1000         300          50

The output from the SAMESTATEMENTS() procedure can vary a lot from programming environment to programming environment. The change I proposed would produce something more predictable and allow for printing in column one without having to create a format. I have a module of formats just for that reason, with strings like the above “all” format, but “all” is by far the most common one I use, so it would be great if it were the default.

Could you put a colon in the middle to prevent a superfluous trailing space?

'(*(g0,:,1x))'

If my understanding is right, that would terminate after g0 once it runs out of list items.
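
A quick sketch to check this. It swaps the 1x for a literal ", " separator, purely so that a trailing separator is visible in the output; as far as I know an X edit descriptor does not by itself transmit characters, so a trailing 1x may not show up anyway. The program name is made up.

```fortran
program colon_demo
   implicit none

   ! Without the colon, the literal ", " is emitted after the last item
   ! as well, before format control finds there is nothing left for g0.
   print '(*(g0, ", "))', 1, 2, 3

   ! With the colon, format control stops right after g0 once the list
   ! is exhausted, so no trailing separator is written.
   print '(*(g0, :, ", "))', 1, 2, 3
end program colon_demo
```

The first line should end in a dangling ", " and the second should not.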