Tab formatting with stream access

aradi · March 29, 2023, 5:11pm

Does anybody know how tab formatting is supposed to work when writing formatted files with stream access? Using the following trivial program, I get different file contents with different compilers.

program tabformat
  implicit none

  integer :: fd

  open(newunit=fd, file="test.stream.txt", access="stream", form="formatted")
  write(fd, "(a)") "1234567890123"
  write(fd, "(a, t10, a)") "1234", "0123"
  close(fd)

end program tabformat

Two of the compilers give the same output that all three compilers produce with sequential access:

1234567890123
1234     0123

but with stream access, the third one writes

1234567890123
1234         0123

instead.

Is that a compiler bug? Or is tab formatting simply ill-defined (processor dependent) for stream I/O, since it refers to positions in the current record, while stream access is not record based?

FedericoPerini · March 29, 2023, 6:00pm

The way I use the Tx descriptor with gfortran is “set cursor position to x”, which conforms to your upper example. There’s a nice summary here. Is it possible that this is illegal with stream access? It seems like one compiler adds 4 extra spaces, as if it moved 10 columns forward from the current position instead of first rewinding to the start of the line. I guess the rewind operation is lost with stream access.

msz59 · March 29, 2023, 6:15pm

This seems to be standard-conforming. It says (emphasis mine):

13.8.1.2 T, TL, and TR editing

1 The left tab limit affects file positioning by the T and TL edit descriptors. Immediately prior to nonchild data transfer (12.6.4.8.3), the left tab limit becomes defined as the character position of the current record or the current position of the stream file. If, during data transfer, the file is positioned to another record, the left tab limit becomes defined as character position one of that record.

2 The Tn edit descriptor indicates that the transmission of the next character to or from a record is to occur at the nth character position of the record, relative to the left tab limit. This position can be in either direction from the current position.

So, apparently, in a stream file the Tn descriptor should always work from the current position.
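To make "relative to the left tab limit" concrete for the first example, here is the column arithmetic. The notation is my own, not the standard's: p_0 is the column of the output line at which the left tab limit sits, and c is the column at which Tn places the next character.

```latex
c = p_0 + n - 1
% Compilers 1 and 2: left tab limit at the start of the line
p_0 = 1:\quad c = 1 + 10 - 1 = 10   % "0123" starts at column 10, after 5 blanks
% Compiler 3: consistent with a left tab limit just after "1234"
p_0 = 5:\quad c = 5 + 10 - 1 = 14   % "0123" starts at column 14, after 9 blanks
```

The blank counts in the two outputs of the first post (5 and 9) match these two readings, which suggests that compiler 3 moved the left tab limit to the position reached after the A output.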

aradi · March 30, 2023, 10:40am

Thanks a lot for pointing out the right section in the standard!

But then I am wondering: what counts as the current position in a stream file? Is it the position the file had when execution of the write statement started, or is this position also updated during the data transfer?

13.7.1.1 seems to mention this (emphasis mine):

Data edit descriptors cause the conversion of data to or from its internal representation; during formatted stream output, the A data edit descriptor may also cause file positioning.

Do I understand correctly that, in the case of the A descriptor, it is processor dependent what the left tab limit is, but that with all other descriptors the position may not be updated during the write statement, only after it has finished?

I’ve modified my program as follows:

program tabformat
  implicit none

  integer :: fd

  open(newunit=fd, file="test.stream.txt", access="stream", form="formatted")
  write(fd, "(a)") "1234567890123"
  write(fd, "(a, t10, a)") "1234", "0123"
  write(fd, "(i4, t10, i4.4)") 1234, 0123
  close(fd)

end program tabformat

and obtained with compilers 1&2:

1234567890123
1234     0123
1234     0123

while compiler 3 produces:

1234567890123
1234         0123
1234         0123

If my interpretation is correct, compilers 1 and 2 would be the standard-conforming ones, as they do not update the position for the I descriptor (and opt not to update it for the A descriptor either). Compiler 3 would then be non-conforming, as it seems to update the position for the I descriptor as well.

Any views on this?

urbanjost · March 30, 2023, 9:59pm

Before non-advancing I/O and stream I/O, this was a bit better defined, and still different with each compiler. In the original case with record I/O there could still be issues, because the T field was allowed to move to the left as well as to the right. Typically, this meant an internal buffer was used so you could always get back to column 1. Once non-advancing I/O was introduced, this could not remain the meaning, because lines could become gigantic and buffering becomes much more likely to be a problem (although giant I/O lists could cause problems with a regular old WRITE as well).

Some vendors did buffer everything, which caused all kinds of portability problems, for example for codes sending escape sequences to a screen. The alternative, to keep things resolvable, was to make the T positions specific to a write statement rather than to the construction of a line. Then, when true stream I/O was supported, which may never generate an end-of-record and which most would assume could behave like unbuffered I/O, a third definition emerged.

I personally do not use T descriptors with stream I/O. Whether the standard defines it strictly or intentionally allows for differences, I have not found two compilers that agree. If you really need T editing, write into an internal file using regular sequential I/O, and output the string once it is complete. Without a strong reason, I would do the same with non-advancing I/O.
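A minimal sketch of that workaround, assuming each line fits in 80 characters (the file name tab_internal.txt is arbitrary). The T editing happens in a character variable, where the left tab limit is always position 1 of the record, and the stream file only ever sees a finished line written with a plain A descriptor.

```fortran
program tab_via_internal_file
  implicit none
  integer :: fd
  character(len=80) :: line   ! internal file: a single 80-character record

  ! Build the line with T editing in the internal file, where the left tab
  ! limit is always character position 1 of the record.
  write(line, "(a, t10, a)") "1234", "0123"
  if (line /= "1234     0123") error stop "unexpected result of T editing"

  ! Only the finished line goes to the stream file, via a plain A descriptor.
  open(newunit=fd, file="tab_internal.txt", access="stream", &
       form="formatted", status="replace", action="write")
  write(fd, "(a)") "1234567890123"
  write(fd, "(a)") trim(line)
  close(fd)
end program tab_via_internal_file
```

Note that trim() drops trailing blanks; if they matter, write line(1:n) with an explicit length instead.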

If you think you found two compilers that behave the same, try ending a format in T1 and see what the following I/O does, then end one with 10X and try the same, etc. The odds are you will get multiple results with different compilers.
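That experiment can be scripted roughly as below. It is a probe rather than a test: it prints what the current compiler does and makes no claim about the correct answer. The file name probe.stream.txt is arbitrary, and reading the bytes back assumes 8-bit file storage units and LF line endings.

```fortran
program probe_trailing_positioning
  implicit none
  integer :: fd, i, fsize
  character(len=:), allocatable :: raw

  ! Two formats ending in a position edit descriptor, each followed by
  ! another write, on a formatted stream file.
  open(newunit=fd, file="probe.stream.txt", access="stream", &
       form="formatted", status="replace", action="write")
  write(fd, "(a, t1)") "abc"      ! format ends with T1
  write(fd, "(a)") "X"
  write(fd, "(a, 10x)") "abc"     ! format ends with 10X
  write(fd, "(a)") "X"
  close(fd)

  ! Read the raw bytes back and print them with blanks shown as '.',
  ! so that any trailing blanks become visible.
  inquire(file="probe.stream.txt", size=fsize)
  if (fsize <= 0) error stop "could not determine the file size"
  allocate(character(len=fsize) :: raw)
  open(newunit=fd, file="probe.stream.txt", access="stream", &
       form="unformatted", status="old", action="read")
  read(fd) raw
  close(fd)

  do i = 1, fsize
    select case (raw(i:i))
    case (" ")
      write(*, "(a)", advance="no") "."
    case (achar(10))
      write(*, "(a)") ""            ! end the displayed line
    case default
      write(*, "(a)", advance="no") raw(i:i)
    end select
  end do
  if (raw(fsize:fsize) /= achar(10)) write(*, "(a)") ""
end program probe_trailing_positioning
```

Running the same probe with several compilers, and again with sequential instead of stream access, shows directly whether they agree.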

The T descriptor now means different things depending on the attributes of the open file, which for me is just too confusing. That the same format, whether a labeled FORMAT statement or one stored in a CHARACTER variable, generates different output for the same values seems to be an issue not worth trying to sort out, as the compilers certainly have not.

So my rule (which I am sure I have broken a few times here and there) is: pretend that T descriptors are not allowed in non-advancing and stream I/O unless you know your code will only be used with one compiler for a relatively short time.

aradi · March 31, 2023, 10:36am

Yes, this is exactly what I reverted to in the end.

aradi · March 31, 2023, 10:59am

Just to summarize: based on the discussion above, I came to the conclusion that file positioning during a formatted write statement using stream access is only allowed after an A descriptor. 13.7.1.1 and 13.8.1.2 of the Fortran 2018 standard seem to be the relevant sections.

Consequently, the program

program tabformat
  implicit none

  integer :: fd

  open(newunit=fd, file="test.stream.txt", access="stream", form="formatted")
  write(fd, "(a)") "1234567890123"
  write(fd, "(a, t10, a)") "1234", "0123"
  write(fd, "(i4, t10, i4.4)") 1234, 0123
  close(fd)

end program tabformat

should either produce (if no file positioning takes place after the A-descriptor)

1234567890123
1234     0123
1234     0123

or (if an optional file positioning takes place after the A-descriptor)

1234567890123
1234         0123
1234     0123

So compilers 1 and 2 are standard conforming, while compiler 3 is not (a bug report has been submitted). And because the output might be processor dependent, one should probably refrain from using a T descriptor after an A descriptor when using stream I/O.

Thanks for all the useful comments.

JohnCampbell · April 2, 2023, 2:42am

I am not familiar with the use of access='stream', form='formatted', but my understanding is that stream access does not define a record, while a format statement does.

What does access='stream', form='formatted' imply, as opposed to access='stream', form='unformatted'?
I expect 'formatted' to generate a formatted character buffer from the format, with a resulting length, which is then sent to the stream file, whereas 'unformatted' sends a sequence of bytes (bytes!!).

Consider:
write(fd, "(a, t10, a)") "12345", "01234"
vs
write(fd) 12345, 0, 01234
15-byte buffer vs 12 bytes sent to the file.
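The two byte counts can be measured with INQUIRE(SIZE=). This sketch assumes the 15 comes from 14 characters ("12345", blanks up to column 9, "01234" in columns 10 to 14) plus a single LF record terminator, and that the 12 comes from three 4-byte default integers; both depend on the platform, and the formatted size also depends on how the processor handles T10 in stream output, as discussed in this thread. The file names are arbitrary.

```fortran
program formatted_vs_unformatted_size
  use, intrinsic :: iso_fortran_env, only: file_storage_size
  implicit none
  integer :: fd, nfmt, nunf

  ! Formatted stream: characters produced by the format, plus a record end.
  open(newunit=fd, file="fmt.stream", access="stream", form="formatted", &
       status="replace")
  write(fd, "(a, t10, a)") "12345", "01234"
  close(fd)

  ! Unformatted stream: the raw bits of three default integers, no markers.
  open(newunit=fd, file="unf.stream", access="stream", form="unformatted", &
       status="replace")
  write(fd) 12345, 0, 01234
  close(fd)

  ! SIZE= is reported in file storage units of file_storage_size bits.
  inquire(file="fmt.stream", size=nfmt)
  inquire(file="unf.stream", size=nunf)
  print "(a, i0)", "formatted stream size (units):   ", nfmt
  print "(a, i0)", "unformatted stream size (units): ", nunf
  print "(a, i0)", "bits in three default integers:  ", 3 * storage_size(0)
  print "(a, i0)", "bits per file storage unit:      ", file_storage_size
end program formatted_vs_unformatted_size
```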

“t10” is associated with the generation of the formatted character buffer.

Why would any standard suggest anything different?
Do the 3 compilers support the same Fortran standard?

Fortunately, I don’t have a copy of the F2018 standard to be confused !!

aradi · April 3, 2023, 9:00am

I agree with you that this makes the most sense. However, the F2018 standard apparently allows t10 to (optionally) refer to the position reached after the last A-formatted output within the write statement. So even standard-conforming processors could yield different results if the A descriptor is used in formatted stream output… Therefore, one should probably avoid this combination in general…

RonShepard · April 3, 2023, 5:03pm

Could you point out where this is described? Section 13.8.1.2, which describes the various T edit descriptors, does not mention anything exceptional about A formatting.

aradi · April 8, 2023, 11:06am

@RonShepard Indeed, the information is somewhat scattered. While 13.8.1.2.1 states

Immediately prior to nonchild data transfer (12.6.4.8.3), the left tab limit becomes defined as […] the current position of the stream file.

in 13.7.1.1 one finds

Data edit descriptors cause the conversion of data to or from its internal representation; during formatted stream output, the A data edit descriptor may also cause file positioning

I am not sure whether my interpretation is correct, but I concluded from the latter that the current position of the stream file might change during output if an A descriptor is present, and the left tab limit with it.

RonShepard · April 8, 2023, 5:37pm

Ok, I think I see the problem now. My draft version of the 2018 standard does not have sections 13.8.1.2.1 or 13.7.1.1.

However, if I were to guess what was added, I think it is to cover the case where a short string is written into a wide Aw field. Say you write ‘1234’ into an A10 field. In that case, the position within the record is not after the four characters that were transferred, but rather it is after the 10-character field width.

Is that a good guess?
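The Aw part of that guess can be checked in an internal file, where no stream positioning is involved. On output, a string shorter than w is right-justified with leading blanks and a longer one is cut to its leftmost w characters, so the field always occupies exactly w positions. The strings below are arbitrary.

```fortran
program aw_field_width
  implicit none
  character(len=20) :: line

  ! "1234" is shorter than w = 10: six blanks, then the four characters,
  ! so the field occupies positions 1 to 10 and the next item starts at 11.
  write(line, "(a10, a)") "1234", "|"
  print "(a)", trim(line)
  if (line /= "      1234|") error stop "unexpected A10 output"

  ! "1234" is longer than w = 2: only the leftmost two characters appear.
  write(line, "(a2, a)") "1234", "|"
  if (line /= "12|") error stop "unexpected A2 output"

  ! In a record-based internal file, a following T counts from the start
  ! of the record, not from the end of the transferred characters.
  write(line, "(a4, t10, a)") "12", "0123"
  if (line /= "  12     0123") error stop "unexpected A4/T10 output"
end program aw_field_width
```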

aradi · April 9, 2023, 8:28am

Sorry, I abbreviated somewhat; I meant 13.8.1.2 §1 and 13.7.1 §1 in the Fortran 2018 draft.

My guess would rather be that this was a courtesy toward legacy compilers, which could not handle arbitrary/dynamic output buffer sizes. When an A descriptor is used without a specified width, you cannot determine the necessary I/O buffer size at compile time. All the other descriptors (including Aw) used in nonchild data transfer have mandatory width parameters, so the buffer size can in theory be determined at compile time. (Of course, in the meantime probably all compilers have implemented dynamic buffer sizes in order to handle user-defined formatting via the DT descriptor, but the exception for the A descriptor apparently remained in the standard nevertheless…)

RonShepard · April 9, 2023, 5:39pm

While this is true for character data, it is also true for other data types. For example, the compiler cannot know at compile time what the position within the buffer is after an item like a(1:n) is written when n is a variable. That is, a character string whose length is unknown at compile time has the same feature as an array of some other type whose length is unknown at compile time.

aradi · April 9, 2023, 6:51pm

If you make formatted output of your array, such as

write(*, "(f12.4, t40, f12.4)") a(1:n)

the compiler knows already at compile time, that it is enough to allocate an I/O buffer of 51 characters, which then repeatedly will be filled up and flushed (latter causing repositioning of the file) n / 2 times. While for

write(*, "(a, t40, a)") trim(string1), trim(string2)

the necessary buffer size depends on the actual trimmed lengths of string1 and string2 at runtime.
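The arithmetic behind both cases can be checked with an internal file. F12.4 fills positions 1 to 12, T40 moves to position 40, and the second F12.4 fills 40 to 51, so 51 characters always suffice. With plain A, the length is 39 plus the runtime length of the second string. The string values here are arbitrary.

```fortran
program tab_line_length
  implicit none
  character(len=100) :: line
  character(len=30) :: string1, string2

  ! Fixed widths: positions 1-12, then 40-51, whatever the (fitting) values.
  write(line, "(f12.4, t40, f12.4)") 1.0, 2.0
  print "(a, i0)", "F12.4/T40/F12.4 length: ", len_trim(line)
  if (len_trim(line) /= 51) error stop "expected 51 characters"

  ! Plain A: the length depends on the trimmed lengths known only at runtime.
  string1 = "short"
  string2 = "tail"
  write(line, "(a, t40, a)") trim(string1), trim(string2)
  print "(a, i0)", "A/T40/A length (tail): ", len_trim(line)
  if (len_trim(line) /= 43) error stop "expected 39 + 4 characters"

  string2 = "a longer tail"
  write(line, "(a, t40, a)") trim(string1), trim(string2)
  print "(a, i0)", "A/T40/A length (longer): ", len_trim(line)
  if (len_trim(line) /= 52) error stop "expected 39 + 13 characters"
end program tab_line_length
```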

But of course, the reason for the special role of the A descriptor might be completely different from the one I have guessed. What matters is that (at least in my interpretation) the standard allows for differences in the formatted stream output when T and A are used together, so one should not use them, unless one is willing to accept compiler dependent output.

RonShepard · April 9, 2023, 11:00pm

aradi:

If you make formatted output of your array, such as
write(*, "(f12.4, t40, f12.4)") a(1:n)
the compiler knows already at compile time, that it is enough to allocate an I/O buffer of 51 characters, which then repeatedly will be filled up and flushed (latter causing repositioning of the file) n / 2 times.

What if the format were changed to

write(*, "(f0.4, t40, f0.4)") a(1:n)

or to any of the other i0, e0, or g0 type formats? Are those special-cased in the standard the same way that A is?

As an aside, it is not uncommon to make the mistake of writing a number like huge(x) with an f0.d format. It can be confusing at first to see that kind of output. I stumbled over this once using minval() with a zero-length array.
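A small sketch of that pitfall, assuming the real32 and real64 kinds from iso_fortran_env on IEEE hardware are the ones of interest. F0.d selects the minimal width, so every integer digit of HUGE is printed: 39 for real32 and 309 for real64. The ES descriptor in the last line is one way to keep such values readable.

```fortran
program huge_f0
  use, intrinsic :: iso_fortran_env, only: real32, real64
  implicit none
  character(len=400) :: buf

  ! F0.4: minimal width, so the full integer part of HUGE is written out.
  write(buf, "(f0.4)") huge(1.0_real32)
  print "(a, i0, a)", "real32 with f0.4: ", len_trim(buf), " characters"

  write(buf, "(f0.4)") huge(1.0_real64)
  print "(a, i0, a)", "real64 with f0.4: ", len_trim(buf), " characters"
  print "(a)", trim(buf)

  ! Scientific notation with a 3-digit exponent field stays compact.
  print "(es13.4e3)", huge(1.0_real64)
end program huge_f0
```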

JohnCampbell · April 10, 2023, 3:26am

The size of the output buffer and the position for a T10 after an A are issues that have already been solved for a formatted file. I don’t see the issue for a formatted stream file, which requires that the generated output buffer be appended to the stream file, but without any end-of-record information.
Are we sure this is required, or was it given a different set of rules?

RonShepard · April 11, 2023, 4:06pm

RonShepard:

What if the format were changed to
write(*, "(f0.4, t40, f0.4)") a(1:n)
or to any of the other i0, e0, or g0 type formats? Are those special-cased in the standard the same way that A is?

Although there are no replies to this question, I think the answer is no, there are no special exceptions for these formats.

Another possible reason that the A format field is treated differently is that the data that is transferred with an A field might contain a newline character. Does the transfer of a newline character also reset the T field limits in a stream file?

Consider the following example.

program tabformat
  implicit none

  integer :: fd

  open(newunit=fd, file="test.stream.txt", access="stream", form="formatted")
  write(fd, "(a)") "1234567890123"
  write(fd, "(a4, t10, a)") "12", "0123"
  write(fd, "(i4, t10, i4.4)") 1234, 0123
  write(fd, "(a, t10, a)") "1234"//achar(10), "0123"
  close(fd)

end program tabformat

The output I see with gfortran is

1234567890123
  12         0123
1234         0123
1234
         0123

That second write statement shows that it is the A4 field width that set the following T10 position, not the two characters that were transferred. Then in that last write statement, the T10 descriptor moves to the 10th position within the new record, not relative to the position within the current record at the beginning of the write statement.

I think that is the expected behavior. Is that correct?

With ifort, I see

1234567890123
  12     0123
1234     0123
1234
    0123

Is that last line correct in this case?

RonShepard · April 11, 2023, 4:22pm

Oops, you replied before I added the ifort output to my post. It seems to me like gfortran gets the second and third wrong, while ifort gets the fourth write statement wrong.

aradi · April 11, 2023, 6:30pm

RonShepard:

What if the format were changed to
write(*, "(f0.4, t40, f0.4)") a(1:n)
or to any of the other i0, e0, or g0 type formats? Are those special-cased in the standard the same way that A is?

As far as I can see, there are no exceptions for those descriptors. But IMO, despite the 0-width specifier, it is still possible to determine the maximal length of the necessary buffer at compile time in the case above, as the compiler should be able to determine the maximum number of digits needed to represent a given numerical data type in the output…
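For integers this can be made concrete. RANGE is an inquiry function usable in constant expressions, so a bound like RANGE + 2 (the digits of HUGE plus a minus sign) is known at compile time. This sketch checks, for the int32 and int64 kinds (an assumed choice), that the bound equals the I0 width of -HUGE, the most negative value of the symmetric integer model; on two's-complement hardware the true minimum, -HUGE - 1, has the same number of digits. Real kinds would need a similar bound built from RANGE and PRECISION, which is not shown.

```fortran
program max_integer_width
  use, intrinsic :: iso_fortran_env, only: int32, int64
  implicit none
  ! Compile-time bounds: decimal digits of HUGE (RANGE + 1) plus a sign.
  integer, parameter :: bound32 = range(0_int32) + 2
  integer, parameter :: bound64 = range(0_int64) + 2
  character(len=40) :: buf

  ! -HUGE needs the most characters of any model integer under I0.
  write(buf, "(i0)") -huge(0_int32)
  print "(a, 2(1x, i0))", trim(buf), len_trim(buf), bound32
  if (len_trim(buf) /= bound32) error stop "int32 width differs from bound"

  write(buf, "(i0)") -huge(0_int64)
  print "(a, 2(1x, i0))", trim(buf), len_trim(buf), bound64
  if (len_trim(buf) /= bound64) error stop "int64 width differs from bound"
end program max_integer_width
```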
