Tab formatting with stream access

aradi · April 11, 2023, 6:46pm

I’d argue that the last line of the Intel output is absolutely correct. The last write creates the following hexdump content:

00000000  31 32 33 34 0a 20 20 20  20 30 31 32 33 0a        |1234.    0123.|

The character 0 (0x30) is exactly at the 10th position, counted from the start of the write, as you would expect with the t10 descriptor. (Your terminal might of course display the result in a confusing way, because of how it renders the fifth character (achar(10) / 0x0a), but that’s not the compiler’s fault…)

RonShepard · April 11, 2023, 7:01pm

Yes, but for an f0.d field, that maximum width is over 300 digits for each number. Even in the case of an A field, the compiler has some maximum number of characters that it can store in a string, so, in principle, it has the same kind of maximum field width available there too.

RonShepard · April 11, 2023, 7:13pm

Ok, so for ifort, the T10 is counting from the file position value that was at the beginning of the write statement. Gfortran is counting from the new file position at the beginning of the new record.

I just checked to see, and if that write statement is changed to

write(fd, "(a, t10, a)") "1234"//achar(10)//achar(10)//achar(10), "0123"

then gfortran puts the last “0123” string in the same position within the line, just with the two extra blank lines inserted.

I don’t have access to ifort right now, so I can’t check what ifort does, but if your conjecture is right, then ifort should continue to count from the beginning position at the write statement, and the last “0123” string should appear two positions to the left of what it did with just a single lf character.

I do have access to nagfor now, and it does what we think ifort would do with this. That is, the last “0123” string begins in column 3.

I’m not sure what you mean about being confused about the terminal output. If that is the way ifort (and nagfor) is counting file positions for the T10, then that is what I would expect.

Are all three compilers correct? Does the standard allow the file position to be counted either way? And if gfortran is wrong, then is it also wrong for the second and third write statements, or is it just the third statement that is wrong?

[edit] Just out of curiosity, I checked to see what happens with normal formatted files. Both gfortran and nagfor output the same thing. So nagfor output looks the same with either open statement, while gfortran behaves differently for stream than for a normal formatted file. I’m still curious whether the Fortran standard allows that difference, or if this is a bug.

aradi · April 11, 2023, 7:38pm

RonShepard:

I just checked to see, and if that write statement is changed to
write(fd, "(a, t10, a)") "1234"//achar(10)//achar(10)//achar(10), "0123"
then gfortran puts the last “0123” string in the same position within the line, just with the two extra blank lines inserted.

I don’t have access to ifort right now, so I can’t check what ifort does, but if your conjecture is right, then ifort should continue to count from the beginning position at the write statement, and the last “0123” string should appear two positions to the left of what it did with just a single lf character.

Yes, ifort is consistent with the previous assumptions, as it delivers

00000000  31 32 33 34 0a 0a 0a 20  20 30 31 32 33 0a        |1234...  0123.|

I think gfortran is right for the 2nd and 4th lines, but only because of the exception for the A descriptor in the standard, which allows file repositioning. On the other hand, IMO it is wrong for the 3rd line, as repositioning is not allowed with the I descriptor. (I reported the issue last week and it is being investigated…)

RonShepard · April 12, 2023, 6:44am

Here is something else that might be unexpected. Change the line to

write(fd, "(a, t10, a)") "1234"//repeat(achar(10),5), "0123"

With ifort, there are four blank lines and the last “0123” is written starting at column 1. Then if you change the repeat count to anything between 5 and 9, you will get that same output, four blank lines and then the string written starting at column 1. The T10 is repositioning the file pointer back to within the previous records and overwriting them with the “0123” string. If the repeat count is larger than 9, then some newline characters are still in the buffer, and they end up being blank lines after the “0123” string.

In contrast, gfortran writes out all the blank lines, and the last “0123” string always starts at column 10.
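A minimal sketch of how this repeat-count experiment could be automated, assuming a hypothetical scratch file name repeat.stream.txt and using new_line("x") in place of achar(10) (the thread reports the two are the same for gfortran and ifort). For each count n it reports how many records were read back and the column of the first record containing "0123"; the numbers are expected to differ between processors, which is the point of the experiment.

```fortran
! Sketch: automate the repeat-count experiment. The file name is hypothetical.
program repeat_experiment
  implicit none
  integer :: n, fd, ios, col, nrec
  character(len=256) :: line

  do n = 1, 12
    ! Fresh formatted stream file for each repeat count
    open(newunit=fd, file="repeat.stream.txt", access="stream", &
         form="formatted", status="replace")
    write(fd, "(a, t10, a)") "1234"//repeat(new_line("x"), n), "0123"
    close(fd)

    ! Read the records back; remember the column of the first "0123" seen
    open(newunit=fd, file="repeat.stream.txt", access="stream", &
         form="formatted", status="old", action="read")
    col = 0
    nrec = 0
    do
      read(fd, "(a)", iostat=ios) line
      if (ios /= 0) exit
      nrec = nrec + 1
      if (col == 0) col = index(line, "0123")
    end do
    close(fd)

    print "(a, i0, a, i0, a, i0)", "repeat count ", n, &
          ": records = ", nrec, ", 0123 starts at column ", col
  end do
end program repeat_experiment
```

If each newline starts a new record as described in 13.7.4, every count n should give n + 1 records with "0123" starting at column 10.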

[edit] I found this text in section 13.7.4 of the f2018 draft (I added the bold):

“If the file is connected for stream access, the output may be split across more than one record if it contains newline characters. A newline character is a nonblank character returned by the intrinsic function NEW_LINE. Beginning with the first character of the output field, each character that is not a newline is written to the current record in successive positions; each newline character causes file positioning at that point as if by slash editing (the current record is terminated at that point, a new empty record is created following the current record, this new record becomes the last and current record of the file, and the file is positioned at the beginning of this new record).”

My example above hard-coded the achar(10) character, so it is not required to conform to this paragraph, but I compared the output of new_line() and it is indeed the same as achar(10) for gfortran and ifort, so in that roundabout way, that is supposed to be the behavior. That suggests that gfortran is behaving correctly in these cases with embedded nl characters and that ifort is in error. Here is an updated version of the code that should conform on all compilers:

program tabformat
  implicit none

  integer :: fd
  character, parameter :: nl = new_line("x")

  open(newunit=fd, file="test.stream.txt", access="stream", form="formatted")
  write(fd, "(a)") "1234567890123"
  write(fd, "(a4, t10, a)") "12", "0123"
  write(fd, "(i4, t10, i4.4)") 1234, 0123
  write(fd, "(a, t10, a)") "1234"//repeat(nl,10), "0123"
  close(fd)

end program tabformat

I’m running this on macOS, which is POSIX compliant, so the records in a formatted file are indeed separated by single achar(10) characters. It would be interesting to see what happens in the output file on a non-POSIX machine like Windows. On Windows, I think there should be two characters (a cr and a lf) written to the file for each achar(10) character transferred. Can a Windows programmer confirm that this happens?
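One way to answer the Windows question is to dump the raw bytes. The sketch below is a diagnostic under stated assumptions, not a portable test: it writes one record containing an embedded new_line("x") to a hypothetical file bytes.stream.txt, then reopens that file for unformatted stream access (mixing formatted and unformatted access to the same file is processor dependent) and prints each byte in hexadecimal. On a POSIX system one would expect 10 bytes, with 0A after "1234" and after "0123"; a platform that translates each newline into a CR-LF pair would show extra 0D bytes.

```fortran
! Sketch: dump the bytes written by formatted stream output.
! The file name is hypothetical; reopening as unformatted is processor dependent.
program dump_bytes
  implicit none
  integer :: fd, ios, pos
  character(len=1) :: byte

  open(newunit=fd, file="bytes.stream.txt", access="stream", &
       form="formatted", status="replace")
  write(fd, "(a)") "1234"//new_line("x")//"0123"
  close(fd)

  ! Read the raw file storage units back one character at a time
  open(newunit=fd, file="bytes.stream.txt", access="stream", &
       form="unformatted", status="old", action="read")
  pos = 0
  do
    read(fd, iostat=ios) byte
    if (ios /= 0) exit
    pos = pos + 1
    print "(i4, 1x, z2.2)", pos, iachar(byte)
  end do
  close(fd)
  print "(a, i0)", "total bytes: ", pos
end program dump_bytes
```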

[edit 2] I have now tested nagfor, and it does the same thing as ifort with embedded newline characters. So if gfortran is correct, then both ifort and nagfor are wrong in this situation.

FortranFan · April 12, 2023, 6:54pm

@aradi, this thread just caught my attention. I think the wording “only allowed” in your conclusion, “file positioning during a formatted write statement using stream access is only allowed after an A-descriptor,” is rather strong. Paragraph 1 of Section 13.7.1 of the standard is interesting in that it appears to support a processor-dependent nature for the behavior you noticed. If you remain interested, I suggest you work with someone like @sblionel to get further clarity on this from the Interp subgroup of the J3 committee.

Separately, if you remain interested, you may want to look more closely at that 3rd compiler where you noticed the differences: I doubt the behavior is restricted to the A descriptor. I suspect that compiler’s implementation handles T positioning differently and computes the left tab limit relative to each data transfer. Using the code in your original post, you can retry with the “generic”, variable-length G0 descriptor, or with the fixed-length A4 descriptor, and see how the compiler processes it.

RonShepard · April 12, 2023, 8:49pm

The clause “the A data edit descriptor may also cause file positioning” has two meanings in English.

One interpretation is that it gives permission for file positioning to occur, perhaps in a processor dependent way. That has been the interpretation in most of this discussion.

The other interpretation is that the A descriptor might or might not cause file positioning based on the data that is transferred. This latter meaning does not require, or necessarily allow, any processor dependence, it only implies data dependence.

I have suggested two different ways that latter interpretation might occur. One is when a short text string is written into a long Aw field. After some discussion, I am now unsure whether that applies. The other way is when the newline character appears in the character data that is being transferred. As described in section 13.7.4 (of the f2018 draft), this does occur. Namely, each newline character is required to reset the file position, in which case the T field must then refer to that new file position, not to any previous one, even within the same write statement.

[edit] I have now tested nagfor, and it does the same thing as ifort with embedded newline characters. So if gfortran is correct, then both ifort and nagfor are wrong in this situation.

FortranFan · April 12, 2023, 9:15pm

Agree re: the two interpretations. But the question is which interpretation that 3rd compiler uses, and whether its implementors are willing to accept any other reading of the standard’s wording. That’s why I suggested that the OP follow up with the J3 committee via @sblionel if there is continued interest in this.

Good “data point” re: NAG compiler.

aradi · April 12, 2023, 10:03pm

OK, I had not considered this meaning of “may”. Actually, I got another response (which I don’t see above for some strange reason) that also supports this interpretation, so this might be a (or the) commonly accepted interpretation.

And yes, I agree: according to 13.7.4, nagfor and ifort seem to be wrong when the character variable before the T descriptor contains achar(10) / new_line('x') characters (provided “may” in 13.7.4 means “processors are expected to do so”).

aradi · April 12, 2023, 10:14pm

OK, I’ve tried to summarize the discussion in three questions, which we may/might discuss with the standard committee as suggested by @FortranFan.

1. Given a file unit u opened for formatted stream output, what is the expected outcome of the following write statement? (Provided new_line("x") == achar(10))

write(u, "(a, t10, a)") "1234" // repeat(new_line("x"), 5), "5678"
! 1234\n\n\n\n\n5678 or 1234\n\n\n\n\n.........5678
! where \n and . indicate the newline and the blank characters, respectively.

2. 13.7.4 (§5) states that

If the file is connected for stream access, the output may be split across more than one record if it contains newline characters.

Does the standard allow processor-dependent behavior here, or are all processors expected to split the output into multiple records when newline characters occur?

3. 13.7.1 (§1) states that

Data edit descriptors cause the conversion of data to or from its internal representation; during formatted stream output, the A data edit descriptor may also cause file positioning.

Does the standard allow processor-dependent behavior here, or are all processors expected to behave the same way (and position the file only if the character variable/expression corresponding to the A descriptor contains one or more newline characters)?

@sblionel your judgement would be very appreciated here.

sblionel · April 13, 2023, 12:06am

In the following, my quotes from the standard are from F2018. F2023 has not changed the text in this area.

First, regarding “may”, the ISO House Style document says:

To ensure that a document is understood and applied correctly, use “may” to express a permission and “can” to express a possibility or capability. Avoid substituting either of these terms with “might” or “could”, even if this seems logical in English. Revise a sentence that uses “might” or “could” to avoid confusion and misapplication of the text.

So, where the standard says, “If the file is connected for stream access, the output may be split across more than one record if it contains newline characters.”, this is giving permission. Note that the standard also says (in a note, so this is non-normative), “If the intrinsic function NEW_LINE returns a blank character for a particular character kind, then the processor does not support using a character of that kind to cause record termination in a formatted stream file.” (13.7.4)

My interpretation of this is that if NEW_LINE returns a character other than blank, then transmitting that character using A format to a file connected for stream access does indeed start a new record.
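To see which case applies on a given processor, a short check like the following could be used. It is a sketch that only inspects the default character kind: it reports whether NEW_LINE returns a blank (the case in which, per the note quoted above, a newline cannot be used to terminate records in a formatted stream file) and, if not, whether the result equals ACHAR(10).

```fortran
! Sketch: inspect NEW_LINE for the default character kind only.
program check_newline
  implicit none
  character, parameter :: nl = new_line("x")

  if (nl == " ") then
    print "(a)", "NEW_LINE returns a blank: no newline record termination"
  else
    ! Non-blank result: per 13.7.4 it should start a new record in stream output
    print "(a, i0)", "NEW_LINE returns character code ", iachar(nl)
    print "(a, l1)", "new_line('x') == achar(10): ", nl == achar(10)
  end if
end program check_newline
```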

Beginning with the first character of the output field, each character that is not a newline is written to the current record in successive positions; each newline character causes file positioning at that point as if by slash editing (the current record is terminated at that point, a new empty record is created following the current record, this new record becomes the last and current record of the file, and the file is positioned at the beginning of this new record). (13.7.4)

This to me is unambiguous, and in @aradi’s first example, each newline starts a new record, resetting the “left tab limit”, so the correct output is 1234\n\n\n\n\n.........5678

Immediately prior to nonchild data transfer (12.6.4.8.3), the left tab limit becomes defined as the character position of the current record or the current position of the stream file. If, during data transfer, the file is positioned to another record, the left tab limit becomes defined as character position one of that record. (13.8.1.2)

I will check my understanding with others on the committee, but my opinion is that the standard requires a newline character (if not blank) to reset the left tab limit. I’ll also ask if the use of “may” here is appropriate.

sblionel · April 13, 2023, 2:34pm

I have confirmation that my analysis is correct, and that a newline character should act the same as a / format edit descriptor, resetting the left tab limit. I’m still waiting for a comment on the use of “may” here, which I think should be “can” instead.
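The equivalence with slash editing can be checked directly. In the sketch below (the file names via_newline.txt and via_slash.txt are hypothetical), the "(a, 5/, t10, a)" form moves to a new record five times, which makes the left tab limit position one of the final record, so "5678" should start in column 10. According to the analysis above, the embedded-newline form should produce the same final record; a processor that keeps counting from the start of the write statement would report a different column.

```fortran
! Sketch: compare five embedded newlines with five slash edit descriptors.
! File names are hypothetical.
program newline_vs_slash
  implicit none
  integer :: fd, col_nl, col_slash

  open(newunit=fd, file="via_newline.txt", access="stream", &
       form="formatted", status="replace")
  write(fd, "(a, t10, a)") "1234"//repeat(new_line("x"), 5), "5678"
  close(fd)

  open(newunit=fd, file="via_slash.txt", access="stream", &
       form="formatted", status="replace")
  write(fd, "(a, 5/, t10, a)") "1234", "5678"
  close(fd)

  ! Evaluate first, so no I/O happens inside the print statement
  col_nl = index(last_record("via_newline.txt"), "5678")
  col_slash = index(last_record("via_slash.txt"), "5678")
  print "(a, i0)", "newline form: 5678 starts at column ", col_nl
  print "(a, i0)", "slash form:   5678 starts at column ", col_slash

contains

  ! Return the last record of a formatted stream file
  function last_record(fname) result(last)
    character(len=*), intent(in) :: fname
    character(len=64) :: last
    character(len=64) :: line
    integer :: u, ios

    open(newunit=u, file=fname, access="stream", form="formatted", &
         status="old", action="read")
    last = ""
    do
      read(u, "(a)", iostat=ios) line
      if (ios /= 0) exit
      last = line
    end do
    close(u)
  end function last_record

end program newline_vs_slash
```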

sblionel · April 14, 2023, 12:46am

Malcolm Cohen, the standards editor, replied to my question about “may”:

Yes, “can” would be better. The i/o clauses are full of “may” in situations where giving permission kind of makes sense if you think about it a certain way, but is unnecessary. E.g. “The G edit descriptor also may be used to edit logical data”. These ought to be changed to “can” sometime, but I think it is not actually wrong to give permission, just unnecessary, so that’s why I didn’t already change them.

I may (!) write an Edit paper about this for the next revision “202Y”.

JohnCampbell · April 14, 2023, 2:09am

It is probably important to clarify the wording of the standard, but I am puzzled by the problem of using new_line("x") with a stream file.
A stream file does not have a record structure. If the user wants to impose (i.e., define) some form of record structure in a stream file, then surely that can be done however they want.
If the Fortran record structure is so important, don’t use stream access.

I go back to my previous point: the format statement creates a character buffer, which is then sent to the stream file without any end-of-record syntax.
Why impose special formatting rules for a stream file?
Why embed new_line characters in the buffer being sent to a stream file and then expect the standard to impose some record structure rules on a stream file?
A stream file should be a blank canvas for the user to do as they wish.

RonShepard · April 14, 2023, 2:36am

I think in this case it is the operating system and file system combined that define the record structure. Fortran stream access is one way to access those files.

Did you see my previous question about the file contents on a Windows machine? I think when you write a newline <nl> to a stream file on a Windows machine, it gets converted into two characters, a carriage return <cr> and <nl> pair. I think this is also what happens with I/O in C for files opened in text mode: writing a '\n' character gets converted into that character pair.

A related question is what happens when you read a file with that pair of characters. Is the Fortran library supposed to convert the <cr><nl> into just a <nl>?

JohnCampbell · April 14, 2023, 3:27am

Again, I would say that as the file is opened for stream access, there should be no record structure conversion, so the characters should be transferred to the buffer as-is.
This is certainly the case for open (unit=lu, file="test.stream.txt", access="stream")

But what about open (unit=lu, file="test.stream.txt", access="stream", form="formatted")?
What is the purpose of access="stream", form="formatted"?

aradi · April 14, 2023, 6:32am

For us, the issue came up, because we wanted to write formatted text output without record length limitations. As the sizes of the text blocks in our formatted output can be quite big and their maximal size is not known when the output file is opened, formatted stream output seemed to be an appealingly simple solution to avoid run-time errors due to record-length limits.

But then, elsewhere, when using T descriptors we hit the problem of different compilers producing different output, which started this thread and the discussion about what exactly the standard says about stream output. But I tend to agree with you that formatted stream output is probably not the right choice if the output should be formatted but itself contains newlines/record markers.

sblionel · April 14, 2023, 6:01pm

It does (or can) in Fortran.

While connected for formatted stream access, an external file has the following properties.
• Some file storage units of the file can contain record markers; this imposes a record structure on the file in addition to its stream structure. There might or might not be a record marker at the end of the file. If there is no record marker at the end of the file, the final record is incomplete.

RonShepard · April 14, 2023, 6:51pm

I have not used this combination previously, so this discussion has been enlightening to me. There are a few features about these files that I did not know or that I thought were different. It’s always good to learn new things.

As to your question, one feature of these files is that you can overwrite data in the middle of the file without needing to read the prior records or to rewrite the subsequent records, as would be necessary for a normal sequential formatted file. It is character addressable, so you can open the file, position it wherever you want, make whatever changes you want, and then close the file. That seems like a useful and efficient feature. Fortran does not allow opening a formatted file as unformatted (or vice versa), so once the formatted file has been created, it should be opened thereafter as a formatted file.

sblionel · April 14, 2023, 7:24pm

It sometimes helps to have exposure to a broader set of platforms than today’s common UNIX-style ones. There are platforms, VMS for example, where a formatted file can have a definite record structure without “newline” markers. (VMS added three kinds of stream-format files back in the 1990s, with different newlines.) A Fortran sequential access file can be positioned by record only, and even that is limited to BACKSPACE and REWIND. Stream files can be positioned by “file storage unit”, typically a byte.

In Fortran, stream files with a record structure can be built up or read character by character, arbitrarily positioned, and you can detect an end-of-record with EOR=.
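A minimal sketch of reading such a file character by character, assuming a hypothetical file eor.stream.txt with three records. It uses non-advancing input and detects each end of record through IOSTAT= with the IS_IOSTAT_EOR intrinsic, which reports the same condition an EOR= label would catch; with the default PAD="YES", the character variable is simply blank-padded when the condition occurs.

```fortran
! Sketch: character-by-character reading of a formatted stream file.
! The file name is hypothetical.
program count_record_lengths
  implicit none
  integer :: fd, ios, nchars, nrec
  character(len=1) :: c

  open(newunit=fd, file="eor.stream.txt", access="stream", &
       form="formatted", status="replace")
  write(fd, "(a)") "1234"
  write(fd, "(a)") "56789"
  write(fd, "(a)") ""
  close(fd)

  open(newunit=fd, file="eor.stream.txt", access="stream", &
       form="formatted", status="old", action="read")
  nrec = 0
  nchars = 0
  do
    read(fd, "(a1)", advance="no", iostat=ios) c
    if (is_iostat_end(ios)) exit
    if (is_iostat_eor(ios)) then
      ! End of the current record: report its length and start the next
      nrec = nrec + 1
      print "(a, i0, a, i0, a)", "record ", nrec, ": ", nchars, " characters"
      nchars = 0
    else if (ios /= 0) then
      error stop "unexpected read error"
    else
      nchars = nchars + 1
    end if
  end do
  close(fd)
end program count_record_lengths
```

For the three records written here, one would expect lengths of 4, 5 and 0 characters.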

I don’t think it’s correct to say that “Fortran does not allow opening a formatted file as unformatted (or vice versa)”, but the effect of doing so is implementation dependent. You’re most likely to have success with this using stream access. The standard doesn’t allow you to mix within a file, however.
