Formatted stream read

I have read the Standard, a topic on this discourse and the FortranWiki page cited there but frankly speaking, I don’t quite get the way formatted stream file reading is implemented in Fortran.

Suppose I would like to read integer values from a text file, with variable number of values in each line. Say,

1 2 3 4
1213 3456

I’d say this is the very idea of having stream access! Well, not quite :frowning: . The Wiki page says at the very beginning of Formatted Stream Files section:

Formatted stream I/O is not all that useful; I suspect they were provided to give Fortran essentially the same I/O facilities as C.

Sorry but they do not give anything close to C. Simplest possible C program:

#include <stdio.h>
int main(int argc, char *argv[])
{
  int m, res;
  FILE *fp=fopen("test.txt","r");
  while (1) {
    res=fscanf(fp,"%d",&m);
    if (res == EOF) break;
    printf("%d\n", m);
  }
  return 0;
}

happily reads all the values:

$ gcc test_read.c && ./a.out
1
2
3
4
1213
3456

So let’s try Fortran:

program t
implicit none
character(1) :: ch
integer :: iu, m
open(newunit=iu, file='test.txt', access='stream', form='formatted')
do
  read(iu, *, end=333) m
  print *,m
enddo
333 continue
end program t
$ gfortran read_stream.f90 && ./a.out
         1
      1213

Ooops. If the file has no end-of-record markers, gfortran-compiled program does not read anything! ifx-compiled executable - still just the first value.

To me, it does not make any sense and is IMHO at odds with the very first statement of the section 12.3.3.4 Stream access of F2018 standard:

1 Stream access is a method of accessing the file storage units (12.3.5) of an external stream file.

Elsewhere (12.3.5) it is said:

It is recommended that the file storage unit be an 8-bit octet where this choice is practical.

so I guess it is not bad idea to understand stream access as byte-by-byte access.

Your text file format has been a problem in Fortran for a long time.

Fortran does not provide an automatic (syntax) way of reading your two lines of text and provide the values and count of values for the two lines of your example.

You can use list directed I/O, but you must know how many numbers are on each line and you can’t read a real number (eg 2.0) into an integer variable. It would have been a great extension to list directed read if you could ask to read the list of numbers and be told how many numbers were in the record.

And there is your problem; stream access does not have records !

Many Fortran programmers, who read data from text files have some form of a free-format library to address this, such as reading a .csv file with an unknown number of values per line. We basically read the line as a text string, parse the line for tokens and read each value. Many have done this multiple times, for different “free format” file syntax.

Then there is “Formatted stream I/O”. It is a bit confusing !
Stream I/O provides character/byte/number transfers but without the “end of record” descriptors of formatted I/O or the “record size” descriptors of unformatted sequential I/O.
The other key feature of stream I/O is it is byte addressable, which can be very effective for creating unformatted file structures.
Stream I/O is more recent, so some old unformatted sequential or direct access file software was limited to 32-bit addressing, so it can be easier to go to stream access and 64-bit addressing for larger files. It is very effective for creating large random access files with variable record lengths.

If you strip off the record formats, then there is not much difference between formatted and unformatted with stream access, as stream does not have records, but collections of data, which the programmer must manage.

Formatted stream access is non-advancing I/O that allows you to use format statements to read or compose the data formats on the file, rather than your requirement of defining the way you interpret text in the file beyond the capability of format statements.

To solve your problem, your fortran solution is:

  • read each line as text,
  • parse spaces/delimiters to determine the count of numbers in each line
  • read each number and store in a systematic way.

It is a shame list directed I/O did not provide the count functionality.

Thanks @JohnCampbell for the response. I cannot, however, agree with some of your statements.

That is what I thought when I read about stream files for the first time. Apparently, for formatted files, stream access does have records, contrary to what one could expect. My examples above show clearly, that the file is read line-by-line, not byte-by-byte.

Again, not quite. I’d expect that end-of-record markers were treated same as spaces, as C I/O does. But they are not.

Well, it is not non-advancing read. Were it so, the consecutive READ statements would read the next value from the same line, not the next one.

OK, but this is not a solution I would expect from a modern programming language having stream access. Not to mention that the only way to read each line as text, without knowing (in advance) the maximum length of line in the file being read is to read using non-advancing mode, one character at the time, possibly reallocating the variable to which the line is being read. Pretty inefficient.

I know, you are perfectly right. It is really a pity that this phrase does not read:

Your text file format had been a problem in Fortran for a long time, until stream access was introduced.

Formatted stream access should format the packet of information, as per the format statement, but not append the packet with “cr” or “lf”. What is the record definition you are identifying ?
Can you give an example ?

My understanding is stream access does not have “end of record markers”.

Can you provide a small code example that demonstrates this ?

I am wondering what you are complaining about ?
My understanding of Fortran stream access is either Fortran formatted or Fortran sequential unformatted, without the record delimiters, but with byte addressing. If it is not that then can you demonstrate the existence of “cr” “lf” in formatted stream or any {reclen} {data} {reclen} structure in unformatted stream.
An advantage of stream IO is it is easy to place conventional record seperators in the file, such as cr, lf or ht markers for your later use.
My experience of formatted stream is limited, but I have done extensive testing of unformatted stream, which provides a “canvas” for creating my own indexed record structure, especially where file size or record size exceeds 2 GBytes. Certainly Gfortran’s and Ifort’s treatment of large records is a mess, in my opinion, although it is a definate format.

I think I did give examples of what I am complaining about, in my original post.

Formatted stream read apparently treats the external file in a record-oriented way, contrary to what I would expect from stream access. Reading just one value from a line containing many of them, advances the file “pointer” to the beginning of the next file, so the next value read is that from the next line. That is obviously not a non-advancing mode. With some subtle differences (using POS specifier etc.) it seems to be the same as normal sequential formatted input.

That makes an easy (as it seems) task of reading irregularly formatted file (as in my example) a very complicated problem in Fortran. It is straightforward in C and that is exactly what I am complaining about - the (recently introduced) Fortran stream access could have given Fortran programmers an equivalent of C stream access but - for formatted files - it has failed to do so.

@msz59
Can you provide a link to the text file you created ? I’d be interested to see what “other structure” characters are in this C file. Does this file structure comply with what Fortran stream access will accept ?
I have been reading Modern Fortran Explained, which is very brief in it’s description of formatted stream input. Can you use List-directed I/O for formatted stream input ?
You could place an inquire (iu, pos=file_address) before and after each read to see how many characters are read.

Apart from you see in my first post, the file has an LF character at the end of each line (typical un*x format of text file)

I managed to get formatted stream (and here) I/O to work quite well with good performance in parsing and writing JSON files. Note that the read/write statements explicitly use '(A)' format specifiers and advance="no" to avoid whatever parsing/processing list-directed I/O would normally do (like advancing to the next record).

The problem of reading an arbitrary number of values from a line is addressed in several libraries easily available via fpm(1). It is common for many languages to depend on open-source libraries for many functions. The stated problem has been solved repeatedly, just not using formatted stream I/O. Several Fortran wiki examples and an example module that supports many string functions including reading an arbitrary number of numeric values from a line might be useful.

Read arbitrary number of values from a line

https://fortranwiki.org/fortran/show/getvals

Read line of arbitrary length

https://fortranwiki.org/fortran/show/readline

Use libraries via fpm(1):

So the problem is easily solved in Fortran, just not using an approach identical to C. Like python or C, a lot of external libraries are available for common problems are now available easily using fpm(1). Hopefully the fpm(1) site for registered packages will be in production soon, making for more reliable use of external libraries.

1 Like

It appears that opening as access=‘stream’, form=‘formatted’ is not what you have expected.

It can be much easier changing to open with access=‘stream’ ( form=‘unformatted’ is the default).
Now it can be much easier to interogate the file.

The following adaptation of your program provides a basic interpretation of +ve integer numbers and demonstrates the simplicity of stream I/O for those who in the past have struggled with reading text files with unusual number delimiters.

program t
implicit none
character(1) :: ch
integer :: iu, m, file_address

open ( newunit=iu, file='test.txt', access='stream' )   !, form='formatted' )

call report_address ( iu, file_address )
do
  
!  read (unit=iu, *, end=333, advance='NO') m
  call get_next_number_from_stream ( iu, m )
  print *, 'number recovered =',m
  call report_address ( iu, file_address )
end do

333 continue
end program t

subroutine report_address ( lu, file_address )
  implicit none
  integer :: lu, file_address
  inquire (unit=lu, pos=file_address)
  write (*,*) 'file at byte',file_address
end subroutine report_address

subroutine get_next_number_from_stream ( lu, number )
  implicit none
  integer :: lu, number
  integer :: nc, ic
  integer :: pos = 0
  integer :: czero = ichar('0')
  character :: ch*1, token*10

  nc = 0
  token = ' '
  number = 0
  do
    read (unit=lu, end=333) ch    !  fmt='(a)', 
    pos = pos+1

    ic = ichar (ch)
    if ( ic == 13 ) then
      write (*,*) pos,' <next>', ' <cr>', ic
    else if ( ic == 10 ) then
      write (*,*) pos,' <next>', ' <lf>', ic
    else if ( ic == 9 ) then
      write (*,*) pos,' <next>', ' <ht>', ic
    else if ( ic == 32 ) then
      write (*,*) pos,' <next>', ' <sp>', ic
    else
      write (*,*) pos,' <next>    ', ch, ic
    end if

    if ( index ( '0123456789+-', ch ) > 0 ) then
      nc = nc+1
      token(nc:nc) = ch
      number = number*10 + (ic-czero)     ! simple recovery of positive integer only

    else if ( nc > 0 ) then   ! any non digit character is treated as the end of the numeric token
      exit
    end if

  end do

! retrieve number with list directed I/O works now
!  read ( token, * ) number
  write (*,*) 'next number =',number
  return

333 write (*,*) 'end of file reached, token = '  ,token
    stop
end subroutine get_next_number_from_stream

Thanks to all for code snippets, library links. Maybe I mis-labelled my OP as Help. I could probably write such ir similar code myself but it was not my point. I was actually wondering why the stream access is so different for formatted and unformatted mode. I guess you would admit that having to write several dozen lines of code to obtain something that could require just one read if the f.s.a. were really stream-like - just as unformatted is.

It seems like the missing piece to the puzzle here is that there is no format field that reads an integer of unknown field width. You can write an integer with minimal field width with an i0 or a g0, but there is no analogous field specifier to read an integer and then leaves the record pointer positioned after the last digit. List-directed input apparently doesn’t work for this either. This is one of the things that programmers have been requesting for decades, and especially since nonadvancing i/o was added to the language.

Maybe something like this stdlib/example/strings/example_stream_of_strings_to_numbers.f90 at master · fortran-lang/stdlib · GitHub ?

That is used after the characters have been read into a character string. This discussion is about reading from the stream file directly, without first transferring the record into a character string.

I am apparently not alone in (mis?)understanding the way formatted stream access should work. Stephen J. Chapman, in his Fortran for Scientists and Engineers. Fourth Edition writes:

14.7
STREAM ACCESS MODE
The stream access mode reads or writes a file byte by byte, without processing spe-
cial characters such as carriage returns, line feeds, and so forth. This differs from
sequential access in that sequential access reads data a record at a time, using the
carriage return and/or line feed data to mark the end of the record to process. Stream
access mode is similar to the C language I/O functions getc and putc, which can
read or write data a byte at a time, and which treat control characters just like any
others in the file.
A file is opened in stream access mode by specifying ACCESS=‘STREAM’ in the
OPEN statement. A typical OPEN statement for a stream access is shown below.
OPEN ( UNIT=8, FILE='infile.dat', ACCESS='STREAM', FORM='FORMATTED', IOSTAT=istat )
Data can be written out to the file in a series of WRITE statements. When the pro-
grammer wishes to complete a line he or she should output a “newline” character
(similar to outputting \n in C). Fortran includes an intrinsic function new_line(a)
that returns a newline character of the same KIND as the input character a.

I tried to write a formatted stream file:

OPEN (UNIT=8, FILE='x.dat', ACCESS='STREAM', FORM='FORMATTED')
WRITE (8, '(A)') 'Text1'
WRITE (8, '(A)') 'Text2'
WRITE (8, '()')   ! empty record
WRITE (8, '(A)') 'Text3'
WRITE (8, '(A)') new_line(' ')
CLOSE (8, IOSTAT=istat)

The output file x.dat contains:

Text1
Text2
                     <-- empty record outputs its own LF
Text3

                     <-- Three LF at the end of the file!!!

with a newline characters after each write! Even outputting an empty record produces a newline which is obviously non standard-conforming, as the F2018 Draft document says:

12.3.3.4 Stream access
[…]
4 While connected for formatted stream access, an external file has the following properties.
[…]
• Writing an empty record with no record marker has no effect.

This works as shown both in gfortran and ifx. So, maybe, the formatted stream access is just wrongly implemented as a whole?

Hmmm. I am Interested in an interpretation from the Standards committee on this one. As implemented by the compilers I have used formatted streams on I think of it as the same as normal sequential formatted file I/O with the added ability to record file positions using INQUIRE that I can then return to. Other than that, I cannot think of anything I expect to be different between standard sequential formatted I/O and stream formatted I/O.

So with formatted stream I/O I can build an index to file positions as I write them or via an initial pass through the file and subsequently return to those positions I recorded via INQUIRE more efficiently than a series of BACKSPACE calls or rereading the file.

Other than that I expect it to be the same; creating record marks at the end of an advancing I/O WRITE and so on. But I see several on-line references that seem to imply no EOR or EOF markers are inserted by default, and that you need to call NEW_LINE(‘A’) to insert an end-of-record. This is not my understanding at all for FORMATTED stream I/O.

Can anyone on the committee interject on this one? I took the reference to the statement in the standard above to imply a non-advancing WRITE did nothing, but an advancing one would be expected to write an EOR.

Here is another example I found on-line that appears to assume formatted stream I/O does not write end-of-records unless explicitly told to (ie. like unformatted stream I/O):

 OPEN(UNIT=11, FILE="mystream", STATUS="REPLACE", ACCESS="STREAM", FORM="FORMATTED")
WRITE(11, "(4A)") "first line", NEW_LINE("x"), "second line", NEW_LINE("xcon")
INQUIRE(UNIT=11, POS=mypos)

So I consider the example from the book wrong unless ADVANCE=‘no’ is used on the WRITE statements.

And another difference I forgot to mention between formatted sequential and stream formatted sequential is the stream I/O should not be limited as to line lengths; whereas formatted sequential is; albeit nowadays that is typically a very large number (in the past it was often limited to 255 or even 128 or 132 or 80 by default).

Quoting some relevant sections of the standard:

A file is composed of either a sequence of file storage units (12.3.5) or a sequence of records, which provide an extra level of organization to the file. A file composed of records is called a record file. A file composed of file storage units is called a stream file. A processor may allow a file to be viewed both as a record file and as a stream file; in this case the relationship between the file storage units when viewed as a stream file and the records when viewed as a record file is processor dependent.

Stream access is a method of accessing the file storage units (12.3.5) of an external stream file.

The properties of an external file connected for stream access depend on whether the connection is for unformatted or formatted access. While connected for stream access, the file storage units of the file shall be read or written only by stream access data transfer statements.

While connected for formatted stream access, an external file has the following properties.

  • Some file storage units of the file can contain record markers;
  • Writing an empty record with no record marker has no effect.

C1223 (R1213) An ADVANCE= specifier shall appear only in a formatted sequential or stream data transfer statement with explicit format specification (13.2) whose io-control-spec-list does not contain an internal-file-variable as the io-unit.

The ADVANCE= specifier determines whether advancing input/output occurs for a nonchild data transfer statement. If YES is specified for a nonchild data transfer statement, advancing input/output occurs. … If this specifier is omitted from a nonchild data transfer statement that allows the specifier, the default value is YES.

The REC= specifier specifies the number of the record that is to be read or written. This specifier shall appear only in a data transfer statement that specifies a unit connected for direct access; … Any other data transfer statement is a sequential access data transfer statement or a stream access data transfer statement, depending on whether the file connection is for sequential access or stream access.

So unless I missed something, formatted I/O is still record based, even if the file is connected for stream access, unless the advance="no" specifier is used. So

write(lu, '()')

while an empty record, is not “with no record marker”.

1 Like

That was my understanding. So that would make the book example and the Fortran Wiki examples incorrect, and when I google “fortran formatted stream read” at least for me the first three hits show examples where end-of-records are not written; although I found others that conform to the idea that they do. So it looks like the easiest way to get the behavior of the C
fscanf procedure is to open the file in C and call the fscanf procedure :slight_smile:

If everyone agrees the Fortran WIki, which seems to be the first reference provided by several search engines as the first match for “Fortran stream I/O” (and is very useful for other aspects of stream I/O) should be corrected? It looks like materials like that lead the OP and others to expect formatted stream I/O to be a different beast than it is, as it is still record-oriented.

And it would be nice to have a list-directed I/O be compatible with advanced=‘no’ and return a position that could be returned in POS= and SIZE= that could read the next variable, or allow fmt=‘(g0)’ and so on on a READ, or an “fscanf” for Fortran; but currently the Fortran methods available for such an operation require either using C I/O calls from Fortran or parsing.

So it looks like, from the comments above, this really was a “Fortran enhancement” request, and not a “Help” request(?).

Otherwise, the original question is answered by saying the documentation you read is wrong,
and Fortran formatted stream I/O is pretty much just like non-stream I/O except record lengths are not limited and you can query the position and move to it, as I see it, anyway.

You can also overwrite characters within a stream file, while you cannot do that with a normal sequential file.

I have not used stream files before, so this discussion has been enlightening.

Well, you might be in for a bit of a surprise, or at least a byte of surprise. What do you think will happen if you run this; and then cat(1) the file after the program finishes.

So far, I got three results from three compilers:

program t
implicit none
integer,parameter :: isz=10
integer :: i, iu, ipos(isz)=1, isize
open(newunit=iu, file='overwrite.txt', access='stream', form='formatted')
do i=1,isz
  inquire(iu,pos=ipos(i))
  write(iu,*)'write=',i,'ipos=',ipos(i)
enddo

inquire(iu,size=isize)
flush(iu)
call execute_command_line('cat overwrite.txt')
write(*,*)'file size is',isize

write(iu,*,pos=ipos(8))'new eigth line'
inquire(iu,size=isize)
flush(iu)
call execute_command_line('cat overwrite.txt')
write(*,*)'file size is',isize

write(iu,'(a)',pos=ipos(3),advance='no')'new third line'
inquire(iu,size=isize)
flush(iu)
call execute_command_line('cat overwrite.txt')
write(*,*)'file size is',isize

end program t

They all seem to agree on truncating the file when advancing I/O occurs when overwriting line 8, but take their own paths when overwritten with non-advancing I/O.

But (I think) it is clear in the standard that unformatted stream I/O overwrites bytes; but that formatted stream I/O truncates the file at the end of an advancing I/O statement.

What the compilers do with non-advancing I/O appears to indicate there is a difference of opinion there, and/or a number of unfiled bug reports (or at least I could not find any related ones).

So it might be surprising, but FORMATTED I/O causes truncation when an advancing I/O write occurs, and I think that is correct.

What happens with non-advancing I/O with actual compilers is everything from truncation with an end-of-record added to the last line, byte overwriting, or byte overwriting while the file is open but truncation when the file is closed, … and then use non-advancing I/O but call NEW_LINE(‘a’) or explicitly put out a newline character(s) and it gets even less consistent.

So the question I have is what should happen when the non-advancing I/O statement is used to overwrite part of line 3? And then there is the question when the new line3 is longer than the original.