Understanding Fortran file I/O

Something I haven’t properly learned yet is all the ways in which Fortran can access files. I gather that there are three access modes - sequential, direct, and stream; and that there are three file structures - formatted, unformatted and binary (with some combination of the aforementioned being illegal). Basically my understanding is as follows, is this roughly correct?

  • Direct access files have a fixed record length so that you can jump straight to a given record.

  • Sequential access files don’t have a fixed record length, which is convenient, but removes the possibility of random access.

  • Stream access is basically doing I/O as raw bytes (most similar to C-style IO)

  • Formatted files are in ASCII. Formatted stream files are rarely used, one normally uses sequential formatted files. Records are basically lines of text (newline and/or carriage return terminated).

  • Unformatted files that aren’t stream-access are in a compiler specific format that includes internal record separators. It’s binary, so quicker to read and write than formatted files, but aren’t human-readable. Stream unformatted files are in practice raw byte views just like C’s fread/fwrite.

I don’t really know anything about “binary” file structure.

Is it still common/recommended to use record based IO at all? Or is stream access preferred in most cases? It seems to me that in particular, formatted sequential access provides a useful way of getting data into and out of human readable / interoperable files, but I’m curious what the modern best practices are.

I apologize if this post is a bit of a mess - I’m trying to wrap my head around the different concepts and only vaguely understand them…

4 Likes

Data you find on the web is often in CSV format, and it is convenient to use the record-based IO to read it. You can easily write CSV data too. When data sets get big enough that formatted reads are too slow, I may define a derived type and create a subroutine that writes the derived type as an unformatted stream and another subroutine that reads such data. For example, to deal with time series with dates in files such as

,SPY,HYG
2007-04-11,110.289658,44.11438
2007-04-12,110.779778,44.143967
2007-04-13,111.285194,44.063625

I have code such as

type :: date_frame_full
   character (len=1000)                 :: title = ""
   type(date_mdy)         , allocatable :: dates(:)  ! (nobs)
   character (len=len_sym), allocatable :: sym(:)    ! (nvar)
   real(kind=dp)          , allocatable :: xx(:,:)   ! (nobs,nvar)
end type date_frame_full

subroutine write_date_frame_stream(df,out_file)
! write a date_frame to out_file using unformatted stream I/O
type(date_frame_full) , intent(in) :: df
character (len=*), intent(in) :: out_file
integer                       :: outu
call get_unit_open_file_stream(out_file,outu,xaction="w")
write (outu) num_obs(df),num_var(df)
write (outu) df%title,df%dates,df%sym,df%xx
close (outu)
end subroutine write_date_frame_stream
!
subroutine read_date_frame_stream(df,infile)
! read a date_frame from infile using unformatted stream I/O
type(date_frame_full) , intent(out) :: df
character (len=*), intent(in)  :: infile
integer                        :: iu,nobs,nvar
call get_unit_open_file_stream(infile,iu,xaction="r")
read (iu) nobs,nvar
call alloc_df(df,nobs,nvar)
read (iu) df%title,df%dates,df%sym,df%xx
close (iu)
end subroutine read_date_frame_stream

There is also a subroutine to read CSV files into a type(date_frame_full).

3 Likes

ACCESS options are sequential, direct, and stream.

FORM options are formatted and unformatted. There is no “binary” in the standard, though some vendors have it as an extension (from before stream existed). The combination of stream and unformatted is basically like binary.

4 Likes

@hsnyder,

Your two quoted statements reflect history up until recent times, but as support toward Fortran 2003 standard has improved in compiler implementations and the practitioners have become brave enough to employ the features therein,

  • the use of formatted stream files is increasing and note they are rather similar to formatted sequential files and they prove rather valuable, especially with large files;
  • and one is starting to see some use of UTF-8 encoding with character set of ISO 10646 toward formatted input/output that is allowed in Fortran. Better compiler support will be welcome here. Note the limitation of ASCII in Fortran surrounds internal, not external files. I assume your interest here is in external file IO.
1 Like

Interesting, what’s the difference between formatted sequential and formatted stream, then? Lack of line termination as a record separator?

Thanks to everyone who has replied so far, this is helping to fill in the missing pieces…

The “classic” formatted sequential transfer is record-oriented. Each transfer will advance the file pointer by one or more records (in this case, lines), unless you use ADVANCE='NO' clause in OPEN. Stream is treated like a sequence of bytes. It is a fairly new concept and AFAIU, in formatted I/O, is more or less an equivalent of sequential non-advancing, probably except for the concept of the end-of-record (EOR) status, available only in sequential. Also, the newline characters are not automatically written at the end of WRITE statement.

Can you explain where formatted stream files are useful? My understanding is that the Formatted form only allows the repositioning using a read/write with POS= in rather specific circumstances, not to any byte position whatever, as you can do with an unformatted stream. I was once told by someone more expert than me that formatted stream was only included in Fortran because C has both formatted and unformatted forms (text and binary in C terms) so Fortran ought to have them too.

The chief advantage seems to be that no record length (as in direct access) or maximum record length (as in sequential, where you get the compiler’s default if you don’t specify) is involved.

Working with very, very large files that are not “unformatted” per Fortran language-speak.

Re: the use of POS= file IO, when used in conjunction with INQUIRE statement, I’ve found it to work well for all the programmer needs communicated to me by the teams I work with.

What are the circumstances where you thought POS= does not work with formatted stream files?

In the Fortran 2018 Standard see section 12.6.2.11 (or Fortran 2008 section 9.5.2.11) paragraph 3 which says
"If the file is connected for formatted stream access, the file position specified by POS= shall be equal to either (the beginning of the file) or a value previously returned by a POS= specifier in an INQUIRE statement for the file. "
This seems to me a rather serious restriction on using “random” access to such a file. There is no such restriction on an unformatted stream file. The reason is, as far as I understand it, that a Formatted stream file contains record terminators of unknown size, these might be CR or LF or CR and LF, or even something else. Because of this reading records does not correspond to a predictable number of bytes, and vice-versa. Thus Fortran prohibits a program that uses this because of the indeterminacy. There is a rather similar prohibition on using the BACKSPACE statement when a record has been written using list-directed or namelist output, but this is because the number of lines (=records) is then indeterminate.

I agree that stream files are useful, my quibble was over the Formatted flavour. To me they seem about as useful as Formatted Direct-access files, a another flavour (sorry flavor if you are on the other side of the pond) which I have never found the need to use.

Sorry to slightly bump the thread, but I figured a new thread wasn’t necessary. I’ll open a new thread if anyone thinks it would be better.

Can one do parallel I/O using Fortran’s I/O features? It seems to be that, given say a Direct-access unformatted file, different processes could write to/read from different records at the same time with no risk of overlap. If we’re talking modern Fortran constructs, we might have coarrays and we want to quickly write/read a coarray in parallel. For my case, I’m thinking of algorithms that have to read/write large amounts of data during a computation to disk to prevent bottlenecking RAM, and perhaps doing so asynchronously.

Is this possible, and if so how would I go about doing it? (Bonus points for being a portable data file, i.e. different compilers/computers could read it). Would/could I use the same file unit? Other details? Etc.

Look into the ASYNCHRONOUS I/O specifier (and variable attribute) in the Fortran standard and support for it in the processors you use.

I see. The asynchronous I/O is well-explained in this link, IBM Async I/O. It appears for the different compilers/systems I need to link certain libraries such as pthreads or openmp, which is fine I suppose.

However, I did not see anything about combining this with a parallel execution model such as MPI or coarrays. Presumably async I/O will work fine with openmp, but I do not know if there are any pitfalls to watch out for with having multiple threads open the same data file (even if only accessing different records).