I have to take care of replacing binary files in one of our programs. Following the advice in File Input/Output — Fortran Programming Language, I first check whether the file exists and, if so, use the status ‘replace’ instead of ‘unknown’. But is this really necessary? If the file does not exist, what would be the consequences of stubbornly using ‘replace’?
When you use the ‘REPLACE’ specifier, a new file is always created. The only difference is that, if the file already exists, it’s deleted before the creation step.
So you can safely use ‘REPLACE’ without having to check for existence first.
EDIT:
The (final draft of the) standard, at 12.5.6.19, states:
If REPLACE is specified and the file does not already exist, the file is created and the status is changed to OLD.
If REPLACE is specified and the file does exist, the file is deleted, a new file is created with the same name, and the status is changed to OLD.
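As a minimal sketch of what this means in practice, the open below skips the existence check entirely. The file name replace_demo.bin and the array x are made up for illustration; iostat/iomsg are only there to report a failure such as a missing directory or lack of permissions.

```fortran
program replace_demo
    implicit none
    integer :: u, ios
    character(len=256) :: msg
    real :: x(100)

    x = 1.0
    ! No INQUIRE(exist=...) beforehand: with REPLACE the file is created if
    ! it is missing, and deleted and recreated if it is already there.
    open(newunit=u, file='replace_demo.bin', form='unformatted', &
         access='stream', status='replace', action='write', &
         iostat=ios, iomsg=msg)
    if (ios /= 0) then
        write(*,*) 'open failed: ', trim(msg)
        stop 1
    end if
    write(u) x
    close(u)
end program replace_demo
```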
Thanks for that clarification. This makes sense.
The ‘replace’ option was added after f77. With ‘unknown’, some f77 compilers would overwrite the existing records of an old file, but keep the original file space allocation. At the end of the job, the old file space would still be allocated, but only a small fraction of that space might actually be used. When disk space was limited, as it almost always was in the 1980s, this could cause resource allocation problems. The workaround was to open the old file, then close it with ‘delete’, then open it again. That logic is what the newer ‘replace’ option does.
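For illustration, the old open/delete/open sequence might have looked roughly like this. It is a sketch written in free-form modern syntax, not period code; the file name old_data.bin is hypothetical, and the inquire test is added here so the example also runs when the file is absent.

```fortran
program delete_then_create
    implicit none
    integer, parameter :: u = 10
    logical :: exists
    real :: x(10)

    x = 0.0
    inquire(file='old_data.bin', exist=exists)
    if (exists) then
        ! Connect to the old file only to delete it on close.
        open(u, file='old_data.bin', status='old', form='unformatted')
        close(u, status='delete')
    end if
    ! The file is now absent, so NEW starts from a fresh allocation.
    open(u, file='old_data.bin', status='new', form='unformatted')
    write(u) x
    close(u)
end program delete_then_create
```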
AFAIR, a simpler way to truncate a file overwritten with less data than the original was to execute endfile unit. That worked fine, at least on the *nixes that I used.
I much prefer “unknown” for opening direct access binary/backup files.
Selecting “replace” removes the option for partial update.
For sequential binary files as “unknown”, they should be truncated when a new sequential write takes place, so “replace” provides no extra gains.
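A small sketch of the partial-update case for direct access follows. The file name partial.dat and the record layout are invented for the example, and the second open uses status='old' rather than 'unknown' only because the file is known to exist at that point; the point is simply that it does not use 'replace', so records 1 and 3 should survive.

```fortran
program partial_update
    implicit none
    integer :: u, i, reclen
    integer :: a(4)

    inquire(iolength=reclen) a

    ! First pass: create the file with three records.
    open(newunit=u, file='partial.dat', form='unformatted', access='direct', &
         recl=reclen, status='replace')
    do i = 1, 3
        a = i
        write(u, rec=i) a
    end do
    close(u)

    ! Second pass: reopen WITHOUT replace and overwrite only record 2.
    open(newunit=u, file='partial.dat', form='unformatted', access='direct', &
         recl=reclen, status='old')
    a = -2
    write(u, rec=2) a
    do i = 1, 3
        read(u, rec=i) a
        write(*,'("rec=",i0," a= ",*(i0,1x))') i, a
    end do
    close(u)
end program partial_update
```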
Well, the reason I asked about this status is illustrated by the following program:
! chk_stream.f90 --
!     Check the behaviour of stream access files when rewriting
!
program chk_stream
    implicit none
    integer :: i, size
    real :: x(1000)

    x = [(0.1 * i, i = 1,1000)]
    open( 10, file = 'chk_stream.out', access = 'stream' )
    write( 10 ) x
    close( 10 )
    inquire( file = 'chk_stream.out', size = size )
    write(*,*) 'Written 1000 reals to file'
    write(*,*) 'Size of the file: ', size
    write(*,*) 'Reopening and writing 10 reals (zero)'
    open( 10, file = 'chk_stream.out', access = 'stream' )
    write( 10 ) (0.0, i = 1,10)
    close( 10 )
    inquire( file = 'chk_stream.out', size = size )
    write(*,*) 'Size of the file: ', size
end program chk_stream
The output is (with both gfortran and ifx):
Written 1000 reals to file
Size of the file: 4000
Reopening and writing 10 reals (zero)
Size of the file: 4000
So, when you are dealing with binary stream-access files, it may be necessary to explicitly replace existing files. Other types of files are automatically replaced when you do this type of thing.
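A variant of the program with status='replace' on both opens could look like the sketch below. The variable is renamed to nbytes here only to avoid shadowing the size intrinsic. With 4-byte default reals, as the 4000 above suggests, the final size should come out as 40 rather than 4000, but that figure is an expectation, not a result I am quoting from a run.

```fortran
program chk_stream_replace
    implicit none
    integer :: i, nbytes
    real :: x(1000)

    x = [(0.1 * i, i = 1, 1000)]
    open(10, file='chk_stream.out', access='stream', status='replace')
    write(10) x
    close(10)

    ! Reopen with REPLACE so the 1000-real file is discarded first.
    open(10, file='chk_stream.out', access='stream', status='replace')
    write(10) (0.0, i = 1, 10)
    close(10)

    inquire(file='chk_stream.out', size=nbytes)
    write(*,*) 'Size of the file: ', nbytes
end program chk_stream_replace
```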
I don’t think this was ever, or is currently, specified by the language standard. The i/o model for sequential files is for a tape device, and for a tape file there would have been an endfile mark placed on the tape, but the tape’s original contents after that mark would not have been changed. Certainly, the tape would not have been erased beyond that point every time an endfile mark was written. For a disk file (or for SSD devices, a simulated disk file) I’m not sure that an endfile mark is ever placed within the file. Usually these days, the endfile behavior specified by the standard is simulated using the filesystem position and size parameters. I’m thinking particularly about what happens with fortran endfile and backspace statements on sequential files.
Regarding the unformatted stream example, I guess I’m not surprised that it happens that way, but I am a little surprised that the same thing doesn’t happen also for formatted stream files. Both formatted and unformatted stream files allow the pos= to be specified, which allows overwriting of individual file positions without altering the contents of the file at other positions. I guess I don’t fully understand this aspect of stream access.
That can work with sequential files. But endfile is not used for direct (random) access files. Same with backspace and rewind.
I think this is correct. The only way to make a direct access file shorter is to write the parts you want to keep to a new file, delete the old file, and then rename the new file. Or if you just want to throw away the old file and write a new one, then ‘replace’ would work.
However, as discussed above, with formatted stream access it is possible to use pos= to write to a specific place in the old file, and then the rest of the file does seem to be truncated accordingly. I’m unsure exactly why this happens for formatted stream but not for unformatted stream.
It works, but it requires an endfile statement. The F2023 Final Draft states in 12.8.3:
Execution of an ENDFILE statement for a file connected for stream access causes the terminal point of the file to become equal to the current file position. Only file storage units before the current position are considered to have been written; thus only those file storage units shall be subsequently read.
Accordingly, adding endfile(10) to @Arjen’s code after rewriting the file with 10 reals results in:
$ ./a.out
Written 1000 reals to file
Size of the file: 4000
Reopening and writing 10 reals (zero)
Size of the file: 40
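For reference, a condensed sketch of where the endfile goes. Two things here are my own additions rather than part of the run shown above: the first open uses status='replace' so the demo starts from a known state, and the second open adds position='rewind' so that the write is certain to start at the beginning of the file.

```fortran
program chk_stream_endfile
    implicit none
    integer :: i, nbytes
    real :: x(1000)

    x = [(0.1 * i, i = 1, 1000)]
    open(10, file='chk_stream.out', access='stream', status='replace')
    write(10) x
    close(10)

    ! Reopen the existing file and overwrite only its first 10 reals.
    open(10, file='chk_stream.out', access='stream', status='old', &
         position='rewind')
    write(10) (0.0, i = 1, 10)
    endfile(10)   ! terminal point becomes the current file position
    close(10)

    inquire(file='chk_stream.out', size=nbytes)
    write(*,*) 'Size of the file: ', nbytes
end program chk_stream_endfile
```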
Interestingly, the same section of the Standard states:
Execution of an ENDFILE statement for a file connected for sequential access writes an endfile record as the next record of the file. The file is then positioned after the endfile record, which becomes the last record of the file. If the file can also be connected for direct access, only those records before the endfile record are considered to have been written
I cannot, however, think of any usable way to do it by opening the file for sequential access, be it formatted or unformatted. Formatted read requires line structure, unformatted read expects two binary record lengths for each record. So, the unformatted stream plus endfile seems to be the way to truncate files otherwise used for direct access.
I think that is true for most of the current filesystems. However, in the past I have used filesystems that supported fixed-length sequential records without embedded record structure. This was on an IBM mainframe, but I forget the details of the JCL to do this. In many applications in my field of quantum chemistry, direct access files are written sequentially (i.e. the record number increases by one on each write), but then read randomly, sometimes several times. On the IBM machines, we would do the first write part by opening the file ‘sequential’, then close it and reopen it with ‘direct’ for the read steps. That gave us a little performance boost because of the way the sequential files were buffered, a little bit like doing asynchronous i/o. I guess the modern version of this would be to use stream access to write the file, and then close and reopen the file for direct access. I don’t think that is required to work for all combinations of physical devices and file systems, but if it does work it seems to be blessed by the standard according to the 12.8.3 text that you found.
Intriguing, I had all but forgotten the ENDFILE statement.
As for IBM machines: I used to use an IBM minicomputer where there were basically two types of files: fixed-length (F) and variable-spanned (VS, if I remember the meaning of the abbreviation correctly). VS was used to create unformatted files.
Here is a little test program that shows that this does work.
program stream
    implicit none
    integer :: i, j, len, n=10, nrec=4, a(10)

    open(n,file='stream.dat',form='unformatted',access='stream')
    do i = 1, nrec ! write sequentially.
        a(:) = i
        write(n) a
    enddo
    close(n)

    inquire(iolength=len) a
    open(n,file='stream.dat',form='unformatted',access='direct',recl=len)
    a(:) = -1
    do i = 1, nrec ! read randomly.
        j = modulo(nrec-i-1,nrec) + 1
        read(n,rec=j) a
        write(*,'("rec=",i0," a= ",*(i0,1x))') j, a
    enddo
end program stream
$ flang stream.f90 && a.out
rec=3 a= 3 3 3 3 3 3 3 3 3 3
rec=2 a= 2 2 2 2 2 2 2 2 2 2
rec=1 a= 1 1 1 1 1 1 1 1 1 1
rec=4 a= 4 4 4 4 4 4 4 4 4 4
This is not required to work for all devices, but it does demonstrate the open/close/open functionality for devices that the compiler chooses to support.
Actually, one should probably always specify the status option in an open() statement explicitly. The 2023 standard (12.5.6.19) says for the status= specifier:
If UNKNOWN is specified, the status is processor dependent. If this specifier is omitted, the default value is UNKNOWN.
So a robust program should never rely on the default behavior IMHO.
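A short sketch of what spelling out the status might look like for the three common intents. The file name status_demo.dat is hypothetical, and the iostat handling is just one way to react to the 'new' case failing.

```fortran
program explicit_status
    implicit none
    integer :: u, ios

    ! 'replace': always start from an empty file.
    open(newunit=u, file='status_demo.dat', form='unformatted', &
         access='stream', status='replace')
    write(u) 42
    close(u)

    ! 'new': the file must not exist yet, so this open is expected to fail.
    open(newunit=u, file='status_demo.dat', form='unformatted', &
         access='stream', status='new', iostat=ios)
    if (ios /= 0) then
        write(*,*) 'status=new refused to overwrite the existing file'
    else
        close(u)
    end if

    ! 'old': the file must exist; its contents are kept.
    open(newunit=u, file='status_demo.dat', form='unformatted', &
         access='stream', status='old')
    close(u)
end program explicit_status
```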
And here is a modification of @RonShepard’s test program that additionally truncates the just-written file after the second record.
program stream
    implicit none
    integer :: i, j, len, n=10, nrec=4, ntrunc=2, a(10)

    open(n,file='stream.dat',form='unformatted',access='stream')
    do i = 1, nrec ! write sequentially.
        a(:) = i
        write(n) a
    enddo
    close(n)

    inquire(iolength=len) a
    open(n,file='stream.dat',form='unformatted',access='direct',recl=len)
    a(:) = -1
    do i = 1, nrec ! read randomly.
        j = modulo(nrec-i-1,nrec) + 1
        read(n,rec=j) a
        write(*,'("rec=",i0," a= ",*(i0,1x))') j, a
    enddo
    close(n)

    open(n,file='stream.dat',form='unformatted',access='stream')
    write(n,pos=ntrunc*len+1) ! I hope this is standard-conforming way to position a file
    endfile(n) ! truncation
    close(n)
end program stream
That is a good question. I’m still getting up to speed with stream access, so I would like to see some comments on this too. I think, but I’m not certain, that the only allowable POS= values are either 1, indicating the beginning of the file, or values that have been returned by an INQUIRE(...POS=...). If that is the case, then to be conforming one would need to do one of the following:
- read to the appropriate place within the file, then do the ENDFILE, or
- execute the INQUIRE to get the POS= value at some point earlier in the code, and then use that value to set the new position (I think either a read or a write works for this), and then do the ENDFILE.
Obviously, that first option is not very efficient for a large file. Anyone know if there is another standard way to just set the POS within a stream file to some location?
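The second option could be sketched as follows. The names keep_pos, nkeep and nbytes are invented for the example, and it assumes that a write with pos= and an empty output list only repositions the file. With 4-byte default integers and a byte-sized file storage unit, the final size would be expected to be 80.

```fortran
program truncate_at_saved_pos
    implicit none
    integer, parameter :: nrec = 4, nkeep = 2
    integer :: u, i, keep_pos, nbytes
    integer :: a(10)

    open(newunit=u, file='stream.dat', form='unformatted', access='stream', &
         status='replace')
    do i = 1, nrec
        a = i
        write(u) a
        ! Remember the position right after the last record to keep.
        if (i == nkeep) inquire(unit=u, pos=keep_pos)
    end do
    close(u)

    open(newunit=u, file='stream.dat', form='unformatted', access='stream', &
         status='old')
    write(u, pos=keep_pos)   ! empty output list: assumed to only reposition
    endfile(u)               ! truncate at the saved position
    close(u)

    inquire(file='stream.dat', size=nbytes)
    write(*,*) 'Size after truncation: ', nbytes
end program truncate_at_saved_pos
```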
Only when the stream is formatted!
12.6.2.12 POS= specifier in a data transfer statement
[…]
If the file is connected for formatted stream access, the file position specified by POS= shall be equal to either 1 (the beginning of the file) or a value previously returned by a POS= specifier in an INQUIRE statement for the file.
I have also found the (possible) answer to my doubt about using write with a pos= specifier and an empty I/O list:
12.3.4.4 File position after data transfer
[…]
For unformatted stream input/output, if no error condition occurred, the file position is not changed. For unformatted stream output, if the file position exceeds the previous terminal point of the file, the terminal point is set to the file position.
NOTE 1 An unformatted stream output statement with a POS= specifier and an empty output list can have the effect of extending the terminal point of a file without actually writing any data.
I guess we may presume that a POS= specifier pointing to a place within the current size of the file, combined with an empty I/O list, just positions the file at that offset from its start.
In the case of unformatted stream, which is the connection we are discussing now in this thread, I wonder if this is really left ambiguous by the standard, or if there is more text somewhere that defines how this works. In the case of formatted stream, the standard assumes that not all POS values are allowed by the underlying file system and i/o library, and it thereby requires that POS values used in read/write statements are those that are returned by INQUIRE. I would assume that this is to allow for padding within the file system – e.g. maybe only odd POS values are allowed, or only values of the form 1+n*64, or something like that. But why not have that same kind of restriction for unformatted stream? Some arbitrary POS value is likely to end up somewhere in the middle of a real64 value, and you would not want that to happen. The expression you used, pos=ntrunc*len+1, certainly looks reasonable for this simple case, but it would be reassuring to see some supporting text somewhere in the standard.
I have verified that this truncation works as expected on MacOS with the gfortran, flang, and nagfor compilers. The file size reported by INQUIRE(...size=...) agrees with the result afterwards of ls -l. So this approach does appear to allow truncation of a direct access file, something that was not possible without stream access.
The POS specifier is allowed only for stream access.
12.6.2.12 POS= specifier in a data transfer statement
The POS= specifier specifies the file position in file storage units. This specifier shall not appear in a data transfer statement unless the statement specifies a unit connected for stream access.