You’re right, I found it in Paragraph 17.16 “Some inquire statement enhancements”, it’s on page 326 (of the green book)… too bad
I gave it a try, thanks for such a great piece of software! When compiled, however, it produces tons of warnings about “Fortran 2018 deleted features”, mostly “arithmetic IFs” (which are great to see again after probably some 30 years). Is it possible to get rid of those particular warnings?
That is probably some old Bessel functions. There are Bessel functions in the intrinsics now, so I think I can delete them if the intrinsics cover all the same types, but they work. The vast majority should all be without warnings except some float comparisons. You can also turn off specific messages by their message number. I will take a look and see if there are others. Probably some in the linear algebra routines and the matlab88 look-alike, but there should not be too many.
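For readers who have not met one in a while, here is a minimal sketch of what the warning is about and how such a branch is usually rewritten. The module and function names (`arith_if_demo`, `sign_branch`) are hypothetical and not taken from GPF; the old form is shown only in a comment.

```fortran
module arith_if_demo
   implicit none
contains
   !> Return -1, 0 or +1 depending on the sign of x.
   !>
   !> The arithmetic IF (deleted in Fortran 2018) would have been written
   !>       IF (x) 10, 20, 30
   !> jumping to label 10 if x<0, to 20 if x==0 and to 30 if x>0.
   pure integer function sign_branch(x) result(branch)
      real, intent(in) :: x
      if (x < 0.0) then
         branch = -1
      else if (x == 0.0) then   ! exact float comparison, as in the original construct
         branch = 0
      else
         branch = 1
      end if
   end function sign_branch
end module arith_if_demo
```

Note that the `x == 0.0` test is exactly the kind of float comparison that some compilers will still flag when extra warnings are enabled.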
See this Intel forum post from 2015 for the history of this routine: Text file to allocatable string - Intel Community
It looks like there used to be something in there to check that the file was actually read by trying to read more? Not sure if that was necessary? I don’t remember.
The procedure is open source, so I used spag (not associated with it in any way) on M_bessel and it looks to have eliminated all the computed gotos. I do not have the latest version, which can eliminate all the gotos, so I want to try that and make better unit tests for the functions (long overdue). It might therefore be a while before I actually update GPF, but that was surprisingly easy considering how much old spaghetti was in those procedures. There are a few in the M_calcomp module as well, partly as a homage to the old code that it originated from. I used to get regular requests for the Calcomp interface but that has faded away, so it might be time to retire M_calcomp and M_bessel anyway; but the conversion process was so easy I think I will complete it and leave them in for now. It makes an interesting example to keep around to show the old “before” and the more readable “after” files.
A “Summer of Code” project might be for someone to go through the most-used public domain Netlib libraries with spag and fpt ?!
@jacobwilliams just wanted to say thanks for your AoC repository above, well organized, and great little codes that we can use as tests for a compiler. I think it’s a nice showcase for Fortran.
Great thread, thanks for the link! Here is the (functional) chunked version I’ve been suggesting for stdlib:
!> Version: experimental
!>
!> Reads a whole ASCII file and loads its contents into a string variable.
!> The function handles error states and optionally deletes the file after reading.
type(string_type) function getfile(fileName,err,delete) result(file)
    !> Input file name
    character(*), intent(in) :: fileName
    !> [optional] State return flag. On error, if not requested, the code will stop.
    type(state_type), optional, intent(out) :: err
    !> [optional] Delete file after reading? Default: do not delete
    logical, optional, intent(in) :: delete

    ! Local variables
    type(state_type) :: err0
    integer, parameter :: buffer_len = 65536
    character(len=:), allocatable :: buffer,fileString
    character(len=512) :: iomsg
    integer :: lun,iostat
    integer(int64) :: mypos,oldpos,size_read
    logical :: is_present,want_deleted

    ! Initializations
    file = ""
    allocate(character(len=buffer_len) :: buffer)

    !> Check if the file should be deleted after reading
    if (present(delete)) then
        want_deleted = delete
    else
        want_deleted = .false.
    end if

    !> Check file existing
    inquire(file=fileName, exist=is_present)
    if (.not.is_present) then
        err0 = state_type('getfile',STDLIB_IO_ERROR,'File not present:',fileName)
        call err0%handle(err)
        return
    end if

    open(newunit=lun,file=fileName, &
         form='unformatted',action='read',access='stream',status='old', &
         iostat=iostat,iomsg=iomsg)

    if (iostat/=0) then
        err0 = state_type('getfile',STDLIB_IO_ERROR,'Cannot open',fileName,'for read:',iomsg)
        call err0%handle(err)
        return
    end if

    allocate(character(len=0)::fileString)

    read_by_chunks: do

        ! Read another buffer
        inquire(unit=lun,pos=oldpos)
        read (lun, iostat=iostat, iomsg=iomsg) buffer

        if (is_iostat_end(iostat) .or. is_iostat_eor(iostat)) then
            ! Partial buffer read
            inquire(unit=lun,pos=mypos)
            size_read = mypos-oldpos
            fileString = fileString // buffer(:size_read)
            iostat = 0
            iomsg = ''
            exit read_by_chunks
        else if (iostat == 0) then
            ! Full buffer read
            fileString = fileString // buffer
        else
            ! Read error
            err0 = state_type('getfile',STDLIB_IO_ERROR,'Error reading',fileName,'at character',oldpos)
            exit read_by_chunks
        end if

    end do read_by_chunks

    if (want_deleted) then
        close(lun,iostat=iostat,status='delete')
        if (iostat/=0) err0 = state_type('getfile',STDLIB_IO_ERROR,'Cannot delete',fileName,'after reading')
    else
        close(lun,iostat=iostat)
        if (iostat/=0) err0 = state_type('getfile',STDLIB_IO_ERROR,'Cannot close',fileName,'after reading')
    endif

    ! Process output
    call move(from=fileString,to=file)
    call err0%handle(err)

end function getfile
Now that I know where the file size can be taken from, a single-instruction version will definitely be faster!
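For comparison, a minimal sketch of what such a single-read version might look like, assuming the size reported by inquire is the byte count of the file. It uses plain `character` and an integer status instead of stdlib's `string_type` and `state_type`; the names `slurp_demo` and `slurp`, and the status value `1` for an unknown size, are hypothetical.

```fortran
module slurp_demo
   use, intrinsic :: iso_fortran_env, only: int64
   implicit none
contains
   !> Read a whole file into a character variable with a single READ.
   !> iostat is 0 on success; text is zero-length on failure.
   subroutine slurp(filename, text, iostat)
      character(*), intent(in) :: filename
      character(len=:), allocatable, intent(out) :: text
      integer, intent(out) :: iostat
      integer :: lun
      integer(int64) :: nbytes

      text = ''
      ! SIZE= is set to -1 when the processor cannot determine the size
      inquire(file=filename, size=nbytes, iostat=iostat)
      if (iostat /= 0) return
      if (nbytes < 0) then
         iostat = 1   ! arbitrary nonzero code for "size unknown" (assumption)
         return
      end if

      open(newunit=lun, file=filename, form='unformatted', access='stream', &
           action='read', status='old', iostat=iostat)
      if (iostat /= 0) return

      if (nbytes > 0) then
         deallocate(text)
         allocate(character(len=nbytes) :: text)
         read(lun, iostat=iostat) text      ! one read for the whole file
         if (iostat /= 0) text = ''
      end if
      close(lun)
   end subroutine slurp
end module slurp_demo
```

Whether this is actually faster than the chunked loop depends on how the compiler implements the size inquiry, so it would still need to be benchmarked.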
Well, it might not be, it depends how the inquire is implemented. Here is a function that we use in C++ to read a file into a string: lfortran/src/libasr/utils2.cpp at 5e9643afbfb375f892ca369196f69e6cc29e87bf · lfortran/lfortran · GitHub. In there we open the file and set the file pointer at the end of the file using std::ios::ate. Then we read the file pointer, that gives us the size. Then we move it to the beginning with ifs.seekg(0, std::ios::beg); and actually read it. However, I don’t know if this is inefficient.
The question is how to get the file size. I guess in C++17 there is now fs::file_size(filename), which I am guessing underneath uses some system calls (such as stat on Linux and FindFirstFileA on Windows), and I am guessing the inquire function in Fortran can be implemented in the same way. Is it faster to use these system calls than setting the file pointer position like we do above?
Ultimately we would need to benchmark it on some large file to see.
I agree: it does indeed always look like it returns the whole size in bytes. This is a summary of what my MRC book says about it:
- size […] is assigned the size of the file in file storage units.
- if […] it cannot be determined, -1 is returned
- if the file has stream access, size= returns the number of the highest-numbered file storage unit in the file (?)
- if the file has sequential or direct access, the file size may be different by the number of units implied by the data in the records, and the exact relationship is processor-dependent.
So what I understand is that size basically always returns a number of array elements:
- with a simple default stream open, we have 1-byte characters, so size will return the size in bytes;
- if you open it with UTF-8 encoding, using 4-byte character kind, it will be the number of 4-byte characters, not the number of bytes (in other words, it always returns the array size);
- if you have records, they will have header/metadata, so the actual file size will be larger than what size returns. But, I’ve never used this option here so I’m not sure. I believe both gfortran and ifx use 4-byte record flags at the beginning and the end of each record.
I’ve put a simple test program here: the returned size seems always in bytes for both gfortran and ifx.
PS: I think a cross-platform file_size wrapper is definitely a necessary filesystem operation for stdlib.
The actual text is: “the file size may be different from the number of units”, which IMHO means only that the file size in bytes (storage units) may be different from the total size of data in the records of the file, because of the record sizes stored in unformatted sequential files and maybe also because of line endings in formatted ones. But I’d guess the size returned by inquire would always be the “OS” file size.
This seems to be the default case in gfortran, but it was not so in the earlier versions. The description of the -frecord-marker option says:
-frecord-marker=length
Specify the length of record markers for unformatted files. Valid
values for length are 4 and 8. Default is 4. This is different
from previous versions of gfortran, which specified a default
record marker length of 8 on most systems. If you want to read or
write files compatible with earlier versions of gfortran, use
-frecord-marker=8.
If one uses the default 4-byte record length markers, longer records are seemingly divided into subrecords. E.g. writing a real :: tab(1000000000)=0.0 array as a single record to a sequential unformatted file gives (with -frecord-marker=4):
$ hexdump -C sequnf4.dat
00000000 09 00 00 80 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
7ffffff0 00 00 00 00 00 00 00 00 00 00 00 f7 ff ff 7f 09 |................|
80000000 28 6b 6e 00 00 00 00 00 00 00 00 00 00 00 00 00 |(kn.............|
80000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
ee6b2800 00 00 00 00 00 00 00 00 00 00 00 00 f7 d7 94 91 |................|
ee6b2810
while when compiled with -frecord-marker=8:
$ hexdump -C sequnf8.dat
00000000 00 28 6b ee 00 00 00 00 00 00 00 00 00 00 00 00 |.(k.............|
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
ee6b2800 00 00 00 00 00 00 00 00 00 28 6b ee 00 00 00 00 |.........(k.....|
ee6b2810
In both versions the total size of the file is 4,000,000,016 bytes.
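The equal totals can be checked with a little arithmetic. The sketch below is inferred only from the two dumps above, not from the gfortran documentation: it assumes each (sub)record is bracketed by a leading and a trailing marker, and that with 4-byte markers a subrecord holds at most 0x7ffffff7 = 2147483639 bytes, which is the value visible in the first dump. The module and function names are hypothetical.

```fortran
module record_marker_demo
   use, intrinsic :: iso_fortran_env, only: int64
   implicit none
   !> Largest subrecord payload with 4-byte markers, read off the dump:
   !> 0x7ffffff7 = 2**31 - 9 bytes (assumption inferred from the dump).
   integer(int64), parameter :: max_subrecord = 2147483639_int64
contains
   !> Expected on-disk size of ONE unformatted sequential record holding
   !> nbytes of data, for a marker length of 4 or 8 bytes.
   pure integer(int64) function seq_unformatted_size(nbytes, marker) result(total)
      integer(int64), intent(in) :: nbytes
      integer, intent(in) :: marker
      integer(int64) :: nsub
      if (marker == 4) then
         ! long records are split into subrecords of at most max_subrecord bytes
         nsub = max(1_int64, (nbytes + max_subrecord - 1) / max_subrecord)
      else
         nsub = 1
      end if
      ! one leading and one trailing marker per (sub)record
      total = nbytes + 2 * marker * nsub
   end function seq_unformatted_size
end module record_marker_demo
```

For the 4,000,000,000-byte array this gives two subrecords with four 4-byte markers in one case and a single record with two 8-byte markers in the other, i.e. 16 bytes of overhead either way, so the identical file sizes are a coincidence of this particular record length.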
This would be great news for Fortran, as it would be a truly portable way to inquire the file size in bytes. If any of the Standard gurus would confirm that, then we could avoid going through C-interoperable wrappers to the Windows or POSIX APIs.
(PS to @certik @Beliavsky and admins: please split if you wish as we’re derailing onto a separate topic)
It works on unopened files also (through inquire(file=fname,size=isize)). I am not sure what the standard means by saying (section 12.10.2.30 of the F2018 Final Draft):
2 For a file that can be connected for stream access, the file size is the number of the highest-numbered file storage unit in the file.
3 For a file that can be connected for sequential or direct access, the file size may be different from the number of storage units implied by the data in the records; the exact relationship is processor dependent.
I think that any plain, readable file can be connected for stream access. So the size reported should be just the number of storage units (bytes) for any such file.
I think the (3) paragraph is simply allowing for embedded metadata within the file, such as <lf> or <cr> or <cr><lf> markers for formatted files or for the record length data for unformatted sequential files. That metadata would be included in the file size or in the storage unit counts in each record, but it might not count as part of the user data that was written to the file originally.
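A minimal sketch of the by-name inquiry discussed here, under the assumption that the reported size is the plain OS file size. The names `file_size_demo` and `file_size_bytes` are hypothetical; `file_storage_size` is the standard constant from `iso_fortran_env` giving the size of a file storage unit in bits.

```fortran
module file_size_demo
   use, intrinsic :: iso_fortran_env, only: int64, file_storage_size
   implicit none
contains
   !> Size of a file in bytes, or -1 if it cannot be determined.
   !> The file does not need to be connected to a unit.
   integer(int64) function file_size_bytes(filename) result(nbytes)
      character(*), intent(in) :: filename
      integer(int64) :: nunits
      inquire(file=filename, size=nunits)
      if (nunits < 0) then
         nbytes = -1
      else
         ! file_storage_size is in bits; assumed here to be a multiple of 8
         nbytes = nunits * (file_storage_size / 8)
      end if
   end function file_size_bytes
end module file_size_demo
```

Calling it on a formatted file written as two short lines should report a few bytes more than the characters written, which is the line-ending metadata that paragraph (3) appears to allow for.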
I think that files written to record oriented hardware cannot be connected with stream access. Consider, for example, a file written to a tape drive. It is not physically possible to overwrite a byte within such a file while leaving the rest of the file on the tape intact, so the characteristics of stream access would not be allowed on such a file.
Regarding the size, perhaps there is intended to be a difference allowed for formatted vs. unformatted stream? Such a difference would be processor dependent.
You are probably right, though
- Tape drives are sometimes referred to, apparently confusingly, as tape streamers
- File size is obviously unknown in such a case, at least not before the whole file is read, unless
- There are file systems available on tapes, e.g. Linear Tape File System (LTFS), on LTO/Ultrium tape drives ever since LTO-5 (the current version is LTO-9). We use those tapes extensively for data backup but never tried the LTFS itself.
Today’s puzzle (day 13) was nice because I finally was able to use some linear algebra, which is nice in Fortran. But I had some trouble with rounding and precision.
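One common way to sidestep rounding in this kind of problem, when the system is small, has integer coefficients and only integer solutions are of interest, is to stay in integer arithmetic throughout. The sketch below applies Cramer's rule to a 2x2 system with 64-bit integers; it is a generic illustration with hypothetical names, not a description of how the poster solved the puzzle, and it assumes the products do not overflow `int64`.

```fortran
module cramer_int_demo
   use, intrinsic :: iso_fortran_env, only: int64
   implicit none
contains
   !> Solve a*x = b for a 2x2 integer system using Cramer's rule.
   !> ok is .true. only if a unique integer solution exists.
   pure subroutine solve2x2_int(a, b, x, ok)
      integer(int64), intent(in)  :: a(2,2), b(2)
      integer(int64), intent(out) :: x(2)
      logical, intent(out) :: ok
      integer(int64) :: det, nx, ny

      x = 0
      det = a(1,1)*a(2,2) - a(1,2)*a(2,1)
      ok = det /= 0
      if (.not. ok) return

      ! numerators of Cramer's rule
      nx = b(1)*a(2,2) - a(1,2)*b(2)
      ny = a(1,1)*b(2) - b(1)*a(2,1)

      ! exact divisibility test replaces any floating-point tolerance
      ok = mod(nx, det) == 0 .and. mod(ny, det) == 0
      if (ok) x = [nx/det, ny/det]
   end subroutine solve2x2_int
end module cramer_int_demo
```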
Spoiler for Day 14, Part 2:
# #
# #
# #
#
#
# # #
#
# #
#
# # #
#
# #
#
#
#
#
# # # #
#
#
#
# # ###############################
# #
# #
# #
# # # #
# # # #
# # ### #
# # ##### #
# ####### # # #
# # # ######### # #
# # ##### #
# ####### # #
# ######### # # #
# ########### # #
# # ############# # #
# # # ######### #
# # # # ########### #
# ############# # #
## # # ############### # #
# # # # # ################# # #
# ############# #
# ############### #
# ################# # #
# # ################### #
## # # ##################### #
# ### #
# # # ### #
# ### #
# #
# #
# #
# # #
# ############################### #
#
#
# # #
#
#
#
#
#
# ##
#
#
# # #
# #
The arithmetic IF usage is gone, and I added a flat text version of the procedure descriptions for easy CLI searching, for those without the man-pages installed. See doc/man-pages*.txt. One file is a table of contents; the other holds the man-pages for about 1,300 of the procedures (although not all are done), plus the Fortran intrinsics, some odds and ends, and a few program descriptions.
For the AoC enthusiasts, these analytics look very interesting: Advent of Code analysis through the years – Blog Jesse van Elteren