Reading lines from a file without padding

I have long wondered how to read lines from a file in Fortran. The obvious

character (len=1000) :: text
! connect file to unit iu
read (iu,"(a)") text

is often good enough but truncates lines that are longer than 1000 characters, and you don’t know
if trailing blanks in text are actually in the file. Using a code by Jacob Williams to read a file into an allocatable string and then searching the string for newline characters, it is possible to recover the lines of a file. Here is the program read_file.f90:

module read_mod
implicit none      
contains
function pos_char(str,char) result(ipos)
character (len=*), intent(in)  :: str
character (len=*), intent(in)  :: char ! should have len = 1
integer          , allocatable :: ipos(:)
integer                        :: i,nlen,nfound
nlen = len(str)
nfound = 0
allocate (ipos(nlen),source=0)
ipos = 0
do i=1,nlen
   if (str(i:i) == char) then
      nfound = nfound + 1
      ipos(nfound) = i
   end if
end do
ipos = pack(ipos,ipos>0)
end function pos_char
!
subroutine read_file(filename, str)
! slightly modified from code by Jacob Williams at
! https://community.intel.com/t5/Intel-Fortran-Compiler/Text-file-to-allocatable-string/td-p/1033082
character(len=*)             , intent(in)  :: filename
character(len=:), allocatable, intent(out) :: str
! local variables:
integer                                    :: iunit,istat,filesize
character (len=1)                          :: c
open (newunit=iunit,file=filename,status="old",action="read", &
      form="unformatted",access="stream",iostat=istat)
if (istat==0) then
   ! how many characters are in the file:
   inquire(file=filename, size=filesize)
   if (filesize>0) then
      ! read the file all at once:
      allocate (character(len=filesize) :: str)
      read(iunit,pos=1,iostat=istat) str
      if (istat==0) then
         ! make sure it was all read by trying to read more:
         read (iunit,pos=filesize+1,iostat=istat) c
         if (.not. IS_IOSTAT_END(istat)) &
            print*,"Error: file was not completely read."
      else
         print*,"Error reading file."
      end if
      close (iunit, iostat=istat)
   else
      print*,"Error getting file size."
   end if
else
   print*,"Error opening file."
end if
end subroutine read_file
end module read_mod
!
program read_file_into_string
use read_mod, only: pos_char, read_file
character (len=:), allocatable :: text
integer                        :: i1,i2,ipos
integer          , allocatable :: ipos_new(:)
integer          , parameter   :: max_lines_print = 3
call read_file("read_file.f90",text)
print*,"len(text) =",len(text)
ipos_new = pos_char(text,new_line(" "))
print*,"#new_line =",size(ipos_new)
print "(/,'first few lines with length:')"
i1 = 1
do i=1,min(max_lines_print,size(ipos_new))
   i2 = ipos_new(i)
   print "(i4,1x,a)", i2-i1+1,"'" // text(i1:i2-1) // "'"
   i1 = i2 + 1
end do
end program read_file_into_string

output:

 len(text) =        2478
 #new_line =          74

first few lines with length:
  17 'module read_mod'
  21 'implicit none      '
  10 'contains'

(The line with trailing blanks is present for demonstration.) The output is consistent with
wc read_file.f90, which gives

74 261 2478 read_file.f90

1 Like

I have a read_file_lines function that makes use of the get subroutine from iso_varying_string. They may be of interest for looking at additional solutions.

1 Like

A while back I was trying out stream IO in Fortran and made this. Finally got around to publish it now that I saw your question :slight_smile: . Not sure I’ve ironed out all edge cases so proceed with some caution…

Example:

program example
    implicit none

    block
        ! Write a file that we can read back afterwards
        integer :: unit
        open(newunit=unit, file='file.txt')
        write(unit, '(a)') 'Hello world'
        write(unit, '(a)') 'This line is longer than the previous line'
        write(unit, '(a)') 'This line is shorter than the previous'
        write(unit, '(a)') 'Now we are done'
        close(unit)
    end block

    block
        ! Use this library to read the file
        use line_reader_mod, only: &
            line_reader_t, &
            error_t

        type(line_reader_t) :: reader
        type(error_t), allocatable :: error
        character(len=:), allocatable :: line

        fallible: block
            reader = line_reader_t()
            call reader%open('file.txt', error)
            if (allocated(error)) exit fallible
            do while (reader%has_next())
                call reader%next(line, error)
                if (allocated(error)) exit fallible
                write(*, '(a)') 'Got line: "' // line // '"'
            end do
        end block fallible
        if (allocated(error)) then
            write(*,'(a)') 'Unexpected error!'
            write(*,'(a)') error%display()
            error stop
        end if
    end block
end program

Outputs:

Got line: "Hello world"
Got line: "This line is longer than the previous line"
Got line: "This line is shorter than the previous"
Got line: "Now we are done"
Got line: ""

Note that the last empty line is actually a part of the file written in the first block.

Will have to look closer into the code from @everythingfunctional though. I thought that stream IO was needed for achieving this.

1 Like

For further examples, see the Fortran Wiki or the module M_io, which among
other things includes

   git clone https://github.com/urbanjost/M_io.git
  • getline - read a line from specified LUN into allocatable string up to line length limit
  • read_line - read a line from specified LUN into allocatable string up to line length limit cleaning up input line
  • gulp - read a file into a character array line by line
  • slurp - read a file into a character array
  • read_table - read file containing a table of numeric values

There are other related routines and example programs, as listed at

https://urbanjost.github.io/general-purpose-fortran/docs/man3.html

as well.

1 Like

Not necessarily. With ACCESS=SEQUENTIAL, one can use the REWIND statement on the file unit and perform READ twice (or multiple) times and achieve the same result. Thus a different rigmarole than that needed with ACCESS=STREAM. It will then be YMMV in terms of end user perception or experience with efficiency and/or elegance with the two approaches.