Reading a matrix of unknown size

Is there a public domain subroutine that reads a real matrix of arbitrary size from a text file? I can write one but would probably be reinventing the wheel. R and Python with Pandas have more general functions to read data frames.

My program which calls a subroutine to read a matrix with a fixed number of columns, optionally with column labels, is

module kind_mod
implicit none
private
public :: dp
integer, parameter :: dp = kind(1.0d0)
end module kind_mod
!
module read_matrix_alloc_mod
use kind_mod, only: dp
implicit none
private
public :: read_mat_real
contains
subroutine read_mat_real(xfile,ncol,xx,col_labels,max_rows)
! read from xfile matrix xx(:,:) and optionally column labels
character (len=*), intent(in)               :: xfile
integer          , intent(in)               :: ncol
real(kind=dp)    , intent(out), allocatable :: xx(:,:)
character (len=*), intent(out), optional    :: col_labels(:)
integer          , intent(in) , optional    :: max_rows
real(kind=dp)                 , allocatable :: xcp(:,:)
integer                                     :: i,ierr,ierr_read,iu,max_rows_,nrows
character (len=10000)                       :: text
character (len=*), parameter                :: msg="in read_matrix_alloc_mod::read_mat_real, "
if (present(col_labels)) then
   if (size(col_labels) /= ncol) then
      write (*,*) msg,"ncol, size(col_labels) =",ncol,size(col_labels)," must be equal, STOPPING"
      stop
   end if
end if
if (present(max_rows)) then
   max_rows_ = max_rows
else
   max_rows_ = 100000
end if
open (newunit=iu,file=xfile,action="read")
if (present(col_labels)) then
   read (iu,"(a)",iostat=ierr_read) text
   read (text,*,iostat=ierr) col_labels
   if (ierr /= 0) then
      write (*,*) msg,"could not read ",size(col_labels)," words from '" // &
                  trim(text) // "', STOPPING"
      stop
   end if
end if
allocate (xx(max_rows_,ncol))
nrows = max_rows_
i = 0
do
   read (iu,"(a)",iostat=ierr_read) text
   if (ierr_read /= 0) then
      nrows = i
      exit
   end if
   i = i + 1
   if (i > max_rows_) exit
   read (text,*,iostat=ierr_read) xx(i,:)
   if (ierr_read /= 0) then
      backspace(iu)
      nrows = i-1
      exit
   end if
end do
allocate (xcp(nrows,ncol))
xcp = xx(:nrows,:)
deallocate (xx)
allocate (xx(nrows,ncol))
xx = xcp
deallocate (xcp)
end subroutine read_mat_real
end module read_matrix_alloc_mod
!
program xread_mat_real
use kind_mod             , only: dp
use read_matrix_alloc_mod, only: read_mat_real
integer                         :: i
integer           , parameter   :: ncol = 2
real(kind=dp)     , allocatable :: xx(:,:)
character (len=20), allocatable :: col_labels(:)
allocate (col_labels(ncol))
call read_mat_real("xsquares.csv",ncol,xx,col_labels)
write (*,"(100a10)") (trim(col_labels(i)),i=1,size(col_labels))
do i=1,size(xx,1)
   write (*,"(100f10.4)") xx(i,:)
end do
end program xread_mat_real

which for file xsquares.csv

x,x^2
2.0,4.0
2.5,6.25
1.5,2.25

gives output

 x       x^2
2.0000    4.0000
2.5000    6.2500
1.5000    2.2500

There is loadtxt subroutine in @certik’s package GitHub - certik/fortran-utils: Various utilities for Fortran programs but it seems very simple. No support for comments/header. Arbitrary length is realized by reading the file twice, determining the number of rows on the first read. With really big files it can be much slower than some fancy ‘reallocation’ on-the-fly using MOVE_ALLOC.

We would certainly benefit from a true equivalent of the Python numpy.loadtxt routine, including features like comments (started with configurable set of chars, #%!, whatever), choosing columns to be read etc.

I have tried an approach that is similar to the “read_buoy()” and “num_records()” procedures in the Modern-Fortran repository.. Like msz59 mentioned, it first scans through the input data to find its length, then allocates just enough memory to contain it. It works well, but I have only used it for reading very small text files.

@msz59 could you elaborate what you mean by “fancy reallocation on the fly”? This concept is not familiar to me.

In the fortran-lang/stdlib project we already have a function called loadtxt. I don’t think it supports commas as separators though. The source code can be found here. If not the whole function you can reuse at least the functions to determine the number of rows and columns:


      ! determine number of columns
      ncol = number_of_columns(s)

      ! determine number or rows
      nrow = number_of_rows_numeric(s)

      allocate(d(nrow, ncol))
      do i = 1, nrow
        read(s, *) d(i, :)
      end do
      close(s)

Edit: I would like to add the code is MIT-licensed so you are free to reuse as you see fit.

2 Likes

If you are look especially for the csv format, there is GitHub - jacobwilliams/fortran-csv-module: Read and Write CSV Files Using Modern Fortran. Generally, I’d recommend to head to the IO category at the package index which offers plenty of libraries for different types of formats:

https://fortran-lang.org/packages/io

1 Like

I would certainly support this proposal as an extension to the current loadtxt in the stdlib repository. The workflow for proposing a new feature is documented here.

I mean allocating an array for a given, moderate size, then filling it. Once it is full and one still needs to put data into it, reallocate it bigger by doing something like

real, allocatable :: array(:), tmparray(:)
integer, parameter :: chunk=10000
integer :: cursize, needed_size
allocate(array(chunk))
...
cursize = size(array)
if ( needed_size > cursize ) then
  allocate(tmparray(cursize+chunk))
  tmparray(1:cursize) = array
  call move_alloc(tmparray, array)
      ! array gets deallocated, then acquires the allocation of tmparray,
      ! then tmparray gets deallocated
      ! in true code, you wouldd add error checking to allocate/move_alloc
endif
...
2 Likes

Why wouldn’t it? The code uses list-directed format to read data:
read(s, *) d(i, :)
That surely supports commas as well as any sequence of white space to delimit tokens. Only that adjacent commas have special meaning - skip the corresponding object on I/O list

Great! I didn’t realize it works for both comma or space. In gfortran it even worked with ; separators, but with ifort semicolons lead to an error.

My test program:

implicit none

real :: array(5)
integer :: unit

open(newunit=unit,file="test_read.txt",status="old")

read(unit,*) array(:)

print *, array

close(unit)

end program

A comma-separated file:

1.0, 2.0, 3.0, 4.0, 5.0,

Output:

~/fortran$ gfortran -Wall test_read.f90
~/fortran$ ./a.out
   1.00000000       2.00000000       3.00000000       4.00000000       5.00000000  

Try a file with
1.0, 2.0, 3.0/

Is it the end-of-file condition you are trying to point out, or reading into an array which is too long?

The program above was mainly just to test that it works with commas or white spaces as separators, but otherwise I am aware of these dangers.

No, it is not an EOF (you can check that using END= or IOSTAT= in READ). It is fully legal way of terminating a list-directed input (by putting a slash). All remaining objects on the I/O list are left untouched. If you initialize the array before, you will see it. Also, any text after the slash will be ignored, so one can use that for commenting the input data.

I do find it inconvenient that the output of

program main
implicit none
character (len=20)  :: text
character (len=10)  :: str
real                :: x
text = "2/13/2021 3.2"
x = -999.0
read (text,*) str,x
write (*,"(a,1x,f0.4)") "'" // trim(str) // "'",x
end program main

is '2' -999.0000 . Sometimes I use a kludge such as replacing / with a character that is unlikely to appear in the file and then replacing that character with /, as in

program main
implicit none
character (len=20)  :: text
character (len=10)  :: str
real                :: x
integer             :: i
character (len=1), parameter :: old_char="/",new_char="@"
text = "2/13/2021 3.2"
x = -999.0
do i=1,len_trim(text)
   if (text(i:i) == old_char) text(i:i) = new_char   
end do
read (text,*) str,x
do i=1,len_trim(str)
   if (str(i:i) == new_char) str(i:i) = old_char
end do
write (*,"(a,1x,f0.4)") "'" // trim(str) // "'",x
end program main

which gives the desired output '2/13/2021' 3.2000. (Admittedly, the American convention of writing dates as mm/dd/yyyy is odd and conflicts with the rest of the world using dd/mm/yyyy, so it is better to write dates as yyyy-mm-dd. But many files you download will have mm/dd/yyyy dates.)

It would be nice if the programmer could tell the compiler that for a certain read, / should be treated as an ordinary character and not an end-of-record marker. Also useful would be the ability to specify characters to be used as a delimiter in a read, since for a date string “2/13/2021” what you typically want to do is read three integers. I use code such as that above with new_char = "," to transform a string.

Well, the general rule for list-directed input is that the data should have a form of a literal constant. That means quoted character data. Reading text data without apostrophe has always been a complimentary feature and it works only if the text does not contain space, comma or slash. The modern Fortran versions include options to embed texts in single or double quotes when doing list-directed output.

Well, I checked the code. Contrary to what I guessed before, it does not support comma-separated data. Before it actually starts reading, it calls number_of_columns function which works based on blank-to-nonblank changes, so if there are commas only (no spaces), it will always find the file to contain just one column.

Then number_of_rows_numeric function gets called which reads the whole file just to count the rows. If the file contains comments, it will fail. If any of the lines contains less data then expected, loadtxt will fail (badly, reading the missing data from the next line, so the whole array will be corrupted).

I can volunteer to write something more flexible and robust, if needed.