Reading a matrix of unknown size

Beliavsky · February 13, 2021, 2:34am

Is there a public domain subroutine that reads a real matrix of arbitrary size from a text file? I can write one but would probably be reinventing the wheel. R and Python with Pandas have more general functions to read data frames.

My program which calls a subroutine to read a matrix with a fixed number of columns, optionally with column labels, is

module kind_mod
implicit none
private
public :: dp
integer, parameter :: dp = kind(1.0d0)
end module kind_mod
!
module read_matrix_alloc_mod
use kind_mod, only: dp
implicit none
private
public :: read_mat_real
contains
subroutine read_mat_real(xfile,ncol,xx,col_labels,max_rows)
! read from xfile matrix xx(:,:) and optionally column labels
character (len=*), intent(in)               :: xfile
integer          , intent(in)               :: ncol
real(kind=dp)    , intent(out), allocatable :: xx(:,:)
character (len=*), intent(out), optional    :: col_labels(:)
integer          , intent(in) , optional    :: max_rows
real(kind=dp)                 , allocatable :: xcp(:,:)
integer                                     :: i,ierr,ierr_read,iu,max_rows_,nrows
character (len=10000)                       :: text
character (len=*), parameter                :: msg="in read_matrix_alloc_mod::read_mat_real, "
if (present(col_labels)) then
   if (size(col_labels) /= ncol) then
      write (*,*) msg,"ncol, size(col_labels) =",ncol,size(col_labels)," must be equal, STOPPING"
      stop
   end if
end if
if (present(max_rows)) then
   max_rows_ = max_rows
else
   max_rows_ = 100000
end if
open (newunit=iu,file=xfile,action="read")
if (present(col_labels)) then
   read (iu,"(a)",iostat=ierr_read) text
   read (text,*,iostat=ierr) col_labels
   if (ierr /= 0) then
      write (*,*) msg,"could not read ",size(col_labels)," words from '" // &
                  trim(text) // "', STOPPING"
      stop
   end if
end if
allocate (xx(max_rows_,ncol))
nrows = max_rows_
i = 0
do
   read (iu,"(a)",iostat=ierr_read) text
   if (ierr_read /= 0) then
      nrows = i
      exit
   end if
   i = i + 1
   if (i > max_rows_) exit
   read (text,*,iostat=ierr_read) xx(i,:)
   if (ierr_read /= 0) then
      backspace(iu)
      nrows = i-1
      exit
   end if
end do
allocate (xcp(nrows,ncol))
xcp = xx(:nrows,:)
deallocate (xx)
allocate (xx(nrows,ncol))
xx = xcp
deallocate (xcp)
end subroutine read_mat_real
end module read_matrix_alloc_mod
!
program xread_mat_real
use kind_mod             , only: dp
use read_matrix_alloc_mod, only: read_mat_real
integer                         :: i
integer           , parameter   :: ncol = 2
real(kind=dp)     , allocatable :: xx(:,:)
character (len=20), allocatable :: col_labels(:)
allocate (col_labels(ncol))
call read_mat_real("xsquares.csv",ncol,xx,col_labels)
write (*,"(100a10)") (trim(col_labels(i)),i=1,size(col_labels))
do i=1,size(xx,1)
   write (*,"(100f10.4)") xx(i,:)
end do
end program xread_mat_real

which for file xsquares.csv

x,x^2
2.0,4.0
2.5,6.25
1.5,2.25

gives output

 x       x^2
2.0000    4.0000
2.5000    6.2500
1.5000    2.2500

msz59 · February 13, 2021, 10:14am

There is loadtxt subroutine in @certik’s package GitHub - certik/fortran-utils: Various utilities for Fortran programs but it seems very simple. No support for comments/header. Arbitrary length is realized by reading the file twice, determining the number of rows on the first read. With really big files it can be much slower than some fancy ‘reallocation’ on-the-fly using MOVE_ALLOC.

We would certainly benefit from a true equivalent of the Python numpy.loadtxt routine, including features like comments (started with configurable set of chars, #%!, whatever), choosing columns to be read etc.

art-rasa · February 13, 2021, 10:38am

I have tried an approach that is similar to the “read_buoy()” and “num_records()” procedures in the Modern-Fortran repository.. Like msz59 mentioned, it first scans through the input data to find its length, then allocates just enough memory to contain it. It works well, but I have only used it for reading very small text files.

@msz59 could you elaborate what you mean by “fancy reallocation on the fly”? This concept is not familiar to me.

ivanpribec · February 13, 2021, 11:03am

In the fortran-lang/stdlib project we already have a function called loadtxt. I don’t think it supports commas as separators though. The source code can be found here. If not the whole function you can reuse at least the functions to determine the number of rows and columns:


      ! determine number of columns
      ncol = number_of_columns(s)

      ! determine number or rows
      nrow = number_of_rows_numeric(s)

      allocate(d(nrow, ncol))
      do i = 1, nrow
        read(s, *) d(i, :)
      end do
      close(s)

Edit: I would like to add the code is MIT-licensed so you are free to reuse as you see fit.

awvwgk · February 13, 2021, 11:14am

If you are look especially for the csv format, there is GitHub - jacobwilliams/fortran-csv-module: Read and Write CSV Files Using Modern Fortran. Generally, I’d recommend to head to the IO category at the package index which offers plenty of libraries for different types of formats:

https://fortran-lang.org/packages/io

ivanpribec · February 13, 2021, 11:16am

I would certainly support this proposal as an extension to the current loadtxt in the stdlib repository. The workflow for proposing a new feature is documented here.

msz59 · February 13, 2021, 11:28am

I mean allocating an array for a given, moderate size, then filling it. Once it is full and one still needs to put data into it, reallocate it bigger by doing something like

real, allocatable :: array(:), tmparray(:)
integer, parameter :: chunk=10000
integer :: cursize, needed_size
allocate(array(chunk))
...
cursize = size(array)
if ( needed_size > cursize ) then
  allocate(tmparray(cursize+chunk))
  tmparray(1:cursize) = array
  call move_alloc(tmparray, array)
      ! array gets deallocated, then acquires the allocation of tmparray,
      ! then tmparray gets deallocated
      ! in true code, you wouldd add error checking to allocate/move_alloc
endif
...

msz59 · February 13, 2021, 11:37am

Why wouldn’t it? The code uses list-directed format to read data:
read(s, *) d(i, :)
That surely supports commas as well as any sequence of white space to delimit tokens. Only that adjacent commas have special meaning - skip the corresponding object on I/O list

ivanpribec · February 13, 2021, 11:51am

Great! I didn’t realize it works for both comma or space. In gfortran it even worked with ; separators, but with ifort semicolons lead to an error.

My test program:

implicit none

real :: array(5)
integer :: unit

open(newunit=unit,file="test_read.txt",status="old")

read(unit,*) array(:)

print *, array

close(unit)

end program

A comma-separated file:

1.0, 2.0, 3.0, 4.0, 5.0,

Output:

~/fortran$ gfortran -Wall test_read.f90
~/fortran$ ./a.out
   1.00000000       2.00000000       3.00000000       4.00000000       5.00000000

msz59 · February 13, 2021, 12:03pm

Try a file with
1.0, 2.0, 3.0/

ivanpribec · February 13, 2021, 12:15pm

Is it the end-of-file condition you are trying to point out, or reading into an array which is too long?

The program above was mainly just to test that it works with commas or white spaces as separators, but otherwise I am aware of these dangers.

msz59 · February 13, 2021, 12:40pm

No, it is not an EOF (you can check that using END= or IOSTAT= in READ). It is fully legal way of terminating a list-directed input (by putting a slash). All remaining objects on the I/O list are left untouched. If you initialize the array before, you will see it. Also, any text after the slash will be ignored, so one can use that for commenting the input data.

Beliavsky · February 13, 2021, 1:29pm

I do find it inconvenient that the output of

program main
implicit none
character (len=20)  :: text
character (len=10)  :: str
real                :: x
text = "2/13/2021 3.2"
x = -999.0
read (text,*) str,x
write (*,"(a,1x,f0.4)") "'" // trim(str) // "'",x
end program main

is '2' -999.0000 . Sometimes I use a kludge such as replacing / with a character that is unlikely to appear in the file and then replacing that character with /, as in

program main
implicit none
character (len=20)  :: text
character (len=10)  :: str
real                :: x
integer             :: i
character (len=1), parameter :: old_char="/",new_char="@"
text = "2/13/2021 3.2"
x = -999.0
do i=1,len_trim(text)
   if (text(i:i) == old_char) text(i:i) = new_char   
end do
read (text,*) str,x
do i=1,len_trim(str)
   if (str(i:i) == new_char) str(i:i) = old_char
end do
write (*,"(a,1x,f0.4)") "'" // trim(str) // "'",x
end program main

which gives the desired output '2/13/2021' 3.2000. (Admittedly, the American convention of writing dates as mm/dd/yyyy is odd and conflicts with the rest of the world using dd/mm/yyyy, so it is better to write dates as yyyy-mm-dd. But many files you download will have mm/dd/yyyy dates.)

It would be nice if the programmer could tell the compiler that for a certain read, / should be treated as an ordinary character and not an end-of-record marker. Also useful would be the ability to specify characters to be used as a delimiter in a read, since for a date string “2/13/2021” what you typically want to do is read three integers. I use code such as that above with new_char = "," to transform a string.

msz59 · February 13, 2021, 2:22pm

Well, the general rule for list-directed input is that the data should have a form of a literal constant. That means quoted character data. Reading text data without apostrophe has always been a complimentary feature and it works only if the text does not contain space, comma or slash. The modern Fortran versions include options to embed texts in single or double quotes when doing list-directed output.

msz59 · February 13, 2021, 5:12pm

Well, I checked the code. Contrary to what I guessed before, it does not support comma-separated data. Before it actually starts reading, it calls number_of_columns function which works based on blank-to-nonblank changes, so if there are commas only (no spaces), it will always find the file to contain just one column.

Then number_of_rows_numeric function gets called which reads the whole file just to count the rows. If the file contains comments, it will fail. If any of the lines contains less data then expected, loadtxt will fail (badly, reading the missing data from the next line, so the whole array will be corrupted).

I can volunteer to write something more flexible and robust, if needed.

Topic		Replies	Views
Arrays and lists Help	46	2389	October 5, 2023
Allocation of array for reading strings from a text file Help	20	1949	July 19, 2024
How can I set an array to the contents of a file? Help	8	1398	December 12, 2021
How to read CSV data with fields that contain spaces? Help	16	3802	October 23, 2021
How to read allocatables from unformatted file? Help	4	222	March 14, 2024

Reading a matrix of unknown size

Related topics