How to read CSV data with fields that contain spaces?

CRquantum · October 22, 2021, 12:13am

Dear all,

A question, I have a csv file called data.txt, I want to read it.
The data.txt is below

Vostok 1,1961,Yuri Gagarin
Vostok 2,1961,Gherman Titov
Vostok 3,1962,Andriyan Nikolayev
Vostok 4,1962,Pavel Popovich
Vostok 5,1963,Valery Bykovsky
Vostok 6,1963,Valentina Tereshkova

I read the website below
https://cyber.dabamos.de/programming/modernfortran/files.html
In the " Reading CSV Files" section it describes how to read csv file using type.

From it I wrote my small code as below,

program main
    implicit none
type :: mission_type
    character(len=8)  :: name
    integer           :: year
    character(len=20) :: pilot
end type mission_type
type(mission_type), allocatable :: missions(:)
integer            :: fu, rc, i, j
open (action='read', file='data.txt', iostat=rc, newunit=fu)
if (rc /= 0) stop
i = 0
do ! get the size of missions(:)
    read (fu, *, iostat=rc) 
    if (rc /= 0) exit  
    i = i + 1  
enddo
allocate(missions(i))
rewind(fu)
do j = 1,i ! read from data.txt
    read (fu, *, iostat=rc) missions(j) ! it uses * as the format.
    if (rc /= 0) exit
end do
close (fu)
print *, missions(1)%name
print *, missions(1)%year
print *, missions(1)%pilot
end program main

I expect the output should be

Vostok 1
1961
Yuri Gagarin

However, obviously, fortran recognize the space also as a delimitator, and therefore the result below is not what I wanted,

 Vostok
           1
 1961

Does anyone have suggestion as how to read this csv file correctly?

Thank you very much in advance!

PS.
There is a stack overflow answer for a similar issue, it suggests read the whole line first, then do something further. Is there better way?

Beliavsky · October 22, 2021, 12:22am

I think the fortran-csv-module handles this.

CRquantum · October 22, 2021, 12:38am

Thank you very much.
I know that module. I just wish to write a csv reader myself for some particular csv files. The test files of that module seems does not contain space. fortran-csv-module/files at master · jacobwilliams/fortran-csv-module · GitHub
The csv in the question is an illustration example, I wanted to see if there are good ways to handle the space in csv file.

Beliavsky · October 22, 2021, 12:47am

Read each line into a string using the “(a)” format.
Write a loop to determine the positions of commas in the string
Use the positions of the commas to determine substrings from which you can read variables. Note that if c is a string c(i1:i2) is the substring from position i1 to i2. Read character variables using the “(a)” format.

CRquantum · October 22, 2021, 12:53am

Thank you so much!
If I do

read (fu, '(A)') line

Do I have to hard code the length of line, such as say, 1024, like below?

character(len=1024) :: line

urbanjost · October 22, 2021, 2:04am

In this VERY specific case shown with no spaces in the INTEGER value you could switch the spaces with a null, do your list-directed read, and switch the nulls back to spaces. The other answers are much more robust, but that is a fun example.

But what generates your CSV file?

If you just placed quotes around your strings your original program would work as-is. List-directed input assumes strings are quoted, as do most utilities that work with CSV files.

A parsing approach not even using list-directed I/O would be able to be much more robust and flexible as discussed above; and of course you already explained why you did not want to use it, but in general other users looking for a production solution here instead of an educational exercise should try to support a standard open-source library such as the CSV module and even contribute to its code if they need additional features.

program main
implicit none

type :: mission_type
character(len=8)  :: name
integer           :: year
character(len=20) :: pilot
end type mission_type
character(len=:),allocatable :: shortline
type(mission_type), allocatable :: missions(:)
integer            :: fu, rc, i, j
   open (action='read', file='data.txt', iostat=rc, newunit=fu)
   if (rc /= 0) stop
   i = 0
   do ! get the size of missions(:)
      read (fu, *, iostat=rc) 
      if (rc /= 0) exit  
      i = i + 1  
   enddo
   allocate(missions(i))
   rewind(fu)
   do j = 1,i ! read from data.txt
      shortline=repeat(' ',80)
      read (fu,'(a)')shortline
      shortline=trim(adjustl(shortline))
      call flip(shortline,' ',char(0))
      read (shortline, *, iostat=rc) missions(j) ! it uses * as the format.
      call flip(missions(j)%name,char(0),' ')
      call flip(missions(j)%pilot,char(0),' ')
      if (rc /= 0) exit
   end do
   close (fu)
   print *, missions(1)%name
   print *, missions(1)%year
   print *, missions(1)%pilot
contains

subroutine flip(string,a,b)
character(len=*)            :: string
character(len=1),intent(in) :: a, b
integer :: i
   do i=1,len(string)
      if(string(i:i).eq.a)string(i:i)=b
   enddo
end subroutine flip

end program main

CRquantum · October 22, 2021, 4:52am

Thank you very much @urbanjost , you explanation is great.
That csv file is just for fun, it is the from the website I read,

In the " Reading CSV Files" section it describes how to read this csv file using type. But obviously the space cause prblem.

I have another small issue, if file contain ’ . ', read will have problem, would you mind, if you have time, quickly point out what is wrong? Thank you very much indeed!

CRquantum · October 22, 2021, 6:48am

Thank you very much for your explanation.
I have another small question. I notice that shortline is an allocatable array. By doing the nice trick shortline=repeat(' ',80)
in the do loop, it works fine.
But I just wonder, in the loop, the first loop j=1 we do

shortline=repeat(' ',80)

it is fine that I understand shortline is automatically allocated.
But the in the next loop j=2,
why this shortline=repeat(’ ',80) will not give an error, like error: shortline has already been allocated?
It seems shortline has not been deallocated, it is nicely repeatedly used, during the loop.

What are the difference between

shortline=repeat(' ',80)

and

allocate(shortline, mold=repeat('', 80))

I know the latter in the loop, will give me error saying shortline has been allocated already.

Thank you very much!

urbanjost · October 22, 2021, 4:17pm

I sent a note to @engel that the strings in the example are not quoted; as that is just an errata.

It would probably be better to read into a fixed-length variable instead of dynamically allocating,
that just saved a few lines but brings up the point that you can dynamically allocate allocatable strings on assignment, which is a relatively new feature versus taking direct control with allocate and deallocate, which as you noted requires you to deallocate manually before reallocating but lets you treat the variable once allocated almost like a fixed-length value.

To handle the ‘.’ you would be better off doing the parsing as described earlier, but you could read everything as a string and then read the string for the numeric value testing for strings like ‘.’ first, but that is just a kludge and at that point a true parsing of the file yourself is actually better. Using BLANK=‘ZERO’ on the OPEN would not do what you want in this case. There are several examples of SPLIT routines available including a working example of one that acts like the proposed SPLIT for the next Fortran release that would let you easily split on a comma and that would be an easier way to go, as doing what I did here would not even properly handle a simple variation like there being a blank in the integer field value.

urbanjost · October 22, 2021, 4:31pm

List-directed I/O is so close to a CSV file, and actually was the original inspiration for CSV; but CSV went a little on it’s own way and is not a true standard, with many variants anyway. It would be nice if CSV was truly standardized, and Fortran list-directed I/O would be a good model for the standard. It would benefit many users, as I often see issues with CSV like how it handles quotes and missing values and tabs and so on in all kinds of forums.

themos · October 22, 2021, 5:29pm

It is not.

CRquantum · October 22, 2021, 5:39pm

@themos Thank you!
Eh, yeah, this is something I am a little confused, in its definition,

character(len=:),allocatable :: shortline

It seems that shortline is allocatable array. But I think you are right, could you please give a little bit more explanation as to why shortline is not really allocatable array?

themos · October 22, 2021, 5:52pm

Because there is nothing in that line making it an array. It is a scalar quantity of an intrinsic type CHARACTER which has a length-type parameter, LEN, which is deferred.

CRquantum · October 22, 2021, 6:02pm

@themos Thank you very much!
May I ask, so, what does the

allocatable

do in the definition of shorline?

character(len=:),allocatable :: shortline

I am sorry this may be a stupid question

themos · October 22, 2021, 6:16pm

It gives shortline the ALLOCATABLE attribute.

art-rasa · October 22, 2021, 9:24pm

The way I understand is that allocatable makes it possible to change the length of the character string (shortline) during runtime of the program. Without allocatable, the length of the string would be fixed, set to stone, so to speak, during compile time.

everythingfunctional · October 23, 2021, 2:27am

scalars can be allocatable. You can do

integer, allocatable :: x
allocate(x)
x = 42

Of course the above is a little silly, but is perfectly valid. One common case of an allocatable scalar is for polymorphic variables. I.e.

class(base_type), allocatable :: x
allocate(child_type :: x)

With deferred length characters you can do

character(len=:), allocatable :: string
allocate(character(len=42) :: string)
string = "Hello, World!" ! This automatically reallocates string to the length of "Hello, World!"

Topic		Replies	Views
Reading file contain '' . '' issue	1	392	October 22, 2021
Iteratively Reading Integers with Leading Spaces Help	9	433	September 27, 2023
Reading data: format of variables Help	12	723	March 11, 2023
Checking for zero data entry Help	6	465	July 4, 2023
Reading File knowing certain variables Help	13	1128	November 13, 2022

How to read CSV data with fields that contain spaces?

Related topics