Best way to read only last two real values in a line using read in Fortran

snano · July 8, 2022, 10:02am

I want to read a line containing mixed data (real and characters) and I want to extract only the last 2 real values from this line. I write the following code to do this – is there a better way to do this? Python does use the “split method” to do it elegantly, is there a similar in Fortran?

implicit none
real :: x, y
character(len=10) :: char1
open (1, file='line.dat', status='old')
! line.dat contents
! ( v=  1) w = 2.843 unit1 = 5.867 unit2  4.567 1.232 )

read (1, *) char1, char1, char1, char1, char1, char1, char1, char1, char1,&
          &  char1, x, y, char1
print *, x, y !   4.567000       1.232000  
end

Arjen · July 8, 2022, 11:07am

Well, there are several ways to do it and I guess that you do not know in advance how many items there are on the line. Here is a sketch:

Read a complete line from the file into a character string
Read an item (a character string is best) from that string. If there is no error, the try reading two items and then three and so on. Then you know how many items there are (N say).
Last step: read N-2 dummy items and the remaining two.

An alternative is to use a module that splits the line - I have one in http://flibs.sf.net (note: right now I cannot reach SourceForge, so can’t be more specific).

The reason for the above somewhat awkward procedure is that you do not always know how many items were read successfully. That seems to depend on the compiler.

Arjen · July 8, 2022, 11:11am

Ah, found it: flibs - a collection of Fortran modules / SVN / [r429] /trunk/src/strings. There are other similar packages out there.

snano · July 8, 2022, 12:31pm

thanks for the suggestion as well as the link to flibs.

Beliavsky · July 8, 2022, 12:35pm

You can use an implied do loop to read a variable repeatedly and effectively skip over fields, if you know how many to skip, as shown below.

program read_last
implicit none
character (len=:), allocatable :: text
character (len=1) :: char1
integer :: i
real :: x,y
text = "( v=  1) w = 2.843 unit1 = 5.867 unit2  4.567 1.232"
read (text,*) (char1,i=1,10),x,y
print*,char1
print*,x,y
end program read_last
! output:
!  u
!    4.56699991       1.23199999

septc · July 8, 2022, 1:11pm

Yet another approach may be to find the necessary keyword (e.g. “unit2”) and read the real numbers after that keyword (I use this approach for reading the side length of simulation box from a coordinate file).

program main
    implicit none
    character(:), allocatable :: text
    integer :: i
    real :: x, y
    text = "( v=  1) w = 2.843 unit1 = 5.867 unit2  4.567 1.232"

    i = index( text, "unit2" )
    read( text(i+5:), * ) x, y
    print *, x, y
end

urbanjost · July 8, 2022, 1:24pm

As suggested I would use a library, but assuming a space delimiter is
used around the numbers and that the text is very arbitrary preceding
it so that you really cannot use the many features of Fortran formats
such as skipping N characters, etc. this should work if you know some
max line_length …

program readfile
implicit none
integer,parameter          :: line_length=256
character(len=line_length) :: line
character(len=line_length) :: word
real                       :: a(line_length/2+1), aval
integer                    :: io,icount
real,allocatable           :: total(:)
integer,parameter          :: lun=5 ! lun=100
logical                    :: verbose=.false.
allocate(total(0))
!open(lun, file='data')
FILEREAD: do
   read(lun,'(a)',IOSTAT=io)line       ! read a line into character variable
   if(io.ne.0) exit FILEREAD
   if(verbose)write(*,*)'new line=',trim(line)
   icount=0                       ! initialize count of values found on line
   a=huge(0.0)
   do
      line=adjustl(line)                  ! remove leading spaces
      read(line,*,IOSTAT=io)word          ! read next token from line
      if (io.ne.0) exit
      if(verbose)write(*,*)'token=',word
      read(word,*,IOSTAT=io) aval
      if (io.eq.0)  then
         a(icount+1)=aval  ! convert token to a number
         icount=icount+1
      endif
      line=line(len_trim(word)+1:)   ! remove token just read
   enddo
   if(verbose)write(*,'(1x,a,g0,a,*(g0:,","))')'   read ',icount,' values=', a(:icount)
   if(icount.ge.2)then
           if(verbose)write(*,'(1x,a,*(g0:,","))')'   last two values=', a(icount-1:icount)
           total=[total,a(icount-1:icount)]
   else
           if(verbose)write(*,'(1x,a)')'   WARNING: line did not contain at least two numeric values='
   endif  
enddo FILEREAD
write(*,'(a,g0)')'total values read=',size(total)
write(*,'(*(g0:,","))')total
end program readfile

So that allows for that “)” after your numbers and so on; but if the numbers always follow the unit2 string or you know the number of preceding tokens, etc. those could be simpler. But if it really is arbitrary and the values are delimited by space or comma something like this is generic enough to handle that. Otherwise, you will have to split on other characters or replace all the characters that cannot be in a numeric value with a space or comma first (assuming no word after the numbers contains numeric characters).

JohnCampbell · July 10, 2022, 9:01am

I am a bit puzzled by the solutions that have been proposed, mainly because I don’t see a definition of what “fields” can be in the line and how the fields can be interpreted.

I presume a field delimiter could be one or multiple blanks ( or possibly “=” or a comma or a numeric operator + - * / or ^ ?)

A real value must be numeric ( and contain a “.” ? ) ( “1e3” or “1d3” can be ambiguous )

Many years ago, I wrote a parser for “free format” input, ( based on blanks or comma delimiters ) then being lazy, I tried to test each field as numeric by reading it for a real value as in

character field*20 
...
read (field, fmt='(f20.0)', iostat=iostat) val " 
if ( iostat == 0 ) then
!  have a numeric value

This worked well on the compilers I was using at the time, until I tried an IBM Fortran 77 compiler where iostat=iostat would not report a numeric conversion error, but just crash !
The standard apparently states that the response for iostat= is “processor dependent” !
That experience has scared me for life, as I was not expecting such a weak interpretation of the standard.
There are helpful/good compilers and there are unhelpful/poor compilers.

RonShepard · July 10, 2022, 2:49pm

I would say first that the problem has not been clearly specified. For example, is the field right before the two real numbers always ‘unit2’, or is it supposed to be some arbitrary field? If it is always ‘unit2’, then the simplest solution has already been posted by septc.

On the other extreme, is you don’t know how many items there are or their values, then the simplest approach is probably to read the entire line (in which case you must know the maximum line length ahead of time), and then scan that line backwards to extract the last two items. Actually, in your example, it looks like you need the last THREE items, two of which are the real numbers you want plus the trailing ‘)’. To do that scan, you must know what delimits the items. Is it just spaces, or can it include combinations of spaces and commas, following standard fortran list-directed i/o conventions? Or can it also include tab characters or other application-specific delimiters, such as ‘=()/’?

Then, it gets even more complicated if you are trying to mimic the full list-directed i/o conventions of repeated commas and repetition counts.

All of these can be done with the intrinsic string processing operations within fortran. But you need to actually specify the problem fully to know which approach to take. Compared to other languages and utilities, for example like perl or awk, the only important missing feature is regular expression parsing. That could actually be used in this example to advantage if it were available, but it would not simplify all the possible cases (e.g. handling repeated commas and repetition counts is still difficult).

FortranFan · July 11, 2022, 12:47am

@snano,

I think what you are referring to is similar to a discussion a while ago at comp.lang.fortran, see this link:
https://groups.google.com/g/comp.lang.fortran/c/UFGS4c7UGKg/m/moGz41wLAQAJ

What you will notice is there are currently several “string” library solutions out there that you can evaluate. StringiFor is one you can immediately consider:

Most or all of them include procedures similar to the “split method” in Python. If you don’t like the available solutions out there, you can roll your own also.

Another option is to wait X years where X can be anywhere from 1 to >15-20 years or forever depending on the compiler of your choice where you make use of the TOKENIZE intrinsic introduced with Fortran 202X. This intrinsic can enable what you are looking for.

I don’t have any copyright to a library solution I had helped a team develop to be able to post the working code here but I was able to temporarily gain access to it and try it out enough to illustrate what I think is the need here. The code was along the following lines:

   use string_m, only : string_t

   type(string_t) :: dat
   character(len=1), allocatable :: delimiters(:)
   type(string_t), allocatable :: tokens(:)
   real :: x, y

   ! assume the line from the 'line.dat' file has been read into a CHARACTER variable
   ! simulated here as a literal constant
   dat = "( v=  1) w = 2.843 unit1 = 5.867 unit2  4.567 1.232 )"
   delimiters = [ "(", " ", "=", ")" ]
   call dat%split( delimiters, tokens )
   ! error handling elided

   associate( val => tokens(size(tokens)-1)%s() )
      read( val, "(f5.3)" ) x
      print *, x
   end associate

   associate( val => tokens(size(tokens))%s() )
      read( val, "(f5.3)" ) y
      print *, y
   end associate

end

The program output was

 4.567000
 1.232000
Press any key to continue . . .

You will notice the above library solution uses an array of delimiters to split a string into an array of tokens. It then takes the last two and performs a Fortran internal IO on the string representation to read them.into floating-point objects.

I think this is what you are looking for, Fortran nearly has it. Too bad and a shame it is not intrinsic yet in standard Fortran and there is no intrinsic (built-in) string type to make such string manipulation a breeze for the users.

snano · July 11, 2022, 2:55am

I would say first that the problem has not been clearly specified. For example, is the field right before the two real numbers always ‘unit2’, or is it supposed to be some arbitrary field? If it is always ‘unit2’, then the simplest solution has already been posted by septc.

unit1 and unit2 are units of the quantities. But for discussion here, I suppose them to be as such i.e. “unit2”. Also, I have “)” at the end of the line. So, suggested solution may not work in that case (but not a general solution if unit2 is variable)

To do that scan, you must know what delimits the items. Is it just spaces, or can it include combinations of spaces and commas, following standard fortran list-directed i/o conventions? Or can it also include tab characters or other application-specific delimiters, such as ‘=()/’?

In the problem, clearly, the delimiters are spaces (not tabs).

snano · July 11, 2022, 3:18am

thanks, @FortranFan. Yes, it looks cool.

In-addition, I also checked here as well (see the script below): https://groups.google.com/g/comp.lang.fortran/c/UFGS4c7UGKg/m/moGz41wLAQAJ. Seems flibs and StringIFor can do that, but I had some issues installing them on my machine (May be I will do it again.)

Also, I tried this approach to convert the line.dat to arrays of strings (see the below code), then I could not extract the last two reals:

module mod_string
    implicit none
    
    contains
 
    subroutine split2array(string, delimiter) 
       character(100), intent(in) :: string
       character, intent(in) :: delimiter ! ';'
    
       character(len=80), allocatable :: strarray(:)
       integer :: n, m 
       integer :: i, idx
       character(len=80):: strtmp
       strtmp = trim(adjustl(string))
       n = count([(strtmp(i:i), i=1, len_trim(strtmp))] == delimiter)
       allocate (strarray(n + 1))
    
       m = 1
       do i = 1, n
          idx = index(strtmp(m:), delimiter)
          strarray(i) = adjustl(strtmp(m:m + idx - 2))
          m = m + idx
       end do
       strarray(n + 1) = adjustl(strtmp(m:))
       print *, 'nvalues or string element are =', n + 1
       print '(*(a))', strarray(1:n+1) 
    end subroutine split2array
 end module mod_string
 
 
 program test
    use mod_string, only : split2array
    implicit none
    character(len=100) :: string
    character :: delimiter
 
    string = "( v=  1) w = 2.843 unit1 = 5.867 unit2  4.567 1.232 )"
    delimiter = " "
    call split2array(string, delimiter) 
 end program test
 !output:
! nvalues or string element are =          15
!(                                                                               v=                                                                                                                                                              1)                                                                              w                                                                               =                                                                               2.843                                                                           unit1                                                                           =                                                                               5.867                                                                           unit2                                                                                                                                                           4.567                                                                           1.232                                                                           )

Now the task is: out of this array of strings, take the last 2 reals? Nonetheless, the solution suggested by @FortranFan is ok.

septc · July 11, 2022, 4:42am

I imagined that there are already a lot of split routines available from other libraries, but if a minimal / simple (stand-alone) routine is desired, it may be convenient to prepare it in a local library, for example…

module strmod
    implicit none
contains

subroutine split( line, words, nw )
    character(*), intent(in)  :: line
    character(*), intent(out) :: words(:)
    integer,      intent(out) :: nw
    character(len(words)) :: buf( size(words) )
    integer :: k, ios

    nw = 0 ; words(:) = ""

    do k = 1, size(words)
        buf( 1 : k ) = ""
        read( line, *, iostat=ios ) buf( 1 : k )
        if ( ios /= 0 ) exit
        nw = k
        words( 1 : nw ) = buf( 1 : nw )
    enddo
end

subroutine get_last_reals( words, vals )
    character(*), intent(in)  :: words(:)
    real,         intent(out) :: vals(:)
    integer :: k, ios, ir
    real :: x

    ir = size(vals)
    vals(:) = -99999   !! dummy value for "not found" (NaN may be better)

    do k = size(words), 1, -1
        read( words( k ), *, iostat=ios ) x
        if ( ios /= 0 ) cycle
        vals( ir ) = x
        ir = ir - 1
        if ( ir == 0 ) exit
    enddo
end

end module

program main
    use strmod
    implicit none
    character(80) :: line, words( 50 )
    integer :: nw, k
    real :: vals( 2 )

    line = "( v=  1) w = 2.843 unit1 = 5.867 unit2  4.567 1.232 )"
    !! line = "a abc, 12.34 -23"
    !! line = "a; 4 b; 3, c,; abc,12.34, g, -23;"

    call split( line, words, nw )

    print *, "line = ", trim(line)
    print *, "number of words = ", nw
    print *, "words in line:"
    do k = 1, nw
        print "('len = ',i0,'  val = ',a)", len_trim( words(k) ), trim( words(k) )
    enddo

    call get_last_reals( words, vals )

    print *, "last two reals in line:", vals
end

$ gfortran-10 test.f90 && ./a.out
 line = ( v=  1) w = 2.843 unit1 = 5.867 unit2  4.567 1.232 )
 number of words =           13
 words in line:
len = 1  val = (
len = 2  val = v=
len = 2  val = 1)
len = 1  val = w
len = 1  val = =
len = 5  val = 2.843
len = 5  val = unit1
len = 1  val = =
len = 5  val = 5.867
len = 5  val = unit2
len = 5  val = 4.567
len = 5  val = 1.232
len = 1  val = )
 last two reals in line:   4.56699991       1.23199999

(This Stackoverflow Q/A may also be related)

fortran90 - Parse random "String + Integers" in Fortran - Stack Overflow

But, if the line has something like “123.45)” (no space between reals and parentheses), this kind of method may just fail. In that case, I guess it may be necessary to first remove parenthesis from the line (e.g. by using a replace() like routine based on index()). If the line has a trailing comment that begins with # or ! etc, that part may also be needed to be dropped from the line first, I guess… (again, e.g. via index()).

JohnCampbell · July 11, 2022, 6:05am

This must have never been tested.
This is not very robust at all; no iostat= recovery and where is val and %s sensibly defined ?
Surely “f5.3” should be “fx.0” where x, %s and val are defined to cope with most expected number strings.

snano · July 11, 2022, 6:09am

thanks, @septc.

urbanjost · July 11, 2022, 11:48am

the only assumption is that the delimiter is spaces or commas. The example was showing that a Fortran read can act much like the C strtok function and can be used to split a string into tokens.

Rearranging it a bit to create a subroutine makes it simpler to see this finds the last two fields representing numeric values in any line if the values are space delimited.

program readfile
implicit none
integer,parameter          :: line_length=256
character(len=line_length) :: line
integer                    :: iostat,lun
real                       :: a, b
open (newunit=lun, file='line.dat', status='old')
FILEREAD: do
   read(lun,'(a)',IOSTAT=iostat)line  ! read a line into character variable
   if(iostat.ne.0) exit FILEREAD
   write(*,*)'LINE:',trim(line)
   call lasttwo(line,a,b)
   write(*,*)a,b
enddo FILEREAD
contains 
subroutine lasttwo(line,v1,v2)
character(len=*)         :: line
real,intent(out)         :: v1, v2
integer                  :: icount
character(len=len(line)) :: word
real                     :: a(line_length/2+1), aval
   icount=0                           ! initialize count of values found on line
   do
      line=adjustl(line)              ! remove leading spaces
      read(line,*,IOSTAT=iostat)word  ! read next token from line
      if (iostat.ne.0) exit
      read(word,*,IOSTAT=iostat) aval ! convert token to a number
      if (iostat.eq.0)  then
         a(icount+1)=aval        
         icount=icount+1
      endif
      line=line(len_trim(word)+1:)   ! remove token just read
   enddo
   if(icount.ge.2)then
      v1=a(icount-1)
      v2=a(icount)
   else
      write(*,'(1x,a)')'   WARNING: line did not contain at least two numeric values'
      v1=huge(0.0)
      v2=huge(0.0)
   endif  
end subroutine lasttwo
end program readfile

FortranFan · July 12, 2022, 5:51pm

Yeah, right.

I think @snano understood the illustration I provided toward a possible approach solution.

But for any other reader if they want to overlook the bigger picture and the outline but would rather sweat the details, first please note the following:

string manipulation toward such needs becomes much easier for users via a higher degree of abstraction, say a featured intrinsic string derived type (“class”) or a library solution toward such a type. Many modern and popular languages enable such an approach more natively than Fortran does currently and that is unfortunate. No wonder then OP and others in forum posts inquire re: Python, etc. and the convenient options with Fortran.
should 1) be of not interest, one can proceed with intrinsic character type and work with it. Toward this, TOKENIZE from Fortran 202X will become a good option once compiler support becomes a reality. In the meantime, one can roll one’s own procedure toward the same, as shown in this thread and elsewhere. It is not difficult.

Toward 1, I illustrated an approach but was unable to share further details. As to the mind-numbing minutiae brought up in the quoted comment here, again the message is users (say @snano) can take the illustration and adapt it as needed. What I showed was by no means meant to be prescriptive. As to “val”, “%s”, formats, etc., readers may know the anatomy of the approach is as follows:

module string_m
   type :: string_t
      private
      character(len=:), allocatable :: m_s
   contains
      private
      procedure, pass(this) :: assign_t
      procedure, pass(this), public :: s => get_s
      generic, public :: assignment(=) => assign_t
   end type
contains
   subroutine assign_t( this, s )
      class(string_t), intent(inout) :: this
      character(len=*), intent(in)   :: s
      this%m_s = s
   end subroutine
   function get_s( this ) result(r)
      class(string_t), intent(in) :: this
      character(len=:), allocatable :: r
      r = this%m_s
   end function 
end module
   use string_m, only : string_t
   type(string_t) :: dat
   real :: x
   dat = "4.567"  ! Simulate the loading of values as string tokens
   associate ( val => dat%s() )
      read( val, fmt="(f5.3)" ) x
      print *, x 
   end associate
end

With IFORT,

C:\Temp>ifort /standard-semantics p.f90
Intel(R) Fortran Intel(R) 64 Compiler Classic for applications running on Intel(R) 64, Version 2021.6.0 Build 20220226_000000
Copyright (C) 1985-2022 Intel Corporation.  All rights reserved.

Microsoft (R) Incremental Linker Version 14.32.31329.0
Copyright (C) Microsoft Corporation.  All rights reserved.

-out:p.exe
-subsystem:console
p.obj

C:\Temp>p.exe
 4.567000

Now, with the internal IO shown in this example, users would do well to consider suitable formats that can generally work with their data, or even list-directed read (fmt=*). Don’t get hung up on the shown format (F5.3), please.

And I believe error handling such as with iostat is best elided in illustrative examples, that is up to the users to employ to develop robust programs for their needs.

JohnCampbell · July 13, 2022, 2:44am

My concern with the suggested approaches is the use of repeated free format read to find the number of tokens or fields in the parsed line.
Wouldn’t it be simpler to read the line as say “character(len=128) :: line” then parse the “line” to find all the fields that exist ?
The question is then what are field delimiters, probably space " ", comma “,” and possibly “(” and “)”. Free formatted reads are limited in this respect.

From the example presented is how to interpret “1)” and “1.232 )” ?
Should the data be changed to “1 )”?

Once the immediate issue is solved, where do you stop. It is always a challenge to write a better line_parser !
There can be more flexibility in the possible delimiters, perhaps HT : = or ;. These all come with added complexity, such as reading HT (ascii 9) as READ can modify/mask its value. (Stream I/O may be required for this)

Then text fields enclosed in " or ’ where delimiters are to be ignored.

Then what is a valid real number ? must it include a decimal point ? Should it cope with European real syntax or thousands syntax ? And don’t use F5.3 to read an integer value should not be overlooked.
What is a real number can be a difficult question.

These problems can be better addressed in a Line_Parse subroutine, rather than read (line,*).

msz59 · July 13, 2022, 1:04pm

Another constraint is that the text line should not contain apostrophes/quotation marks as they would be interpreted as string quotations.

Separators other than space (comma, sometimes also semicolon) would make problems, too, as they would not get properly “trimmed” by the @urbanjost’s code. Consider a line like

a abc, 12.34 -23

The output of the program will be

   2.33999991      -23.0000000

septc · July 13, 2022, 1:53pm

I have updated my code snippet a bit (so as to initialize buf and vals also in case there are 0-length words or no reals found). For more robust parsing, I believe it is of course better to use a full-fledged parse routine that does not suffer historical / weird behaviors of Fortran list-directed input (such as / as a terminator…).

I remember having seen that newer Fortran standards already have split(), at least for specification? If it is already implemented in a compiler, I guess it will be definitely be better.

Personally, I do not care much about string facility in Fortran, because I will use other languages when the main task is parsing and conversion itself. I feel the string (= character array) facility in Fortran is x10 more convenient than that in C, but that in other recent languages is typically x10 more convenient than that in Fortran (so reducing the necessary coding time). So, I usually do any string manipulation as much as possible outside Fortran. (The only exception to this is namelist, which is really convenient! So I use it extensively.)

Here are some other strings for test; for this pattern, is the list-directed input still “safe”…?

line = “a abc, 12.34 -23”

number of words =            4
words in line:
len = 1  val = a
len = 3  val = abc
len = 5  val = 12.34
len = 3  val = -23
 last two reals in line:   12.3400002      -23.0000000

line = “a ; 4 b; 3, c , ; abc,12.34, g , -23;”

 number of words =           10
 words in line:
len = 1  val = a
len = 1  val = 4
len = 1  val = b
len = 1  val = 3
len = 1  val = c
len = 0  val = 
len = 3  val = abc
len = 5  val = 12.34
len = 1  val = g
len = 3  val = -23
 last two reals in line:   12.3400002      -23.0000000

Topic		Replies	Views
Reading data: format of variables Help	12	706	March 11, 2023
Reading variable length characters from file Help	9	2188	October 11, 2021
Reading values from text file of unknown structure Help	5	247	January 17, 2025
Reading lines from a file without padding	4	2163	March 31, 2022
Fortran how to read double precision data in input text file to real no output text file	5	2079	October 4, 2021

Best way to read only last two real values in a line using read in Fortran

Related topics