Best way to read only last two real values in a line using read in Fortran

I am a bit puzzled by the solutions that have been proposed, mainly because I don’t see a definition of what “fields” can be in the line and how the fields can be interpreted.

I presume a field delimiter could be one or multiple blanks ( or possibly “=” or a comma or a numeric operator + - * / or ^ ?)

A real value must be numeric ( and contain a “.” ? ) ( “1e3” or “1d3” can be ambiguous )

Many years ago, I wrote a parser for “free format” input, ( based on blanks or comma delimiters ) then being lazy, I tried to test each field as numeric by reading it for a real value as in

character field*20 
...
read (field, fmt='(f20.0)', iostat=iostat) val " 
if ( iostat == 0 ) then
!  have a numeric value

This worked well on the compilers I was using at the time, until I tried an IBM Fortran 77 compiler where iostat=iostat would not report a numeric conversion error, but just crash !
The standard apparently states that the response for iostat= is “processor dependent” !
That experience has scared me for life, as I was not expecting such a weak interpretation of the standard.
There are helpful/good compilers and there are unhelpful/poor compilers.

I would say first that the problem has not been clearly specified. For example, is the field right before the two real numbers always ‘unit2’, or is it supposed to be some arbitrary field? If it is always ‘unit2’, then the simplest solution has already been posted by septc.

On the other extreme, is you don’t know how many items there are or their values, then the simplest approach is probably to read the entire line (in which case you must know the maximum line length ahead of time), and then scan that line backwards to extract the last two items. Actually, in your example, it looks like you need the last THREE items, two of which are the real numbers you want plus the trailing ‘)’. To do that scan, you must know what delimits the items. Is it just spaces, or can it include combinations of spaces and commas, following standard fortran list-directed i/o conventions? Or can it also include tab characters or other application-specific delimiters, such as ‘=()/’?

Then, it gets even more complicated if you are trying to mimic the full list-directed i/o conventions of repeated commas and repetition counts.

All of these can be done with the intrinsic string processing operations within fortran. But you need to actually specify the problem fully to know which approach to take. Compared to other languages and utilities, for example like perl or awk, the only important missing feature is regular expression parsing. That could actually be used in this example to advantage if it were available, but it would not simplify all the possible cases (e.g. handling repeated commas and repetition counts is still difficult).

@snano,

I think what you are referring to is similar to a discussion a while ago at comp.lang.fortran, see this link:
https://groups.google.com/g/comp.lang.fortran/c/UFGS4c7UGKg/m/moGz41wLAQAJ

What you will notice is there are currently several “string” library solutions out there that you can evaluate. StringiFor is one you can immediately consider:

Most or all of them include procedures similar to the “split method” in Python. If you don’t like the available solutions out there, you can roll your own also.

Another option is to wait X years where X can be anywhere from 1 to >15-20 years or forever depending on the compiler of your choice where you make use of the TOKENIZE intrinsic introduced with Fortran 202X. This intrinsic can enable what you are looking for.

I don’t have any copyright to a library solution I had helped a team develop to be able to post the working code here but I was able to temporarily gain access to it and try it out enough to illustrate what I think is the need here. The code was along the following lines:

   use string_m, only : string_t

   type(string_t) :: dat
   character(len=1), allocatable :: delimiters(:)
   type(string_t), allocatable :: tokens(:)
   real :: x, y

   ! assume the line from the 'line.dat' file has been read into a CHARACTER variable
   ! simulated here as a literal constant
   dat = "( v=  1) w = 2.843 unit1 = 5.867 unit2  4.567 1.232 )"
   delimiters = [ "(", " ", "=", ")" ]
   call dat%split( delimiters, tokens )
   ! error handling elided

   associate( val => tokens(size(tokens)-1)%s() )
      read( val, "(f5.3)" ) x
      print *, x
   end associate

   associate( val => tokens(size(tokens))%s() )
      read( val, "(f5.3)" ) y
      print *, y
   end associate

end

The program output was

 4.567000
 1.232000
Press any key to continue . . .

You will notice the above library solution uses an array of delimiters to split a string into an array of tokens. It then takes the last two and performs a Fortran internal IO on the string representation to read them.into floating-point objects.

I think this is what you are looking for, Fortran nearly has it. Too bad and a shame it is not intrinsic yet in standard Fortran and there is no intrinsic (built-in) string type to make such string manipulation a breeze for the users.

1 Like

I would say first that the problem has not been clearly specified. For example, is the field right before the two real numbers always ‘unit2’, or is it supposed to be some arbitrary field? If it is always ‘unit2’, then the simplest solution has already been posted by septc.

unit1 and unit2 are units of the quantities. But for discussion here, I suppose them to be as such i.e. “unit2”. Also, I have “)” at the end of the line. So, suggested solution may not work in that case (but not a general solution if unit2 is variable)

To do that scan, you must know what delimits the items. Is it just spaces, or can it include combinations of spaces and commas, following standard fortran list-directed i/o conventions? Or can it also include tab characters or other application-specific delimiters, such as ‘=()/’?

In the problem, clearly, the delimiters are spaces (not tabs).

thanks, @FortranFan. Yes, it looks cool.

In-addition, I also checked here as well (see the script below): https://groups.google.com/g/comp.lang.fortran/c/UFGS4c7UGKg/m/moGz41wLAQAJ. Seems flibs and StringIFor can do that, but I had some issues installing them on my machine (May be I will do it again.)

Also, I tried this approach to convert the line.dat to arrays of strings (see the below code), then I could not extract the last two reals:

module mod_string
    implicit none
    
    contains
 
    subroutine split2array(string, delimiter) 
       character(100), intent(in) :: string
       character, intent(in) :: delimiter ! ';'
    
       character(len=80), allocatable :: strarray(:)
       integer :: n, m 
       integer :: i, idx
       character(len=80):: strtmp
       strtmp = trim(adjustl(string))
       n = count([(strtmp(i:i), i=1, len_trim(strtmp))] == delimiter)
       allocate (strarray(n + 1))
    
       m = 1
       do i = 1, n
          idx = index(strtmp(m:), delimiter)
          strarray(i) = adjustl(strtmp(m:m + idx - 2))
          m = m + idx
       end do
       strarray(n + 1) = adjustl(strtmp(m:))
       print *, 'nvalues or string element are =', n + 1
       print '(*(a))', strarray(1:n+1) 
    end subroutine split2array
 end module mod_string
 
 
 program test
    use mod_string, only : split2array
    implicit none
    character(len=100) :: string
    character :: delimiter
 
    string = "( v=  1) w = 2.843 unit1 = 5.867 unit2  4.567 1.232 )"
    delimiter = " "
    call split2array(string, delimiter) 
 end program test
 !output:
! nvalues or string element are =          15
!(                                                                               v=                                                                                                                                                              1)                                                                              w                                                                               =                                                                               2.843                                                                           unit1                                                                           =                                                                               5.867                                                                           unit2                                                                                                                                                           4.567                                                                           1.232                                                                           )  

Now the task is: out of this array of strings, take the last 2 reals? Nonetheless, the solution suggested by @FortranFan is ok.

I imagined that there are already a lot of split routines available from other libraries, but if a minimal / simple (stand-alone) routine is desired, it may be convenient to prepare it in a local library, for example…

module strmod
    implicit none
contains

subroutine split( line, words, nw )
    character(*), intent(in)  :: line
    character(*), intent(out) :: words(:)
    integer,      intent(out) :: nw
    character(len(words)) :: buf( size(words) )
    integer :: k, ios

    nw = 0 ; words(:) = ""

    do k = 1, size(words)
        buf( 1 : k ) = ""
        read( line, *, iostat=ios ) buf( 1 : k )
        if ( ios /= 0 ) exit
        nw = k
        words( 1 : nw ) = buf( 1 : nw )
    enddo
end

subroutine get_last_reals( words, vals )
    character(*), intent(in)  :: words(:)
    real,         intent(out) :: vals(:)
    integer :: k, ios, ir
    real :: x

    ir = size(vals)
    vals(:) = -99999   !! dummy value for "not found" (NaN may be better)

    do k = size(words), 1, -1
        read( words( k ), *, iostat=ios ) x
        if ( ios /= 0 ) cycle
        vals( ir ) = x
        ir = ir - 1
        if ( ir == 0 ) exit
    enddo
end

end module

program main
    use strmod
    implicit none
    character(80) :: line, words( 50 )
    integer :: nw, k
    real :: vals( 2 )

    line = "( v=  1) w = 2.843 unit1 = 5.867 unit2  4.567 1.232 )"
    !! line = "a abc, 12.34 -23"
    !! line = "a; 4 b; 3, c,; abc,12.34, g, -23;"

    call split( line, words, nw )

    print *, "line = ", trim(line)
    print *, "number of words = ", nw
    print *, "words in line:"
    do k = 1, nw
        print "('len = ',i0,'  val = ',a)", len_trim( words(k) ), trim( words(k) )
    enddo

    call get_last_reals( words, vals )

    print *, "last two reals in line:", vals
end
$ gfortran-10 test.f90 && ./a.out
 line = ( v=  1) w = 2.843 unit1 = 5.867 unit2  4.567 1.232 )
 number of words =           13
 words in line:
len = 1  val = (
len = 2  val = v=
len = 2  val = 1)
len = 1  val = w
len = 1  val = =
len = 5  val = 2.843
len = 5  val = unit1
len = 1  val = =
len = 5  val = 5.867
len = 5  val = unit2
len = 5  val = 4.567
len = 5  val = 1.232
len = 1  val = )
 last two reals in line:   4.56699991       1.23199999 

(This Stackoverflow Q/A may also be related)

But, if the line has something like “123.45)” (no space between reals and parentheses), this kind of method may just fail. In that case, I guess it may be necessary to first remove parenthesis from the line (e.g. by using a replace() like routine based on index()). If the line has a trailing comment that begins with # or ! etc, that part may also be needed to be dropped from the line first, I guess… (again, e.g. via index()).

1 Like

This must have never been tested.
This is not very robust at all; no iostat= recovery and where is val and %s sensibly defined ?
Surely “f5.3” should be “fx.0” where x, %s and val are defined to cope with most expected number strings.

thanks, @septc.

1 Like

the only assumption is that the delimiter is spaces or commas. The example was showing that a Fortran read can act much like the C strtok function and can be used to split a string into tokens.

Rearranging it a bit to create a subroutine makes it simpler to see this finds the last two fields representing numeric values in any line if the values are space delimited.

program readfile
implicit none
integer,parameter          :: line_length=256
character(len=line_length) :: line
integer                    :: iostat,lun
real                       :: a, b
open (newunit=lun, file='line.dat', status='old')
FILEREAD: do
   read(lun,'(a)',IOSTAT=iostat)line  ! read a line into character variable
   if(iostat.ne.0) exit FILEREAD
   write(*,*)'LINE:',trim(line)
   call lasttwo(line,a,b)
   write(*,*)a,b
enddo FILEREAD
contains 
subroutine lasttwo(line,v1,v2)
character(len=*)         :: line
real,intent(out)         :: v1, v2
integer                  :: icount
character(len=len(line)) :: word
real                     :: a(line_length/2+1), aval
   icount=0                           ! initialize count of values found on line
   do
      line=adjustl(line)              ! remove leading spaces
      read(line,*,IOSTAT=iostat)word  ! read next token from line
      if (iostat.ne.0) exit
      read(word,*,IOSTAT=iostat) aval ! convert token to a number
      if (iostat.eq.0)  then
         a(icount+1)=aval        
         icount=icount+1
      endif
      line=line(len_trim(word)+1:)   ! remove token just read
   enddo
   if(icount.ge.2)then
      v1=a(icount-1)
      v2=a(icount)
   else
      write(*,'(1x,a)')'   WARNING: line did not contain at least two numeric values'
      v1=huge(0.0)
      v2=huge(0.0)
   endif  
end subroutine lasttwo
end program readfile
1 Like

Yeah, right.

I think @snano understood the illustration I provided toward a possible approach solution.

But for any other reader if they want to overlook the bigger picture and the outline but would rather sweat the details, first please note the following:

  1. string manipulation toward such needs becomes much easier for users via a higher degree of abstraction, say a featured intrinsic string derived type (“class”) or a library solution toward such a type. Many modern and popular languages enable such an approach more natively than Fortran does currently and that is unfortunate. No wonder then OP and others in forum posts inquire re: Python, etc. and the convenient options with Fortran.
  2. should 1) be of not interest, one can proceed with intrinsic character type and work with it. Toward this, TOKENIZE from Fortran 202X will become a good option once compiler support becomes a reality. In the meantime, one can roll one’s own procedure toward the same, as shown in this thread and elsewhere. It is not difficult.

Toward 1, I illustrated an approach but was unable to share further details. As to the mind-numbing minutiae brought up in the quoted comment here, again the message is users (say @snano) can take the illustration and adapt it as needed. What I showed was by no means meant to be prescriptive. As to “val”, “%s”, formats, etc., readers may know the anatomy of the approach is as follows:

module string_m
   type :: string_t
      private
      character(len=:), allocatable :: m_s
   contains
      private
      procedure, pass(this) :: assign_t
      procedure, pass(this), public :: s => get_s
      generic, public :: assignment(=) => assign_t
   end type
contains
   subroutine assign_t( this, s )
      class(string_t), intent(inout) :: this
      character(len=*), intent(in)   :: s
      this%m_s = s
   end subroutine
   function get_s( this ) result(r)
      class(string_t), intent(in) :: this
      character(len=:), allocatable :: r
      r = this%m_s
   end function 
end module
   use string_m, only : string_t
   type(string_t) :: dat
   real :: x
   dat = "4.567"  ! Simulate the loading of values as string tokens
   associate ( val => dat%s() )
      read( val, fmt="(f5.3)" ) x
      print *, x 
   end associate
end

With IFORT,

C:\Temp>ifort /standard-semantics p.f90
Intel(R) Fortran Intel(R) 64 Compiler Classic for applications running on Intel(R) 64, Version 2021.6.0 Build 20220226_000000
Copyright (C) 1985-2022 Intel Corporation.  All rights reserved.

Microsoft (R) Incremental Linker Version 14.32.31329.0
Copyright (C) Microsoft Corporation.  All rights reserved.

-out:p.exe
-subsystem:console
p.obj

C:\Temp>p.exe
 4.567000

Now, with the internal IO shown in this example, users would do well to consider suitable formats that can generally work with their data, or even list-directed read (fmt=*). Don’t get hung up on the shown format (F5.3), please.

And I believe error handling such as with iostat is best elided in illustrative examples, that is up to the users to employ to develop robust programs for their needs.

My concern with the suggested approaches is the use of repeated free format read to find the number of tokens or fields in the parsed line.
Wouldn’t it be simpler to read the line as say “character(len=128) :: line” then parse the “line” to find all the fields that exist ?
The question is then what are field delimiters, probably space " ", comma “,” and possibly “(” and “)”. Free formatted reads are limited in this respect.

From the example presented is how to interpret “1)” and “1.232 )” ?
Should the data be changed to “1 )”?

Once the immediate issue is solved, where do you stop. It is always a challenge to write a better line_parser !
There can be more flexibility in the possible delimiters, perhaps HT : = or ;. These all come with added complexity, such as reading HT (ascii 9) as READ can modify/mask its value. (Stream I/O may be required for this)

Then text fields enclosed in " or ’ where delimiters are to be ignored.

Then what is a valid real number ? must it include a decimal point ? Should it cope with European real syntax or thousands syntax ? And don’t use F5.3 to read an integer value should not be overlooked.
What is a real number can be a difficult question.

These problems can be better addressed in a Line_Parse subroutine, rather than read (line,*).

Another constraint is that the text line should not contain apostrophes/quotation marks as they would be interpreted as string quotations.

Separators other than space (comma, sometimes also semicolon) would make problems, too, as they would not get properly “trimmed” by the @urbanjost’s code. Consider a line like

a abc, 12.34 -23

The output of the program will be

   2.33999991      -23.0000000    

I have updated my code snippet a bit (so as to initialize buf and vals also in case there are 0-length words or no reals found). For more robust parsing, I believe it is of course better to use a full-fledged parse routine that does not suffer historical / weird behaviors of Fortran list-directed input (such as / as a terminator…).

I remember having seen that newer Fortran standards already have split(), at least for specification? If it is already implemented in a compiler, I guess it will be definitely be better.

Personally, I do not care much about string facility in Fortran, because I will use other languages when the main task is parsing and conversion itself. I feel the string (= character array) facility in Fortran is x10 more convenient than that in C, but that in other recent languages is typically x10 more convenient than that in Fortran (so reducing the necessary coding time). So, I usually do any string manipulation as much as possible outside Fortran. (The only exception to this is namelist, which is really convenient! So I use it extensively.)


Here are some other strings for test; for this pattern, is the list-directed input still “safe”…?

line = “a abc, 12.34 -23”

number of words =            4
words in line:
len = 1  val = a
len = 3  val = abc
len = 5  val = 12.34
len = 3  val = -23
 last two reals in line:   12.3400002      -23.0000000

line = “a ; 4 b; 3, c , ; abc,12.34, g , -23;”

 number of words =           10
 words in line:
len = 1  val = a
len = 1  val = 4
len = 1  val = b
len = 1  val = 3
len = 1  val = c
len = 0  val = 
len = 3  val = abc
len = 5  val = 12.34
len = 1  val = g
len = 3  val = -23
 last two reals in line:   12.3400002      -23.0000000 

FYI: To keep the example simpler I removed a DO loop that stepped through and replaced all the obviously non-numeric characters with an X which eliminates the issues for ",',*,/.

Here is an attempt, by identifying the tokens without “read(line,*) token”. It uses two simple DO loop scans of the line string.
It first replaces all other delimiters with a blank then searches for the start and end of all tokens. The list of possible delimiters can always change !
It then identifies the start and end of all tokens, testing if the Fortran READ accepts the token as a real number.

This approach does not include some useful free format extensions, such as:

  • the use of , , to identify null tokens
  • the use of arithmentc operators + - / * ^ for sumple numeric calculations.
  • coping with ascii HT char(9)
      character :: line*128
      character :: blank        = ' '
      character :: delimiters*5 = ' ,()='
      character :: list_of_tokens(64)*32
      real      :: list_of_reals(64), val
      integer   :: nt, nr, fc, k, iostat

      line = ' ( v=  1) w = 2.843 unit1 = 5.867 unit2  4.567 1.232 )'
      write ( *,10) 'Initial line :', trim(line)

!   replace other delimiters with blank
      do k = 1, len_trim(line)
        if ( index (delimiters, line(k:k)) > 0 ) line(k:k) = blank
      end do
      write ( *,10) 'Without delim:', trim(line)

!   find all tokens in filtered line
      nt = 0    ! number of tokens
      nr = 0    ! number of reals
      fc = 0    ! first character of next token

      write ( *,10) 'Search for Tokens'
      do k = 1, len_trim(line)+1                ! line must end with a blank

        if ( fc == 0 ) then
          if ( line(k:k) /= blank ) fc = k      ! have start of toktn

        else if ( line(k:k) == blank ) then     ! have end of token
          nt = nt+1
          list_of_tokens(nt) = line(fc:k)       ! add token to list
          read (list_of_tokens(nt), fmt='(f20.0)', iostat=iostat ) val
          if ( iostat == 0 ) then
            nr = nr+1
            list_of_reals(nr) = val             ! add reaL to list
            write ( *,11) nt, trim(list_of_tokens(nt)), '   val =',val
          else
            write ( *,11) nt, trim(list_of_tokens(nt))
          end if
          fc = 0    ! set for start of next token
        end if
      end do

      write ( *,10) 'Summary'
      write ( *,11) nt,'tokens identified'
      write ( *,11) nr,'real values identified'

 10   format (/a,a)
 11   format ( i4,2x,a,a,g10.4 )
      end
1 Like

My 2c:

  • the length in f descriptor in
    read (list_of_tokens(nt), fmt='(f20.0)', iostat=iostat ) val
    should match the length of list_of_tokens(nt) (now set to 32), otherwise a token like 1234567890123456789012.34 will get truncated not only in terms of significant digits but also in the order of magnitude:
    4 1234567890123456789012.843 val =0.1235E+20 (should be 0.1235E+22)
    this could be fixed by using dynamic format
character ::  valfmt*10
! [...]
write(valfmt,"('(F',I0,'.0)')"), len(list_of_tokens(1))
! [...]
read (list_of_tokens(nt), fmt=valfmt, iostat=iostat ) val
  • based on my experience with gfortran (see also here) I would add semicolon to the list of delimiters, possibly replacing space which is redundant there.
  • @JohnCampbell’s code would be better readable if started with ```fortran. As it is now, the language has not been recognized properly by the server

@msz59 Thanks for your comments.

I do agree that there is a problem if real number strings longer than 20 digits are provided but I am not convinced by the solution. A Warning probably should be provided (we are using a 64-bit real)

You could add ; or : or / (date/time) or ~ to the list, but these can be specific delimiter cases.
In the code example, the first delimiter is blank, which replaces all other delimiters supported. This implies repeated delimiters are merged to a single delimiter. Use of multiple ;;; might not be interpreted this way.

For the data formats I use, comma and HT are interpreted differently as when repeated they do indicate significant null token fields. My simplified code is ambiguous for these special types.

Should ```fortran be manually applied after selecting the </> icon ?
Thanks for the suggestion.

I don’t get this. Why not use the same length for the token string and format? F20.0 is not enough for a 64-bit real anyway. The default list-formatted output of such a value is 1.2345678901234568E+021 (23 characters)

This would make sense if you wrote

if ( index (delimiters, line(k:k)) > 1 ) line(k:k) = delimiters(1:1)

As it is now, it just blindly replaces any delimiter (including space) by a space. Semicolon is a legal fields separator in case when DECIMAL='COMMA' is activated. F2018 standard says:

13.6.2 If the decimal edit mode is COMMA during list-directed input/output, the character used as a value separator is a semicolon in place of a comma.

gfortran recognizes a semicolon as a separator even if DECIMAL has the default value of POINT (by an extension or error :slight_smile: )

I never use that icon, typing ``` by hand but as I see now, clicking </> produces a pair of
triple back-quote sequences, so you would have to add fortran to the first of those. See the short discussion here. I suggested making fortran the default language on this particular discourse but maybe it is not (easily?) doable.

I’m old and old-fashioned but I think that this stuff is tedious and dumb

implicit none
real :: x, y
character(len=10) :: char1

If you know what column this is in, use T format and skip the stuff above

msz59 will be glad to know I reported the gfortran bug with semicolon separators a few weeks ago but I don’t think the bug-fix has yet been incorporated into the compiler. Also even 23 spaces for a real number was nowhere near enough recently when I wanted to see how two different algorithms compared when working in quad precision.