FYI: To keep the example simpler I removed a DO loop that stepped through and replaced all the obviously non-numeric characters with an X which eliminates the issues for ",',*,/.
Here is an attempt, by identifying the tokens without “read(line,*) token”. It uses two simple DO loop scans of the line string.
It first replaces all other delimiters with a blank then searches for the start and end of all tokens. The list of possible delimiters can always change !
It then identifies the start and end of all tokens, testing if the Fortran READ accepts the token as a real number.
This approach does not include some useful free format extensions, such as:
- the use of , , to identify null tokens
- the use of arithmentc operators + - / * ^ for sumple numeric calculations.
- coping with ascii HT char(9)
character :: line*128
character :: blank = ' '
character :: delimiters*5 = ' ,()='
character :: list_of_tokens(64)*32
real :: list_of_reals(64), val
integer :: nt, nr, fc, k, iostat
line = ' ( v= 1) w = 2.843 unit1 = 5.867 unit2 4.567 1.232 )'
write ( *,10) 'Initial line :', trim(line)
! replace other delimiters with blank
do k = 1, len_trim(line)
if ( index (delimiters, line(k:k)) > 0 ) line(k:k) = blank
end do
write ( *,10) 'Without delim:', trim(line)
! find all tokens in filtered line
nt = 0 ! number of tokens
nr = 0 ! number of reals
fc = 0 ! first character of next token
write ( *,10) 'Search for Tokens'
do k = 1, len_trim(line)+1 ! line must end with a blank
if ( fc == 0 ) then
if ( line(k:k) /= blank ) fc = k ! have start of toktn
else if ( line(k:k) == blank ) then ! have end of token
nt = nt+1
list_of_tokens(nt) = line(fc:k) ! add token to list
read (list_of_tokens(nt), fmt='(f20.0)', iostat=iostat ) val
if ( iostat == 0 ) then
nr = nr+1
list_of_reals(nr) = val ! add reaL to list
write ( *,11) nt, trim(list_of_tokens(nt)), ' val =',val
else
write ( *,11) nt, trim(list_of_tokens(nt))
end if
fc = 0 ! set for start of next token
end if
end do
write ( *,10) 'Summary'
write ( *,11) nt,'tokens identified'
write ( *,11) nr,'real values identified'
10 format (/a,a)
11 format ( i4,2x,a,a,g10.4 )
end
My 2c:
- the length in
f
descriptor in
read (list_of_tokens(nt), fmt='(f20.0)', iostat=iostat ) val
should match the length oflist_of_tokens(nt)
(now set to 32), otherwise a token like1234567890123456789012.34
will get truncated not only in terms of significant digits but also in the order of magnitude:
4 1234567890123456789012.843 val =0.1235E+20
(should be 0.1235E+22)
this could be fixed by using dynamic format
character :: valfmt*10
! [...]
write(valfmt,"('(F',I0,'.0)')"), len(list_of_tokens(1))
! [...]
read (list_of_tokens(nt), fmt=valfmt, iostat=iostat ) val
- based on my experience with
gfortran
(see also here) I would add semicolon to the list of delimiters, possibly replacing space which is redundant there. - @JohnCampbell’s code would be better readable if started with ```fortran. As it is now, the language has not been recognized properly by the server
@msz59 Thanks for your comments.
I do agree that there is a problem if real number strings longer than 20 digits are provided but I am not convinced by the solution. A Warning probably should be provided (we are using a 64-bit real)
You could add ; or : or / (date/time) or ~ to the list, but these can be specific delimiter cases.
In the code example, the first delimiter is blank, which replaces all other delimiters supported. This implies repeated delimiters are merged to a single delimiter. Use of multiple ;;; might not be interpreted this way.
For the data formats I use, comma and HT are interpreted differently as when repeated they do indicate significant null token fields. My simplified code is ambiguous for these special types.
Should ```fortran be manually applied after selecting the </> icon ?
Thanks for the suggestion.
I don’t get this. Why not use the same length for the token string and format? F20.0 is not enough for a 64-bit real anyway. The default list-formatted output of such a value is 1.2345678901234568E+021
(23 characters)
This would make sense if you wrote
if ( index (delimiters, line(k:k)) > 1 ) line(k:k) = delimiters(1:1)
As it is now, it just blindly replaces any delimiter (including space) by a space. Semicolon is a legal fields separator in case when DECIMAL='COMMA'
is activated. F2018 standard says:
13.6.2 If the decimal edit mode is COMMA during list-directed input/output, the character used as a value separator is a semicolon in place of a comma.
gfortran
recognizes a semicolon as a separator even if DECIMAL
has the default value of POINT
(by an extension or error )
I never use that icon, typing ``` by hand but as I see now, clicking </> produces a pair of
triple back-quote sequences, so you would have to add fortran
to the first of those. See the short discussion here. I suggested making fortran
the default language on this particular discourse but maybe it is not (easily?) doable.
I’m old and old-fashioned but I think that this stuff is tedious and dumb
implicit none
real :: x, y
character(len=10) :: char1
If you know what column this is in, use T format and skip the stuff above
msz59 will be glad to know I reported the gfortran bug with semicolon separators a few weeks ago but I don’t think the bug-fix has yet been incorporated into the compiler. Also even 23 spaces for a real number was nowhere near enough recently when I wanted to see how two different algorithms compared when working in quad precision.