Good news, I forgot to pass --profile release
… Correct times are:
formatted read : 0.7430 seconds
my dummy : 0.0344 seconds
My dummy is still 21 times faster.
Good news, I forgot to pass --profile release
… Correct times are:
formatted read : 0.7430 seconds
my dummy : 0.0344 seconds
My dummy is still 21 times faster.
Calling the function you posted with
program xstr_real
implicit none
integer, parameter :: wp = kind(1.0d0)
print*,str2real("0.12"),str2real("1.2"),str2real("12.0")
contains
function str2real(s) result(r)
character(*), target, intent(in) :: s
real(wp) :: r
integer*1, parameter :: period = -2
integer*1, parameter :: char_e = 21
integer, parameter :: N = 32
real(wp), parameter :: expo(*) = [1e15, 1e14, 1e13, 1e12, 1e11, 1e10, 1e9, 1e8, &
1e7, 1e6, 1e5, 1e4, 1e3, 1e2, 1e1, 1e0, &
1e0, &
1e-1, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6, 1e-7, 1e-8, &
1e-9, 1e-10, 1e-11, 1e-12, 1e-13, 1e-14, 1e-15, 1e-16, &
1e-17, 1e-18, 1e-19, 1e-20, 1e-21, 1e-22, 1e-23, 1e-24]
character(N) :: equ_s
integer*1 :: equ_i(N)
integer*1 :: mask(N)
integer :: period_loc
integer :: exponent_loc
integer :: mask_from
integer :: mask_till
equivalence(equ_i, equ_s)
equ_s = s
equ_i = equ_i - 48
period_loc = findloc(equ_i, period, 1)
exponent_loc = findloc(equ_i, char_e, 1)
where (0 <= equ_i .and. equ_i <= 9)
mask = 1
elsewhere
mask = 0
end where
mask_from = 18 - period_loc
mask_till = mask_from + exponent_loc - 2
r = sum(equ_i(:exponent_loc - 1) * expo(mask_from:mask_till) * mask(:exponent_loc - 1))
end function str2real
end program xstr_real
I get output
0.0000000000000000 0.0000000000000000 0.0000000000000000
Am I calling it incorrectly?
I only tested it with a trailing E
. If you call the routine with "0.12E"
and so on it should work. Everything after the E
still gets ignored.
Thank you for providing good testcases!
Sure. For the code
program xstr_real
implicit none
integer, parameter :: wp = kind(1.0d0), ikind = selected_int_kind(2)
print*,str2real("-0.12E"),str2real("0.12E"),str2real("1.2E"),str2real("12.0E")
contains
function str2real(s) result(r)
character(*), target, intent(in) :: s
real(wp) :: r
integer(kind=ikind), parameter :: period = -2
integer(kind=ikind), parameter :: char_e = 21
integer, parameter :: N = 32
real(wp), parameter :: expo(*) = [1e15, 1e14, 1e13, 1e12, 1e11, 1e10, 1e9, 1e8, &
1e7, 1e6, 1e5, 1e4, 1e3, 1e2, 1e1, 1e0, &
1e0, &
1e-1, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6, 1e-7, 1e-8, &
1e-9, 1e-10, 1e-11, 1e-12, 1e-13, 1e-14, 1e-15, 1e-16, &
1e-17, 1e-18, 1e-19, 1e-20, 1e-21, 1e-22, 1e-23, 1e-24]
character(N) :: equ_s
integer(kind=ikind) :: equ_i(N)
integer(kind=ikind) :: mask(N)
integer :: period_loc
integer :: exponent_loc
integer :: mask_from
integer :: mask_till
equivalence(equ_i, equ_s)
equ_s = s
equ_i = equ_i - 48
period_loc = findloc(equ_i, period, 1)
exponent_loc = findloc(equ_i, char_e, 1)
where (0 <= equ_i .and. equ_i <= 9)
mask = 1
elsewhere
mask = 0
end where
mask_from = 18 - period_loc
mask_till = mask_from + exponent_loc - 2
r = sum(equ_i(:exponent_loc - 1) * expo(mask_from:mask_till) * mask(:exponent_loc - 1))
end function str2real
end program xstr_real
compiling with gfortran xstr_real.f90
and running gives
0.12000000104308128 0.12000000104308128 1.2000000029802322 12.000000000000000
so a leading minus sign needs to be handled. I changed the nonstandard integer*1
declarations as shown. Compiling with gfortran -std=f2018 xstr_real.f90
gives
xstr_real.f90:27:25:
27 | equivalence(equ_i, equ_s)
| 1
Warning: Fortran 2018 obsolescent feature: EQUIVALENCE statement at (1)
xstr_real.f90:27:12:
27 | equivalence(equ_i, equ_s)
| 1
Error: GNU Extension: Non-default type object or sequence equ_i in EQUIVALENCE statement at (1) with objects of different type
I don’t know how equivalence
works. Can the code be rewritten without it or with it but in a standard-conforming way?
equivalence
makes the character
variable available as an integer
. Basically it tells the compiler to put both variables into the same space in memory. I don’t know how to do this in another way, maybe with pointers?
Thank you for the other tests and improvements, I’ll have a look at them tomorrow.
Well, I did not get very far myself with vector ops. I was trying to use the “Zen of Fortran” with some lines I added and one of the interesting issues to me was that using three different compilers a change I would make would often speed up one and sometimes slow another to well below the benchmark speed of the internal READ. Here is one descended from the original post I made that speeds up gfortran, ifort, and nvfortran to varying degrees and has only big issue in that something like “1111111111111111111111111111111111111111111111111111111111111111111111111111111111” is handled incorrectly. Reporting errors on input cost quite a bit but I still saw significant speed-ups. For reference (Warts and all for now; but I am going to try to merge in some of your features if I get the time):
module m_ascii
private
public ator
contains
logical function ator(str,val)
use iso_fortran_env, only: wp => real64, ip => int64
implicit none
! Convert ASCII-text to DP and return .TRUE. if OK
character(len=*),intent(in) :: str
real(kind=wp) val
integer(kind=ip) :: value(3), sval(3),digits(3)
integer(kind=ip) :: count(5)
integer :: i, part, a, ipos
integer,parameter :: uech=ichar('E'), lech=ichar('e'), udee=ichar('D'), ldee=ichar('d')
integer,parameter :: plsign=ichar('+'), mnsign=ichar('-'), decim=ichar('.'), aspace=ichar(' ')
integer,parameter :: u0=ichar('0'), u1=ichar('1'), u2=ichar('2'), u3=ichar('3'), u4=ichar('4')
integer,parameter :: u5=ichar('5'), u6=ichar('6'), u7=ichar('7'), u8=ichar('8'), u9=ichar('9')
value=0
count=0
digits=0
ipos=0
ator = .false.
sval = [1,0,1]
part = 1
!not using integers slow in ifort
!do i=1,len_trim(str)
! a(i)=ichar(str(i:i))
!enddo
!a=transfer(str,1,size(a)) ! fast with gfortran, but not ifort or nvfortran
!a=ichar([(str(i:i),i=1,size(a))]) ! incredibly slow with ifort
do i = 1, len(str)
a=ichar(str(i:i))
ipos=ipos+1
select case(a)
case(u0:u9) ! if too many digits switch to real, ignore, or error
value(part) = value(part)*10 + a-u0
digits(part) = digits(part) + 1
case(decim) ! if more than once should report error
part = 2
count(1)=count(1)+1
case(uech,lech,udee,ldee) ! if more than once should report error
part = 3
count(2)=count(2)+1
ipos=0
case(mnsign) ! sign in non-standard position or duplicated should report error
sval(part) = sval(part)*(-1)
if(ipos.ne.1)count(3)=count(3)+len(str)+2
count(3)=count(3)+1
case(plsign)
sval(part) = sval(part)*1
if(ipos.ne.1)count(4)=count(4)+1
count(4)=count(4)+1
case(aspace) ! should possibly not ignore all internal spaces
ipos=ipos-1
case default
value(part) = 0
count(5)=99999
!return
end select
enddo
! is no value after E an error?
associate ( whole=>value(1), fractional=>value(2), exp=>value(3), sgn=>sval(1), sexp=>sval(3) )
val = sign(real(whole,kind=wp) + real(fractional,kind=wp)/10**digits(2),real(sgn,kind=wp))* (10.0_wp**(exp*sexp))
!!write(*,'(*(g0,1x))')'value=',value,' sval=',sval,' digits=',digits,' string=',str,val
end associate
if(all(count.le.1).and.ipos.ne.0)then
ator = .true.
else
ator = .false.
endif
contains
subroutine digit()
end subroutine digit
end function ator
end module m_ascii
program main
use M_ascii, only : ator
use iso_fortran_env, only: wp => real64, ip => int64
implicit none
integer,parameter :: n = 1000000 !! number of values
real(wp) :: rval_ator(n), rval_read(n), rval(n)
real(wp) :: val
integer :: ierr, i
integer(kind=ip) :: start, finish, count_rate
logical :: lerr, ret
character(len=30),allocatable :: strings(:)
character(len=:),allocatable :: tests(:)
! create a list of values to parse
allocate(strings(n))
do i = 1, n
call random_number(rval(i))
write(strings(i), '(g0)') rval(i)
enddo
! use internal read
call system_clock(start, count_rate)
do i = 1, n
read(strings(i),fmt=*,iostat=ierr) rval_read(i)
enddo
call system_clock(finish)
write(*,'(a30,1x,f7.4,1x,a)') 'time * : ', (finish-start)/real(count_rate,wp), ' seconds'
! use ator()
call system_clock(start)
do i = 1, n
lerr=ator(strings(i),rval_ator(i))
enddo
call system_clock(finish)
write(*,'(a30,1x,f7.4,1x,a)') 'time * : ', (finish-start)/real(count_rate,wp), ' seconds'
! compare results
do i = 1, n
if(abs(rval_ator(i)-rval_read(i)).gt.1*epsilon(rval_ator(i))) then
write(*,'(i10,1x,*(g0,1x))')i,rval_ator(i)-rval_read(i),rval_ator(i),rval_read(i),strings(i)
endif
enddo
!do i = 1, n
! if(rval(i)-rval_read(i).ne.0.0_wp) then
! write(*,'(i10,1x,*(g0,1x))')i,rval_ator(i)-rval_read(i),rval_ator(i),rval_read(i),strings(i)
! endif
!enddo
!do i = 1, n
! if(rval(i)-rval_ator(i).ne.0.0_wp) then
! write(*,'(i10,1x,*(g0,1x))')i,rval_ator(i)-rval_read(i),rval_ator(i),rval_read(i),strings(i)
! endif
!enddo
tests=[character(len=80) :: &
! returns 0
'', &
! returns 0
'E', &
! does not show overflow error
'1111111111111111111111111111111111111111111111111111111111111111111111111111111111111', &
'1', &
'-1', &
'-1e3', &
'+1234567890.123456789e1', &
'+1234567890.123456789e100', &
'1234567890.123456789e1', &
'1234567890.123456789e-1', &
'123456-7890.123456789e1', &
'--1234567890.123456789e1', &
'+-1234567890.123456789e1', &
'123e1d2', &
'@']
block
character(len=256) :: message
integer :: ios
real(kind=wp) val2
do i=1,size(tests)
val=-99999999.0d0
val2=-99999999.0d0
ret=ator(tests(i),val)
read(tests(i),fmt=*,iostat=ios,iomsg=message)val2
if(ios.ne.0)then
write(*,'(*(g0,1x))')'status:',ret,' value:',val,' string:',trim(tests(i)),'fromread:',val2,' message:',trim(message)
else
write(*,'(*(g0,1x))')'status:',ret,' value:',val,' string:',trim(tests(i)),' fromread:',val2
endif
enddo
end block
end program main
with default compiler options:
+ gfortran x3..f90
+ ./a.out
time * : 1.2085 seconds
time * : 0.2182 seconds
+ ./a.out
time * : 1.1580 seconds
time * : 0.2247 seconds
+ ifort x3..f90
+ ./a.out
time * : 0.6899 seconds
time * : 0.0866 seconds
+ ./a.out
time * : 0.6712 seconds
time * : 0.0895 seconds
+ nvfortran x3..f90
+ ./a.out
time * : 0.2984 seconds
time * : 0.2143 seconds
+ ./a.out
time * : 0.3054 seconds
time * : 0.2132 seconds
LOTS of interesting experiences. I had the least success with nvfortran(1) but note how much faster the internal read in nvfortran(1) is; some bizarre surprises like one compiler slowing down by an order of magnitude when I used CHARACTER variables in the SELECT, that when I used array syntax or TRANSFER to convert from CHARACTER to INTEGER some sped up and others slowed down tremendously, … some real surprises between the compilers.
Fun and frustrating at the same time. I set a self-imposed goal of 1 epsilon from the original value and keeping it simple by not using bit-fiddling and no compiler-specific tuning. Might be too expensive a goal in retrospect.
Would love to see runs from other programming environments, maybe throwing in gfortran to compensate as a baseline to compensate for hardware differences. I am running on a relatively light-weight laptop.
The issue is of course the accuracy and all the edge cases one has to keep in mind. If one were willing to tolerate around 10 epsilon and not want to indulge in bit-processing and not support everything C stdlib strtod
does, then some simple options are amenable with a decent speed improvement over C strtod
. Shown below is one such example setup as a subroutine
subprogram in order to allow easier procedure overloading for single precision and possibly extended precision (80-bit) floating-point values.
module F_strtod_m
! Simple Fortran string to double
use, intrinsic :: iso_fortran_env, only : I8 => int64
integer, parameter :: DP = selected_real_kind( p=12 )
! FP constants
real(DP), parameter :: TEN = 10.0_dp
! Integer constants
integer, parameter :: ASCII_NEGATIVE = 45
integer, parameter :: ASCII_PERIOD = 46
integer, parameter :: ASCII_0 = 48
integer, parameter :: ASCII_9 = 57
integer(I8), parameter :: TEN_INT = 10_i8
contains
elemental subroutine F_strtod( str, r )
character(len=*), intent(in) :: str
real(DP), intent(out) :: r
! Local variables
integer(I8) :: n, n_exp
integer :: lens, pos_exp, expnt
r = 0.0_dp
n_exp = 0
lens = len_trim( str )
pos_exp = index( str, "E", back=.true. )
if ( pos_exp == 0 ) then
pos_exp = index( str, "e", back=.true. )
end if
if ( pos_exp > 0 ) then
call str2dec( str(pos_exp+1:lens), n_exp )
else
pos_exp = lens + 1
end if
call str2dec( str(1:pos_exp-1), n, expnt )
r = real( n, kind=DP ) * (TEN**(n_exp + expnt))
return
end subroutine
elemental subroutine str2dec( s, n, expnt )
character(len=*), intent(in) :: s
integer(I8), intent(out) :: n
integer, intent(out), optional :: expnt
integer :: ipos, ic, pos_period, exponent
n = 0_i8
pos_period = 0
exponent = -1
if ( present(expnt) ) exponent = 0
do ipos = len(s), 1, -1
ic = ichar( s(ipos:ipos) )
if ( present(expnt) ) then
if ( ic == ASCII_PERIOD ) then
pos_period = ipos
cycle
end if
end if
if ( ic == ASCII_NEGATIVE ) then
n = -n
exit
end if
if ( (ic < ASCII_0).or.(ic > ASCII_9) ) exit
exponent = exponent + 1
n = n + TEN_INT**(exponent)*(ic - 48)
end do
if ( present(expnt) ) expnt = -len(s) + pos_period - 1
return
end subroutine
end module
Block 1: formatted read toward string to double
time * : 0.7490 secondsBlock 2: C strtod
time * : 0.2980 secondsBlock 3: Fortran strtod
time * : 0.1410 seconds
Values match within 10 epsilon.
Also as defined assignment if one were so inclined.
module str2r_m
! Simple Fortran string to real
use, intrinsic :: iso_fortran_env, only : I8 => int64, real_kinds
! Integer constants
integer, parameter :: ASCII_NEGATIVE = 45
integer, parameter :: ASCII_PERIOD = 46
integer, parameter :: ASCII_0 = 48
integer, parameter :: ASCII_9 = 57
integer(I8), parameter :: TEN_INT = 10_i8
interface assignment(=)
module procedure str2r_rk1
module procedure str2r_rk2
module procedure str2r_rk3
end interface
contains
elemental subroutine str2r_rk1( r, str )
character(len=*), intent(in) :: str
real(real_kinds(1)), intent(out) :: r
! Local variables
integer, parameter :: WP = real_kinds(1)
include 'str2r.i90'
end subroutine
elemental subroutine str2r_rk2( r, str )
character(len=*), intent(in) :: str
real(real_kinds(2)), intent(out) :: r
! Local variables
integer, parameter :: WP = real_kinds(2)
include 'str2r.i90'
end subroutine
elemental subroutine str2r_rk3( r, str )
character(len=*), intent(in) :: str
real(real_kinds(3)), intent(out) :: r
! Local variables
integer, parameter :: WP = real_kinds(3)
include 'str2r.i90'
end subroutine
elemental subroutine str2dec( s, n, expnt )
character(len=*), intent(in) :: s
integer(I8), intent(out) :: n
integer, intent(out), optional :: expnt
integer :: ipos, ic, pos_period, exponent
n = 0_i8
pos_period = 0
exponent = -1
if ( present(expnt) ) exponent = 0
do ipos = len(s), 1, -1
ic = ichar( s(ipos:ipos) )
if ( present(expnt) ) then
if ( ic == ASCII_PERIOD ) then
pos_period = ipos
cycle
end if
end if
if ( ic == ASCII_NEGATIVE ) then
n = -n
exit
end if
if ( (ic < ASCII_0).or.(ic > ASCII_9) ) exit
exponent = exponent + 1
n = n + TEN_INT**(exponent)*(ic - 48)
end do
if ( present(expnt) ) expnt = -len(s) + pos_period - 1
return
end subroutine
end module
! include str2r.i90
! FP constants
real(WP), parameter :: TEN = 10.0_wp
integer(I8) :: n, n_exp
integer :: lens, pos_exp, expnt
r = 0.0_wp
n_exp = 0
lens = len_trim( str )
pos_exp = index( str, "E", back=.true. )
if ( pos_exp == 0 ) then
pos_exp = index( str, "e", back=.true. )
end if
if ( pos_exp > 0 ) then
call str2dec( str(pos_exp+1:lens), n_exp )
else
pos_exp = lens + 1
end if
call str2dec( str(1:pos_exp-1), n, expnt )
r = real( n, kind=WP ) * (TEN**(n_exp + expnt))
return
Update on my vectorised str2real:
I replaced all e
of the factors of the expo
variable with d
(e.g. 1e15
→ 1d15
). This fixed the precision problem.
Note that on my platform the the strtod version takes 0.25 seconds so it is faster with all three compilers that the strtod version; and with -Ofast or -fast the Fortran version is twice as fast with two of the compilers, passes the one-epsilon criteria on a round trip of the data and catches all the edge cases I tried so far that strtod does except strtod can handle hexadecimal values and the NAN and Inf values, at least on some platforms.
But the vectorized version appears to be so fast I think I could call it and just use mine for parsing out the edge cases and it appears it would still and that would speed it up more. That is surprising.
Note that technically an equivalence between variables of different TYPE is non-standard but a very common extension but seems to have worked well in the vectorized version; but in my version when I used the TRANFER function to do that in a more standard way it doubled my runtime on one of the compilers.
I think most of the strings should now be parsable. However, I commented out the part which enables the correct handling of strings with a small e
, because it somehow increased the runtime from 0.04 s to 0.06 which seems to be a little bit too much to me.
Code on Github:
Next steps:
equivalence
expo
parameter to coefficients
because that’s what it really isE
and e
are possibleNow the question is: What are reasonable lengths for mantissa and exponent?
@urbanjost I didn’t test the speed for other float formats, it might be slower for shorter strings. Maybe it would be better to call the vectorised version only if the string is long enough.
if these snippets were going to become code, I’d guess all ichar
should be changed to iachar
. Otherwise the code will not be portable.
I tried to rewrite the part with equivalence
but when I use transfer
the time of the whole function doubles.
-equivalence(equ_i, equ_s)
-equ_s = s
+equ_i = transfer(s, equ_i, len(s))
tested with gfortran 11.1.0
For the strtod
option suggested… it doesn’t seem to distinguish between 0 and an error, so I guess we have to check for that separately? Either try and determine if the string is a 0.0 before trying to parse it, or just use strtod
and if it comes back 0, then test it. Could do something like this:
rval = strtod( str//C_NULL_CHAR, endptr )
if (rval==0.0_RK) then
read(str,fmt=*,iostat=ierr) rval
else
ierr = 0
end if
Prior to Ver 7, GFortran, was very slow converting 8-byte reals to string,
such as “write (lu,fmt=’(f15.8)’) val”
To overcome this run time hit, this required writing routines to convert integer or real to string, for both I, F and E formats. For large dumps of analysis results to text file, the time reduction was to about 1/10 th.
I am not sure of the reason, perhaps limiting edge conditions.
This was overcome in Ver 7, but my patch code has been retained, as it also treated zero as " 0. ", which is a useful special case report.
The code was nothing special, just Fortran, but was effective.
subroutine write_val_Fr8 (val, str, n)
!
! writes number into str as format (Flen.n) eg -3.04
!
real*8 :: val ! value to write; must fit
integer*4 :: n ! digits >= 0 and < len(str)
character :: str*(*)
!
real*8 :: rv ! abs ( val)
integer*8 :: v ! integer for digits of val
integer*8 :: ten = 10 ! mod
integer*4 :: k ! position of digit
integer*4 :: p ! position of '.'
integer*4 :: sgn ! +/-
integer*4 :: d ! digit
! integer*4 :: z = ichar ('0')
character :: dig(0:9)*1=(/ '0','1','2','3','4','5','6','7','8','9' /)
!
k = len (str)
p = k-n
if ( p < 1 ) goto 99
!
str = ' '
if ( val > 0 ) then
sgn = 1
rv = val
else if ( val < 0 ) then
sgn = -1
rv = -val
else
!z sgn = 1
!z rv = val
str(p-1:p) = '0.'
return
end if
!
! Integer of digits
if (n > 0 ) then
v = ( rv * 10**n + 0.5 )
else
v = ( rv + 0.5 )
end if
! if ( v == 0) sgn = 0 ! remove -0.000
!
! generate digits
str(p:p) = '.'
do
if ( k==p ) k = k-1
d = mod(v,ten)
if ( k < 1 ) goto 99
str(k:k) = dig(d) ! char (d+z)
v = v/ten
k = k-1
if ( v == 0 .and. k < p ) exit
end do
!
! -ve values
if ( sgn < 0 ) then
if ( k < 1 ) goto 99
str(k:k) = '-'
end if
return
!
! overflow field
99 str = repeat ('#', len(str))
return
end subroutine write_val_Fr8
This is a reported problem during 2016 and 2017. My recollection is the performance problem was corrected in a version of gfortran Ver 7.x. I did provide reproducers at the time.
My testing was on Windows 7 OS, although I think the problem was on other OS.
Thomas Koenig and mecej4 provided valuable assistance with the tests.
Here is a test program. On Windows 10, with an Intel i7-10710U CPU, the program runs in 1.4 s with Intel Fortran OneAPI, and in 27 s with Gfortran 10.2 (Cygwin).
program IntlWrite
implicit none
character(120) :: str
integer :: i,j
double precision :: x(10) = (/ 7.569d-1, 5.556d-1, -1.640d-1, 9.362d-1, 1.057d-1, &
-2.385d-1, -9.541d-1, 1.449d-1, -7.885d-1, -1.108d-1 /)
Integer, parameter :: tkind = selected_int_kind(15)
Integer(tkind) :: tcnt1, tcnt2, trate
call system_clock(tcnt1,trate)
do j=1, 1000000
write (str,'(1P,10ES11.3)') x
if (mod(j,200000).eq.0) write (*,'(1x,I7,2x,A)') j, str
end do
call system_clock(tcnt2)
print '(1x,A,2x,F6.3,A)','Time used (System_Clock) : ', &
real(tcnt2-tcnt1)/trate,' sec'
end program
See also the GCC-libFortran bug report that JohnCampbell alluded to. That report was filed in November 2016, and the issue was closed in June 2018.
For me on Windows 10, the times are 1.30 s for Intel Fortran and 7.76 s for GNU Fortran (GCC) 12.0.0 20211024 (not Cygwin) from equation.com , so it appears that gfortran 12 is faster than gfortran 10.2 on this, although still slower than Intel.
Out of curiosity (and seeing a procrastination opportunity) I wrote a simple parser myself. It is roughly 15 times faster than via a plain read statement. Of course, accuracy and such will further complicate the code, but it seems to me something like this is a useful addition to stdlib. I had not realised it could be so much faster.