Well, I did not get very far myself with vector ops. I was trying to use the “Zen of Fortran” with some lines I added and one of the interesting issues to me was that using three different compilers a change I would make would often speed up one and sometimes slow another to well below the benchmark speed of the internal READ. Here is one descended from the original post I made that speeds up gfortran, ifort, and nvfortran to varying degrees and has only big issue in that something like “1111111111111111111111111111111111111111111111111111111111111111111111111111111111” is handled incorrectly. Reporting errors on input cost quite a bit but I still saw significant speed-ups. For reference (Warts and all for now; but I am going to try to merge in some of your features if I get the time):
module m_ascii
private
public ator
contains
logical function ator(str,val)
use iso_fortran_env, only: wp => real64, ip => int64
implicit none
! Convert ASCII-text to DP and return .TRUE. if OK
character(len=*),intent(in) :: str
real(kind=wp) val
integer(kind=ip) :: value(3), sval(3),digits(3)
integer(kind=ip) :: count(5)
integer :: i, part, a, ipos
integer,parameter :: uech=ichar('E'), lech=ichar('e'), udee=ichar('D'), ldee=ichar('d')
integer,parameter :: plsign=ichar('+'), mnsign=ichar('-'), decim=ichar('.'), aspace=ichar(' ')
integer,parameter :: u0=ichar('0'), u1=ichar('1'), u2=ichar('2'), u3=ichar('3'), u4=ichar('4')
integer,parameter :: u5=ichar('5'), u6=ichar('6'), u7=ichar('7'), u8=ichar('8'), u9=ichar('9')
value=0
count=0
digits=0
ipos=0
ator = .false.
sval = [1,0,1]
part = 1
!not using integers slow in ifort
!do i=1,len_trim(str)
! a(i)=ichar(str(i:i))
!enddo
!a=transfer(str,1,size(a)) ! fast with gfortran, but not ifort or nvfortran
!a=ichar([(str(i:i),i=1,size(a))]) ! incredibly slow with ifort
do i = 1, len(str)
a=ichar(str(i:i))
ipos=ipos+1
select case(a)
case(u0:u9) ! if too many digits switch to real, ignore, or error
value(part) = value(part)*10 + a-u0
digits(part) = digits(part) + 1
case(decim) ! if more than once should report error
part = 2
count(1)=count(1)+1
case(uech,lech,udee,ldee) ! if more than once should report error
part = 3
count(2)=count(2)+1
ipos=0
case(mnsign) ! sign in non-standard position or duplicated should report error
sval(part) = sval(part)*(-1)
if(ipos.ne.1)count(3)=count(3)+len(str)+2
count(3)=count(3)+1
case(plsign)
sval(part) = sval(part)*1
if(ipos.ne.1)count(4)=count(4)+1
count(4)=count(4)+1
case(aspace) ! should possibly not ignore all internal spaces
ipos=ipos-1
case default
value(part) = 0
count(5)=99999
!return
end select
enddo
! is no value after E an error?
associate ( whole=>value(1), fractional=>value(2), exp=>value(3), sgn=>sval(1), sexp=>sval(3) )
val = sign(real(whole,kind=wp) + real(fractional,kind=wp)/10**digits(2),real(sgn,kind=wp))* (10.0_wp**(exp*sexp))
!!write(*,'(*(g0,1x))')'value=',value,' sval=',sval,' digits=',digits,' string=',str,val
end associate
if(all(count.le.1).and.ipos.ne.0)then
ator = .true.
else
ator = .false.
endif
contains
subroutine digit()
end subroutine digit
end function ator
end module m_ascii
program main
use M_ascii, only : ator
use iso_fortran_env, only: wp => real64, ip => int64
implicit none
integer,parameter :: n = 1000000 !! number of values
real(wp) :: rval_ator(n), rval_read(n), rval(n)
real(wp) :: val
integer :: ierr, i
integer(kind=ip) :: start, finish, count_rate
logical :: lerr, ret
character(len=30),allocatable :: strings(:)
character(len=:),allocatable :: tests(:)
! create a list of values to parse
allocate(strings(n))
do i = 1, n
call random_number(rval(i))
write(strings(i), '(g0)') rval(i)
enddo
! use internal read
call system_clock(start, count_rate)
do i = 1, n
read(strings(i),fmt=*,iostat=ierr) rval_read(i)
enddo
call system_clock(finish)
write(*,'(a30,1x,f7.4,1x,a)') 'time * : ', (finish-start)/real(count_rate,wp), ' seconds'
! use ator()
call system_clock(start)
do i = 1, n
lerr=ator(strings(i),rval_ator(i))
enddo
call system_clock(finish)
write(*,'(a30,1x,f7.4,1x,a)') 'time * : ', (finish-start)/real(count_rate,wp), ' seconds'
! compare results
do i = 1, n
if(abs(rval_ator(i)-rval_read(i)).gt.1*epsilon(rval_ator(i))) then
write(*,'(i10,1x,*(g0,1x))')i,rval_ator(i)-rval_read(i),rval_ator(i),rval_read(i),strings(i)
endif
enddo
!do i = 1, n
! if(rval(i)-rval_read(i).ne.0.0_wp) then
! write(*,'(i10,1x,*(g0,1x))')i,rval_ator(i)-rval_read(i),rval_ator(i),rval_read(i),strings(i)
! endif
!enddo
!do i = 1, n
! if(rval(i)-rval_ator(i).ne.0.0_wp) then
! write(*,'(i10,1x,*(g0,1x))')i,rval_ator(i)-rval_read(i),rval_ator(i),rval_read(i),strings(i)
! endif
!enddo
tests=[character(len=80) :: &
! returns 0
'', &
! returns 0
'E', &
! does not show overflow error
'1111111111111111111111111111111111111111111111111111111111111111111111111111111111111', &
'1', &
'-1', &
'-1e3', &
'+1234567890.123456789e1', &
'+1234567890.123456789e100', &
'1234567890.123456789e1', &
'1234567890.123456789e-1', &
'123456-7890.123456789e1', &
'--1234567890.123456789e1', &
'+-1234567890.123456789e1', &
'123e1d2', &
'@']
block
character(len=256) :: message
integer :: ios
real(kind=wp) val2
do i=1,size(tests)
val=-99999999.0d0
val2=-99999999.0d0
ret=ator(tests(i),val)
read(tests(i),fmt=*,iostat=ios,iomsg=message)val2
if(ios.ne.0)then
write(*,'(*(g0,1x))')'status:',ret,' value:',val,' string:',trim(tests(i)),'fromread:',val2,' message:',trim(message)
else
write(*,'(*(g0,1x))')'status:',ret,' value:',val,' string:',trim(tests(i)),' fromread:',val2
endif
enddo
end block
end program main
with default compiler options:
+ gfortran x3..f90
+ ./a.out
time * : 1.2085 seconds
time * : 0.2182 seconds
+ ./a.out
time * : 1.1580 seconds
time * : 0.2247 seconds
+ ifort x3..f90
+ ./a.out
time * : 0.6899 seconds
time * : 0.0866 seconds
+ ./a.out
time * : 0.6712 seconds
time * : 0.0895 seconds
+ nvfortran x3..f90
+ ./a.out
time * : 0.2984 seconds
time * : 0.2143 seconds
+ ./a.out
time * : 0.3054 seconds
time * : 0.2132 seconds
LOTS of interesting experiences. I had the least success with nvfortran(1) but note how much faster the internal read in nvfortran(1) is; some bizarre surprises like one compiler slowing down by an order of magnitude when I used CHARACTER variables in the SELECT, that when I used array syntax or TRANSFER to convert from CHARACTER to INTEGER some sped up and others slowed down tremendously, … some real surprises between the compilers.
Fun and frustrating at the same time. I set a self-imposed goal of 1 epsilon from the original value and keeping it simple by not using bit-fiddling and no compiler-specific tuning. Might be too expensive a goal in retrospect.
Would love to see runs from other programming environments, maybe throwing in gfortran to compensate as a baseline to compensate for hardware differences. I am running on a relatively light-weight laptop.