I’m not sure what the consensus is on this issue, but I would point out that it is consistent with E format output where the exponent width is too small for the letter, sign, and number. In that case the letter is dropped, and the sign and exponent value are written, so no information is lost. If the exponent field is still too small, then the entire field is filled with *, losing the information. If it is possible to convert those numbers, it would be symmetrical with the previous write that produced them.
For the issue of storing intermediate values in integers, that always means that those values are limited by the integer range. If the string of digits is converted and stored as floating point numbers, then the only digits that are lost are those that would be lost anyway. Consider storing, say, a 30-digit value into eventually a 32-bit real. There is likely no integer kind large enough to store that 30-digit value. But if it is converted directly into the 32-bit real, then there would be no problem with integer overflow. Of course, each step of that conversion would require a floating point multiply and add, so it would be slower than the corresponding integer conversion.
For this, the strategy I found to work quite nicely was to read the fractional part of the number with an integer(wp) that should enable the precision required for the end real. Currently only implemented for real(8), so let’s say we want to read the following value:
s = '0.123456789123456789123456789123456789'
a formatted read will throw
real(8) :: x
...
read(s,*) x
write(*,*) x !< 0.12345678912345678 the value is truncated at the 17th decimal place
by iterating over the string (and making sure to finish the loop at the limit of the integer(8) :: int_wp) I would get first and int=0 and int_wp = 1234567891234567891 and then r = int + int_wp * 1d-19 = 0.12345678912345678.
The idea was to use the fractional base multiplication and conversion just once at the end.
To enable for larger real types indeed we should extend the fractional base for this idea to work, but I guess it should in principle ( ? )
If I read correctly (Agner’s tables), in modern processors, FMA is faster than int -> double conversion. The former has latency of 4 or 5 (reciprocal throughput of 0.5), whereas the latter has 6 (reciprocal of 1 or 2). That’s for scalar conversion (not packed/SIMD).
Interesting.
Had to double check. Using gfortran 12.2.0 that string gets parsed to 1.0e-3. No warnings.
You can use one integer only. For "123.456e7" you might have int = 123456 and a counter for the digits after the dot. The parsed number then is 123456 * 10**(7 - k), where k = 3 in this case. No idea which way is faster/more accurate, haven’t tested myself.
This program shows what list-directed i/o does with various strings.
program efmt
real :: x
character(:), allocatable :: str
str = '1.e-3' ! decimal, with e
read(str,*) x
write(*,*) x
str = '1.-3' ! decimal, no e
read(str,*) x
write(*,*) x
str = '1-3' ! no decimal, no e
read(str,*) x
write(*,*) x
str = '1234567890123456789012345678901234567890-9' ! long string, no decimal, no e
read(str,*) x
write(*,*) x
end program efmt
With NAG and gfortran, the output is something like:
In light of the evidences I’ll just be glad that inadvertelly got the same output
I took your advice and rearranged the core code, it is compact and cleaner, but didn’t gain any performance (except for some flukes in which running times would drop to 0.02 or 0.016, but mostly around 0.032)
Adding a check on the maximum admissible lenght of the integer(wp) for handling long strings did drop significantly the perfs (would increase to [0.044;0.047]s). Have to rethink the loops…
I gave some thoughts to the last discussions regarding long strings and tried a small change to the algorithm (you can check out the full version in the github):
elemental subroutine str2real_unroll(s,r,p,stat)
...
! read whole and fractional number in a single integer
pP = p
int_wp = 0
do i = p, min(19+p-1,len(s)) !< stop at the max precision available for the integer(8)
val = iachar(s(i:i))-digit_0
if( val >= 0 .and. val <= 9 ) then
int_wp = int_wp*10 + val
else if( val == period ) then
pP = i
else
exit
end if
end do
p = i
With this modification, the following test will pass:
you go into an if that expects part of the exponential section, either e, E, + or -. Which in the case of
"1234567890123456789012345678901234567890-9"
does not happen. You may need to keep reading characters (digits) and discarding them if you have read enough for the set precision, until you find e, E, + or - (deal with exponent) or the end of the string (be done).
I was reading this thread this morning before the last post and thinking that it would be nice to have this in the stdlib since we have to_string for the reverse. Encouraging to see development continue. Not sure about the difference in the algos posted by @Carltoffel and @hkvzjal
You can checkout both implementations in the file str2num_m.f90. When I picked up the project I started from @Carltoffel’s implementation then moved to the “rollout” one as I found out it was not only faster but I managed to streamline several numbers in a single string.
Used the occasion to duplicate for str2float and str2double
The complete interface can be called from the subroutine str2num which covers all available types:
!> easy to use function interfaces
public :: str2int, str2int_p
public :: str2float, str2float_p
public :: str2double, str2double_p
!> generic subroutine interface
public :: str2num
I did try but the problem is that the compilers can not disambiguate the l.h.s of the assignment for the function interface, that’s why I ended up with a subroutine interface and the different functions. If anyone has an idea how to manage it would indeed be great.
One might want this function to work with any supported real kind. This gets back to the difficulty of writing generic functions in a portable way. Adding a MOLD= argument takes care of the user interface part of the problem, but there is still the usual problem of writing code that works with every kind in the REAL_KINDS(:) array.
Ok, I see, I guess that for a proposal for the standard, the first step would be that the compilers handle the l.h.s of an assignment as a kind of 0th argument, such that it can then unambiguously chose which function to use when creating generic interfaces.
Any comments on the current proposal for stdlib in the mean time? Different names for str2_kind_ or str2num with an extra arg to disambiguate ?