Faster string to double

Opened a PR on stdlib here Proposal for a reference string to number conversion facility in stdlib by jalvesz · Pull Request #743 · fortran-lang/stdlib · GitHub

Implementation: https://github.com/jalvesz/stdlib/blob/master/src/stdlib_str2num.f90
test: https://github.com/jalvesz/stdlib/blob/master/test/string/test_string_to_number.f90
example: https://github.com/jalvesz/stdlib/blob/master/example/strings/example_str2num.f90

Used the occasion to duplicate for str2float and str2double

The complete interface can be called from the subroutine str2num which covers all available types:

    !> easy to use function interfaces
    public :: str2int,    str2int_p
    public :: str2float,  str2float_p
    public :: str2double, str2double_p
    !> generic subroutine interface
    public :: str2num

Any suggestions/contributions?

3 Likes

I would also suggest a real128 version also.

If only we could make this look like the intrinsic functions, int and real.

3 Likes

I did try but the problem is that the compilers can not disambiguate the l.h.s of the assignment for the function interface, that’s why I ended up with a subroutine interface and the different functions. If anyone has an idea how to manage it would indeed be great.

One approach might be to add a MOLD= argument. That is the way that, for example, TRANSFER() and other intrinsic functions work.

One might want this function to work with any supported real kind. This gets back to the difficulty of writing generic functions in a portable way. Adding a MOLD= argument takes care of the user interface part of the problem, but there is still the usual problem of writing code that works with every kind in the REAL_KINDS(:) array.

@RonShepard @everythingfunctional something like this Compiler Explorer ?

...
public :: str2num, str2num_p
...
interface str2num
        module procedure str2int
        module procedure str2float
        module procedure str2double
end interface

interface str2num_p
        module procedure str2int_p
        module procedure str2float_p
        module procedure str2double_p
end interface
...
elemental function str2int(s,mold) result(v)
        ! -- In/out Variables
        character(*), intent(in) :: s !> input string
        integer, intent(in) :: mold !> dummy argument to disambiguate at compile time the generic interface
        integer :: v !> output value
...
function str2double_p(s,mold,stat) result(r)
        ! -- In/out Variables
        character(len=:), pointer :: s !> input string
        real(dp), intent(in) :: mold !> dummy argument to disambiguate at compile time the generic interface
        real(dp) :: r    !> Output real value
        integer(1),intent(inout), optional :: stat
! etc

No, the way you did it is the way to do it with current Fortran. I mostly made the comment to introduce the potential proposal for the standard.

Ok, I see, I guess that for a proposal for the standard, the first step would be that the compilers handle the l.h.s of an assignment as a kind of 0th argument, such that it can then unambiguously chose which function to use when creating generic interfaces.

Any comments on the current proposal for stdlib in the mean time? Different names for str2_kind_ or str2num with an extra arg to disambiguate ?

1 Like

Ouch, just found out that ifort/ifx do not support use iso_c_binding, only: c_int128_t … cannot recycle the same algorithm for real128 :expressionless:

Maybe it might be more consistent to call it to_num since we have already the inverse called to_string and a number of procedures called to_*. The host module would disambiguate that the source of the operation is a string.

Thanks for the PR!

3 Likes

:+1: perfect

I was referring to the mold argument as proposed by @RonShepard. To retain a function-like interface, as of now we are obliged to choose between:

real(4) :: r
real(8) :: d
character(*) :: s
r = to_float( s )
d = to_double( s )

Or

real(4) :: r
real(8) :: d
character(*) :: s
r = to_num( s , mold=r )
d = to_num( s , mold=d )

Any strong feelings against/for one or the other ? (Both work, and from the preliminary tests I saw no significant penalty)

How do you write a to_float() kind of function for every kind that is in your compiler’s real_kinds(:) array in a portable way? And what is float on one compiler might well be double on another compiler (or the same compiler with different options), so writing portable code that uses these functions is also problematic.

The to_num(s,mold=x) interface can handle that complication in principle, but in practice fortran still makes it difficult to write portable code to handle all the real_kinds(:) in a simple and portable way. Is this elemental, so that it can do both scalar and array conversions? That’s another complication with the mold argument. Maybe select_rank can help?

I think I favor the second approach, but I acknowledge that there are still going to be difficulties with the implementation.

Yes! So no need for select_rank. The remaining issue is the kinds.

Wouldn’t it be possible to use interface assignment(=)?

Real support varies by compiler. For the ones I’ve got installed

Nag - real16, real32, real 64, real 128
Gfortran - real 32, real 64, real80, real 128
Intel - real32, real64, real128
nvidia - real32, real64

I also use a Cray, bit I can’t remember off hand
whether they support real128.

which 16 does nag support? bfloat16 or float16?

Here is the output from running one of our examples, ch0510.f90

            =====================
            Real kind information
            =====================
  kind number
      16   1   2   3
  digits details
      11   24   53   106
  epsilon details
        9.7656E-04
        1.1920929E-07
        2.2204460492503131E-16
      2.46519032881566189191165177E-32
  huge value
       65504.
        3.4028235E+38
       1.7976931348623157E+308
      8.98846567431157953864652595E+307
  maxexponent value
      16
      128
      1024
      1023
  minexponent value
      -13
      -125
      -1021
      -968
  precision details
      3   6   15   31
  radix details
      2   2   2   2
  range details
      4   37   307   291
  tiny details
        6.1035E-05
        1.1754944E-38
       2.2250738585072014E-308
      2.00416836000897277799610805E-292

Here is the source file.

program ch0510
  implicit none
!
! real arithmetic
! 
! 16 bit reals are in the latest IEEE standard.
! we have added tests for that type in this
! program.
!
! 32 and 64 bit reals are normally available.
! The IEEE format is as described below.
!
! 32 bit reals  8 bit exponent, 24 bit mantissa
! 64 bit reals 11 bit exponent, 53 bit mantissa
!
! 128 bit reals and decimal are also in the
! latest IEEE standard.
! We have chosen a portable specification
! for 128 bit reals as Nag use their own. 
!

!  integer, parameter :: hp = 16
  integer, parameter :: hp = selected_real_kind( 3,   4)
  integer, parameter :: sp = selected_real_kind( 6,  37)
  integer, parameter :: dp = selected_real_kind(15, 307)
  integer, parameter :: qp = selected_real_kind(30, 291)

  real (hp) :: rhp
  real (sp) :: rsp
  real (dp) :: rdp
  real (qp) :: rqp

  print *, '           ====================='
  print *, '           Real kind information'
  print *, '           ====================='
  print *, ' kind number'
  print *, '    ', kind(rhp), ' ', kind(rsp), ' ', kind(rdp), ' ', kind(rqp)
  print *, ' digits details'
  print *, '    ', digits(rhp), ' ', digits(rsp), ' ', digits(rdp), ' ', digits(rqp)
  print *, ' epsilon details'
  print *, '    ', epsilon(rhp)
  print *, '    ', epsilon(rsp)
  print *, '    ', epsilon(rdp)
  print *, '    ', epsilon(rqp)
  print *, ' huge value'
  print *, '    ', huge(rhp)
  print *, '    ', huge(rsp)
  print *, '    ', huge(rdp)
  print *, '    ', huge(rqp)
  print *, ' maxexponent value'
  print *, '    ', maxexponent(rhp)
  print *, '    ', maxexponent(rsp)
  print *, '    ', maxexponent(rdp)
  print *, '    ', maxexponent(rqp)
  print *, ' minexponent value'
  print *, '    ', minexponent(rhp)
  print *, '    ', minexponent(rsp)
  print *, '    ', minexponent(rdp)
  print *, '    ', minexponent(rqp)
  print *, ' precision details'
  print *, '    ', precision(rhp), ' ', precision(rsp), ' ', precision(rdp), ' ', precision(rqp)
  print *, ' radix details'
  print *, '    ', radix(rhp), ' ', radix(rsp), ' ', radix(rdp), ' ', radix(rqp)
  print *, ' range details'
  print *, '    ', range(rhp), ' ', range(rsp), ' ', range(rdp), ' ', range(rqp)
  print *, ' tiny details'
  print *, '    ', tiny(rhp)
  print *, '    ', tiny(rsp)
  print *, '    ', tiny(rdp)
  print *, '    ', tiny(rqp)
end program
1 Like

Here are some more details for other compilers we’ve used.

Salford/Silverfrost

real32, real64

Sun/Oracle

real32, real64, real128

If this were allowed, then I think it would limit the use of the function to just simple assignents, x=to_num(s). It would not allow the use in expressions. Consider something like x=to_num(s1)*to_num(s2). How would the compiler know which specific functions to invoke in the two references? If a function is only used in simple assignments, then it might as well be a subroutine. Functions are useful because they can be used in expressions.

Another kind of ambiguity that would need to be addressed is whether to_num(s) invokes the integer specific, whose value is then converted implicitly to the real value for assignment, or whether it invokes the real specific. With the MOLD= argument, the programmer can control the semantics in both assignments and in expressions.

I think the convenience of a function interface for these conversions is one of the appealing features. Otherwise, why not simply use the existing internal read approach? For the conversion of a single string, is there any practical reason why that should be slower than a user-written subroutine or function?