Faster string to double

hkvzjal · November 1, 2023, 10:40am

Opened a PR on stdlib here Proposal for a reference string to number conversion facility in stdlib by jalvesz · Pull Request #743 · fortran-lang/stdlib · GitHub

Implementation: https://github.com/jalvesz/stdlib/blob/master/src/stdlib_str2num.f90
test: https://github.com/jalvesz/stdlib/blob/master/test/string/test_string_to_number.f90
example: https://github.com/jalvesz/stdlib/blob/master/example/strings/example_str2num.f90

Used the occasion to duplicate for str2float and str2double

The complete interface can be called from the subroutine str2num which covers all available types:

    !> easy to use function interfaces
    public :: str2int,    str2int_p
    public :: str2float,  str2float_p
    public :: str2double, str2double_p
    !> generic subroutine interface
    public :: str2num

Any suggestions/contributions?

jacobwilliams · November 1, 2023, 12:39pm

I would also suggest a real128 version also.

everythingfunctional · November 1, 2023, 2:05pm

If only we could make this look like the intrinsic functions, int and real.

github.com/j3-fortran/fortran_proposals

Constant Expression Value Arguments for Generic Resolution

opened 02:02PM - 01 Nov 23 UTC

everythingfunctional

It has so far been impossible to write user defined functions that can mimic the… behavior of various intrinsic functions. I.e. ``` function real(A, KIND) real, intent(in) :: A integer, intent(in) :: KIND real(KIND) :: real end function ``` But if we had some mechanism to define a constant expression value that must be one of the arguments we could do it. I.e. some suggestive syntax ``` interface real module function str_to_real128(A, KIND, STAT) character(len=*), intent(in) :: A integer, parameter, intent(in) :: KIND = real128 integer, intent(out), optional :: STAT real(real128) :: str_to_real128 end function ... end interface ... my_val = str("1.234", KIND=real128) ```

hkvzjal · November 1, 2023, 4:06pm

I did try but the problem is that the compilers can not disambiguate the l.h.s of the assignment for the function interface, that’s why I ended up with a subroutine interface and the different functions. If anyone has an idea how to manage it would indeed be great.

RonShepard · November 1, 2023, 5:33pm

One approach might be to add a MOLD= argument. That is the way that, for example, TRANSFER() and other intrinsic functions work.

RonShepard · November 1, 2023, 5:40pm

One might want this function to work with any supported real kind. This gets back to the difficulty of writing generic functions in a portable way. Adding a MOLD= argument takes care of the user interface part of the problem, but there is still the usual problem of writing code that works with every kind in the REAL_KINDS(:) array.

hkvzjal · November 1, 2023, 7:52pm

@RonShepard @everythingfunctional something like this Compiler Explorer ?

...
public :: str2num, str2num_p
...
interface str2num
        module procedure str2int
        module procedure str2float
        module procedure str2double
end interface

interface str2num_p
        module procedure str2int_p
        module procedure str2float_p
        module procedure str2double_p
end interface
...
elemental function str2int(s,mold) result(v)
        ! -- In/out Variables
        character(*), intent(in) :: s !> input string
        integer, intent(in) :: mold !> dummy argument to disambiguate at compile time the generic interface
        integer :: v !> output value
...
function str2double_p(s,mold,stat) result(r)
        ! -- In/out Variables
        character(len=:), pointer :: s !> input string
        real(dp), intent(in) :: mold !> dummy argument to disambiguate at compile time the generic interface
        real(dp) :: r    !> Output real value
        integer(1),intent(inout), optional :: stat
! etc

everythingfunctional · November 1, 2023, 8:09pm

No, the way you did it is the way to do it with current Fortran. I mostly made the comment to introduce the potential proposal for the standard.

hkvzjal · November 1, 2023, 8:13pm

Ok, I see, I guess that for a proposal for the standard, the first step would be that the compilers handle the l.h.s of an assignment as a kind of 0th argument, such that it can then unambiguously chose which function to use when creating generic interfaces.

Any comments on the current proposal for stdlib in the mean time? Different names for str2_kind_ or str2num with an extra arg to disambiguate ?

hkvzjal · November 1, 2023, 10:00pm

Ouch, just found out that ifort/ifx do not support use iso_c_binding, only: c_int128_t … cannot recycle the same algorithm for real128

epagone · November 1, 2023, 11:12pm

Maybe it might be more consistent to call it to_num since we have already the inverse called to_string and a number of procedures called to_*. The host module would disambiguate that the source of the operation is a string.

Thanks for the PR!

hkvzjal · November 2, 2023, 5:52am

perfect

I was referring to the mold argument as proposed by @RonShepard. To retain a function-like interface, as of now we are obliged to choose between:

real(4) :: r
real(8) :: d
character(*) :: s
r = to_float( s )
d = to_double( s )

Or

real(4) :: r
real(8) :: d
character(*) :: s
r = to_num( s , mold=r )
d = to_num( s , mold=d )

Any strong feelings against/for one or the other ? (Both work, and from the preliminary tests I saw no significant penalty)

RonShepard · November 2, 2023, 6:26am

How do you write a to_float() kind of function for every kind that is in your compiler’s real_kinds(:) array in a portable way? And what is float on one compiler might well be double on another compiler (or the same compiler with different options), so writing portable code that uses these functions is also problematic.

The to_num(s,mold=x) interface can handle that complication in principle, but in practice fortran still makes it difficult to write portable code to handle all the real_kinds(:) in a simple and portable way. Is this elemental, so that it can do both scalar and array conversions? That’s another complication with the mold argument. Maybe select_rank can help?

I think I favor the second approach, but I acknowledge that there are still going to be difficulties with the implementation.

hkvzjal · November 2, 2023, 6:29am

Yes! So no need for select_rank. The remaining issue is the kinds.

Carltoffel · November 2, 2023, 7:38am

Wouldn’t it be possible to use interface assignment(=)?

cmaapic · November 2, 2023, 12:06pm

Real support varies by compiler. For the ones I’ve got installed

Nag - real16, real32, real 64, real 128
Gfortran - real 32, real 64, real80, real 128
Intel - real32, real64, real128
nvidia - real32, real64

I also use a Cray, bit I can’t remember off hand
whether they support real128.

oscardssmith · November 2, 2023, 12:18pm

which 16 does nag support? bfloat16 or float16?

cmaapic · November 2, 2023, 12:43pm

Here is the output from running one of our examples, ch0510.f90

            =====================
            Real kind information
            =====================
  kind number
      16   1   2   3
  digits details
      11   24   53   106
  epsilon details
        9.7656E-04
        1.1920929E-07
        2.2204460492503131E-16
      2.46519032881566189191165177E-32
  huge value
       65504.
        3.4028235E+38
       1.7976931348623157E+308
      8.98846567431157953864652595E+307
  maxexponent value
      16
      128
      1024
      1023
  minexponent value
      -13
      -125
      -1021
      -968
  precision details
      3   6   15   31
  radix details
      2   2   2   2
  range details
      4   37   307   291
  tiny details
        6.1035E-05
        1.1754944E-38
       2.2250738585072014E-308
      2.00416836000897277799610805E-292

Here is the source file.

program ch0510
  implicit none
!
! real arithmetic
! 
! 16 bit reals are in the latest IEEE standard.
! we have added tests for that type in this
! program.
!
! 32 and 64 bit reals are normally available.
! The IEEE format is as described below.
!
! 32 bit reals  8 bit exponent, 24 bit mantissa
! 64 bit reals 11 bit exponent, 53 bit mantissa
!
! 128 bit reals and decimal are also in the
! latest IEEE standard.
! We have chosen a portable specification
! for 128 bit reals as Nag use their own. 
!

!  integer, parameter :: hp = 16
  integer, parameter :: hp = selected_real_kind( 3,   4)
  integer, parameter :: sp = selected_real_kind( 6,  37)
  integer, parameter :: dp = selected_real_kind(15, 307)
  integer, parameter :: qp = selected_real_kind(30, 291)

  real (hp) :: rhp
  real (sp) :: rsp
  real (dp) :: rdp
  real (qp) :: rqp

  print *, '           ====================='
  print *, '           Real kind information'
  print *, '           ====================='
  print *, ' kind number'
  print *, '    ', kind(rhp), ' ', kind(rsp), ' ', kind(rdp), ' ', kind(rqp)
  print *, ' digits details'
  print *, '    ', digits(rhp), ' ', digits(rsp), ' ', digits(rdp), ' ', digits(rqp)
  print *, ' epsilon details'
  print *, '    ', epsilon(rhp)
  print *, '    ', epsilon(rsp)
  print *, '    ', epsilon(rdp)
  print *, '    ', epsilon(rqp)
  print *, ' huge value'
  print *, '    ', huge(rhp)
  print *, '    ', huge(rsp)
  print *, '    ', huge(rdp)
  print *, '    ', huge(rqp)
  print *, ' maxexponent value'
  print *, '    ', maxexponent(rhp)
  print *, '    ', maxexponent(rsp)
  print *, '    ', maxexponent(rdp)
  print *, '    ', maxexponent(rqp)
  print *, ' minexponent value'
  print *, '    ', minexponent(rhp)
  print *, '    ', minexponent(rsp)
  print *, '    ', minexponent(rdp)
  print *, '    ', minexponent(rqp)
  print *, ' precision details'
  print *, '    ', precision(rhp), ' ', precision(rsp), ' ', precision(rdp), ' ', precision(rqp)
  print *, ' radix details'
  print *, '    ', radix(rhp), ' ', radix(rsp), ' ', radix(rdp), ' ', radix(rqp)
  print *, ' range details'
  print *, '    ', range(rhp), ' ', range(rsp), ' ', range(rdp), ' ', range(rqp)
  print *, ' tiny details'
  print *, '    ', tiny(rhp)
  print *, '    ', tiny(rsp)
  print *, '    ', tiny(rdp)
  print *, '    ', tiny(rqp)
end program

cmaapic · November 2, 2023, 12:45pm

Here are some more details for other compilers we’ve used.

Salford/Silverfrost

real32, real64

Sun/Oracle

real32, real64, real128

RonShepard · November 2, 2023, 5:11pm

If this were allowed, then I think it would limit the use of the function to just simple assignents, x=to_num(s). It would not allow the use in expressions. Consider something like x=to_num(s1)*to_num(s2). How would the compiler know which specific functions to invoke in the two references? If a function is only used in simple assignments, then it might as well be a subroutine. Functions are useful because they can be used in expressions.

Another kind of ambiguity that would need to be addressed is whether to_num(s) invokes the integer specific, whose value is then converted implicitly to the real value for assignment, or whether it invokes the real specific. With the MOLD= argument, the programmer can control the semantics in both assignments and in expressions.

I think the convenience of a function interface for these conversions is one of the appealing features. Otherwise, why not simply use the existing internal read approach? For the conversion of a single string, is there any practical reason why that should be slower than a user-written subroutine or function?

Topic		Replies	Views
Speed of atoi and atof vs. internal read	14	1653	May 6, 2022
Optimizing conversion of integer arrays into integers Help	7	846	December 5, 2020
String manipulation topic	4	278	October 17, 2024
Fortran only half as fast as C++ finding primes Humor	5	983	September 27, 2022
Fast_math: A collection of functions for fast number crunching using Fortran Announcements	12	1150	October 3, 2023

Faster string to double

Related topics