Speed of atoi and atof vs. internal read

I am now tweeting about C interoperability. It’s nice that one can call functions in the C standard library just by providing an interface. I may present an example of using the C atoi function to read an integer from a string instead of an internal read. I am finding atoi to be 3 to 15 times faster than a Fortran internal read of an integer, depending on the compiler, and atof to be 1.7 to 5.3 times faster than a Fortran internal read of a double.

program main
use, intrinsic :: iso_c_binding, only: c_char, c_null_char
implicit none
interface
   function atoi(in) bind(c)
   use, intrinsic    :: iso_c_binding
   integer(c_int)    :: atoi
   character(c_char) :: in(*)
   end function
end interface
integer :: j,iran
integer, parameter :: nread = 10**7, nt = 3
character(len=10,kind=c_char), allocatable :: digits(:), cdigits(:)
real :: times(nt),dt(nt-1),xran
integer, allocatable :: iran_c(:),iran_f(:)
allocate (digits(nread),cdigits(nread),iran_c(nread),iran_f(nread))
do j=1,nread
   call random_number(xran)
   iran = 10**6*(xran-0.5) ! random integer
   write (digits(j),"(i0)") iran ! write integer to string
   cdigits(j) = trim(digits(j)) // c_null_char ! create C string
end do
call cpu_time(times(1))
do j=1,nread ! read integers from C strings using C atoi
   iran_c(j) = atoi(cdigits(j))
end do
call cpu_time(times(2))
do j=1,nread ! read integers from strings using internal read
   read (digits(j),*) iran_f(j)
end do
call cpu_time(times(3))
dt = times(2:nt)-times(1:nt-1)
print "(4a8)","","C","Fortran","ratio"
print "(a8,3f8.4,/)","times",dt,dt(2)/dt(1)
print "(/,*(a8))","#","mean","min","max","first","last","maxdiff"
print "(i8,f8.1,*(i8))",nread,sum(dble(iran_c))/nread,minval(iran_c),maxval(iran_c), &
                     iran_c(1),iran_c(nread),maxval(abs(iran_c-iran_f))
end program main

using on WSL2 the script

#!/bin/bash
# compile and run a program composed of one or more source files with the .f90 extension, using several compilers
# usage: ./cmpl_all.sh mod1.f90 mod2.f90 main.f90

exec=a.out
declare compilers=("gfortran -O3" "ifort -O3" "nvfortran -O3" "flang -O3")
declare sources=("$@") 
declare objs=("${sources[@]/.f90/.o}")
print_version=false # print compiler version
 
for FC in "${compilers[@]}"; do # loop over compilers
  echo && echo $FC
  if [ "$print_version" = true ]; then
      $FC --version
  fi
  rm -f *.o $exec # cleanup
  for src in "${sources[@]}"; do
      $FC -c $src
  done
  $FC -o $exec "${objs[@]}"
  time ./$exec
done

I get results

gfortran -O3
               C Fortran   ratio
   times  0.2395  3.7085 15.4877


       #    mean     min     max   first    last maxdiff
10000000  -132.2 -499999  499999   75306 -450799       0

real	0m8.783s
user	0m8.737s
sys	0m0.040s

ifort -O3
               C Fortran   ratio
   times  0.2385  2.7406 11.4902


       #    mean     min     max   first    last maxdiff
10000000    44.4 -499999  499999 -499999   -6839       0

real	0m5.650s
user	0m5.614s
sys	0m0.020s

nvfortran -O3
               C Fortran   ratio
   times  0.2415  0.7409  3.0679


       #    mean     min     max   first    last maxdiff
10000000   -88.2 -499999  499999  407922 -293367       0

real	0m3.054s
user	0m2.964s
sys	0m0.080s

flang -O3
               C Fortran   ratio
   times  0.2422  0.9240  3.8145


       #    mean     min     max   first    last maxdiff
10000000   -88.2 -499999  499999  407923 -293367       0

real	0m3.350s
user	0m3.301s
sys	0m0.040s

and for a code comparing C atof with an internal read of a double

program main
use, intrinsic :: iso_c_binding, only: c_char, c_null_char, c_double
implicit none
interface
   function atof(in) bind(c)
   use, intrinsic    :: iso_c_binding
   real(c_double)     :: atof
   character(c_char) :: in(*)
   end function
end interface
integer :: j
integer, parameter :: nread = 10**7, nt = 3, dp = c_double
character(len=20,kind=c_char), allocatable :: digits(:), cdigits(:)
real(kind=dp) :: times(nt),dt(nt-1),xran
real(kind=dp), allocatable :: xran_c(:),xran_f(:)
allocate (digits(nread),cdigits(nread),xran_c(nread),xran_f(nread))
do j=1,nread
   call random_number(xran)
   xran = xran - 0.5
   write (digits(j),"(f0.6)") xran ! write real to string
   cdigits(j) = trim(digits(j)) // c_null_char ! create C string
end do
call cpu_time(times(1))
do j=1,nread ! read reals from C strings using C atof
   xran_c(j) = atof(cdigits(j))
end do
call cpu_time(times(2))
do j=1,nread ! read reals from strings using internal read
   read (digits(j),*) xran_f(j)
end do
call cpu_time(times(3))
dt = times(2:nt)-times(1:nt-1)
print "(4a8)","","C","Fortran","ratio"
print "(a8,3f8.4,/)","times",dt,dt(2)/dt(1)
print "(/,*(a8))","#","mean","min","max","first","last","maxdiff"
print "(i8,*(f8.4))",nread,sum(xran_c)/nread,minval(xran_c),maxval(xran_c), &
                     xran_c(1),xran_c(nread),maxval(abs(xran_c-xran_f))
end program main

I get

gfortran -O3
               C Fortran   ratio
   times  0.9669  5.0880  5.2620


       #    mean     min     max   first    last maxdiff
10000000  0.0001 -0.5000  0.5000 -0.4798 -0.1218  0.0000

real	0m13.807s
user	0m13.771s
sys	0m0.030s

ifort -O3
               C Fortran   ratio
   times  0.9708  3.0744  3.1670


       #    mean     min     max   first    last maxdiff
10000000  0.0000 -0.5000  0.5000 -0.5000 -0.0068  0.0000

real	0m9.775s
user	0m9.653s
sys	0m0.090s

nvfortran -O3
               C Fortran   ratio
   times  0.9870  1.6308  1.6523


       #    mean     min     max   first    last maxdiff
10000000 -0.0001 -0.5000  0.5000  0.4079 -0.2934  0.0000

real	0m5.243s
user	0m5.108s
sys	0m0.110s

flang -O3
               C Fortran   ratio
   times  0.9753  1.7878  1.8330


       #    mean     min     max   first    last maxdiff
10000000 -0.0001 -0.5000  0.5000  0.4079 -0.2934  0.0000

real	0m5.480s
user	0m5.359s
sys	0m0.110s

note that the C you’re comparing to is fairly slow. Fast float parsing in practice – Daniel Lemire's blog shows that much faster than the C stdlib is possible.

The 4th sentence of the blog: However, we sacrifice some standard compliance.

It’s easy to make something fast when one ignores the detail. Here’s a valid
Fortran code,

program foo
  character(len=10) :: str = '1.234-10'
  real x
  read(str,'(E8.3E3)') x
  print *, x

! This works with gfortran, too lazy to check Standard for correctness.
!  read(str,*) x 
! print *, x

end program foo

A fast parsing method needs to deal with the exponent.

That’s a good point, but abseil’s from_chars is still 3x faster than libc.

@Beliavsky ,

Have you checked in Fortran stdlib to see what may be available now?

You will know this thread for options relative to C strtod.

As to “string to int”, you can take inspiration from @lkedward in this thread and trivially get Fortran to give better performance.

C:\Temp>gfortran -O3 p.f90 -o p.exe

C:\Temp>p.exe
               C Fortran   ratio
   times  0.4375  0.1719  0.3929


       #    mean     min     max   first    last maxdiff
10000000    85.3 -500000  499999 -269279   18973       0
Click to see code
program main
use, intrinsic :: iso_c_binding, only: c_char, c_null_char
implicit none
interface
   function atoi(in) bind(c)
   use, intrinsic    :: iso_c_binding
   integer(c_int)    :: atoi
   character(c_char) :: in(*)
   end function
end interface
integer :: j,iran
integer, parameter :: nread = 10**7, nt = 3
character(len=10,kind=c_char), allocatable :: digits(:), cdigits(:)
real :: times(nt),dt(nt-1),xran
integer, allocatable :: iran_c(:),iran_f(:)
allocate (digits(nread),cdigits(nread),iran_c(nread),iran_f(nread))
do j=1,nread
   call random_number(xran)
   iran = 10**6*(xran-0.5) ! random integer
   write (digits(j),"(i0)") iran ! write integer to string
   cdigits(j) = trim(digits(j)) // c_null_char ! create C string
end do
call cpu_time(times(1))
do j=1,nread ! read integers from C strings using C atoi
   iran_c(j) = atoi(cdigits(j))
end do
call cpu_time(times(2))
iran_f = parse_int( digits )  !<-- simple elemental parsing
call cpu_time(times(3))
dt = times(2:nt)-times(1:nt-1)
print "(4a8)","","C","Fortran","ratio"
print "(a8,3f8.4,/)","times",dt,dt(2)/dt(1)
print "(/,*(a8))","#","mean","min","max","first","last","maxdiff"
print "(i8,f8.1,*(i8))",nread,sum(dble(iran_c))/nread,minval(iran_c),maxval(iran_c), &
                     iran_c(1),iran_c(nread),maxval(abs(iran_c-iran_f))
contains

   ! Simple variant for base 10 of function by @lkedward toward Julia microbenchmark
   elemental function parse_int(s) result(n)

      ! Argument list
      character(len=*), intent(in) :: s
      ! Function result
      integer :: n

      ! Local variables
      integer :: i, d, sign
      character :: c

      n = 0
      sign = 1
      loop_digits: do i = 1, len_trim(s)
         c = s(i:i)
         d = 0
         select case ( ichar(c) )
            case ( ichar('0'):ichar('9') )
               d = ichar(c) - ichar('0')
            case ( ichar('-') )
               sign = -1
               cycle loop_digits
            case ( ichar(' ') )
               cycle loop_digits
            case default 
               error stop "parse error"
         end select
         n = n*10 + d
      end do loop_digits
      n = sign*n

   end function

end program main
1 Like

That’s better than C, but still significantly worse than Parsers.jl or other optimized parsers.

@oscardssmith , can you please post here your test codes with timing comparisons with “Parsers.jl or other optimized parsers” on an apples-to-applies with the Fortran code I posted?

Honestly, looking at Parsers.jl with ints.jl here, I disbelieve that jl code will do better. With hex string to integer parsing, @lkedward showed how Fortran comes out ahead in the Julia microbenchmark:

So Parsers.jl doesn’t handle the case of hexidecimal parsing, but for decimal parsing,

using BenchmarkTools
function parseintbase(t)
    local n, m
    for i=1:t
        n = rand(UInt32)
        s = string(n, base = 10)
        m = UInt32(parse(Int64, s, base = 10))
        @assert m == n
    end
    return n
end

is 50% slower than

using BenchmarkTools
using Parsers
function parseintParsers(t)
    local n, m
    for i=1:t
        n = rand(UInt32)
        s = string(n, base = 10)
        m = UInt32(Parsers.parse(Int64, s))
        @assert m == n
    end
    return n
end

(as measured with @btime parseintbase(1000), and @btime parseintparsers(1000))
Since Fortran was about 9% faster than Base Julia, this implies that Fortran would be about 30% slower than Parsers.jl.

@oscardssmith ,

That is just guesswork re: “Since Fortran was about 9% faster than Base Julia, this implies that Fortran would be about 30% slower than Parsers.jl”.

Why don’t you try out an apples-to-apples comparison using Parsers.jl in some Julia caller and the exact same with the Fortran code shown here on the same platform and post your code and results that others, especially me, can learn from and try to reproduce on their workstations.

I made an attempt and I do not see anything remotely close to what you claim, here’s the code and the exact steps that were followed:

  • Julia code: parse.jl file
using Parsers
function parseintParsers(t)
    local n, m
    for i=1:t
        n = rand(UInt32)
        s = string(n, base = 10)
        m = UInt32(Parsers.parse(Int64, s))
        @assert m == n
    end
    return n
end

@time r = parseintParsers(10000000)
println(r)

Notes re: above:

  • Parsers.parse(Int64, s, base = 10) is not supported on my system, don’t know what’s up with that.

  • @btime is not supported on my system; don’t know what’s up with that.

  • Console output with Julia:

C:\temp>type parse.jl
using Parsers
function parseintParsers(t)
    local n, m
    for i=1:t
        n = rand(UInt32)
        s = string(n, base = 10)
        m = UInt32(Parsers.parse(Int64, s))
        @assert m == n
    end
    return n
end

@time r = parseintParsers(10000000)
println(r)

C:\temp>julia parse.jl
  0.576289 seconds (20.00 M allocations: 915.547 MiB, 5.67% gc time)
2960917307

C:\temp>julia parse.jl
  0.559855 seconds (20.00 M allocations: 915.547 MiB, 5.78% gc time)
4130055995

C:\temp>julia parse.jl
  0.565078 seconds (20.00 M allocations: 915.547 MiB, 5.73% gc time)
2339675672
  • Fortran code here.
Click to see console output using gfortran
C:\temp>gfortran -O3 p.f90 -o p.exe

C:\temp>p.exe
               C Fortran   ratio
   times  0.3281  0.2812  0.8571


       #    mean     min     max   first    last maxdiff
10000000    13.1 -499999  499999    4436  346045       0

C:\temp>p.exe
               C Fortran   ratio
   times  0.3438  0.2969  0.8636


       #    mean     min     max   first    last maxdiff
10000000   -40.4 -500000  499999  220004  107439       0

C:\temp>p.exe
               C Fortran   ratio
   times  0.3281  0.2812  0.8571


       #    mean     min     max   first    last maxdiff
10000000  -188.8 -499999  499999  -95792  318196       0
Click to see console output using Intel Fortran
C:\temp>ifort /QxHost /O3 p.f90
Intel(R) Fortran Intel(R) 64 Compiler Classic for applications running on Intel(R) 64, Version 2021.5.0 Build 20211109_000000
Copyright (C) 1985-2021 Intel Corporation.  All rights reserved.

Microsoft (R) Incremental Linker Version 14.31.31105.0
Copyright (C) Microsoft Corporation.  All rights reserved.

-out:p.exe
-subsystem:console
p.obj

C:\temp>p.exe
               C Fortran   ratio
   times  0.2812  0.1562  0.5556


       #    mean     min     max   first    last maxdiff
10000000    44.4 -499999  499999 -499999   -6839       0

C:\temp>p.exe
               C Fortran   ratio
   times  0.2812  0.1562  0.5556


       #    mean     min     max   first    last maxdiff
10000000    44.4 -499999  499999 -499999   -6839       0

C:\temp>p.exe
               C Fortran   ratio
   times  0.2812  0.1406  0.5000


       #    mean     min     max   first    last maxdiff
10000000    44.4 -499999  499999 -499999   -6839       0

The above, whilst not being strictly apples-to-apples (what’s up with including the random number invocation, int-to-string conversion, and then string-to-int parsing, all in the same test), suggests

  • Parsers.jl solution with Julia is roughly 2X slower than the timings with gfortran
  • Parsers.jl solution with Julia is around 3.5X slower than the timings with Intel Fortran

Julia timing is on the order of 0.56 seconds. gfortran is on the order of 0.28 seconds and Intel Fortran about 0.16 seconds. All on the same platform.

One can try to reorient the Julia script to be equivalent to the OP’s approach with Fortran e.g., separate out the random number calls and the string generation from the timing measurements, but I doubt highly it will overcome the 2X to 3.5X difference plus some to become further competitive than the Fortran example here.

Please try and show where I am going wrong and I would welcome being corrected. But please be precise in your illustrations and no guessing game, alright?

Apologies for the broken code. @btime comes from BenchmarkTools which I just always have loaded, and the other error was just a sloppy copy-paste on my part.

@oscardssmith ,

No worries but going forward please note should anyone embark or remark on timing comparisons - and there are some Julia users who are rather keen on this (“faster than Fortran, more beautiful than Python” is their baneful raison d’etre) - then precision, accuracy, and consistency with codes and instrumentation and observations is an absolute must.

Nothing I have seen with Julia thus far remotely measures up to this and there are writers out there who are imprecise with their claims around their homebrews, it’s entirely unimpressive.

It is a nice example of using C-Fortran interoperability features but comparing speed of READ vs. atoi/atof seems a bit nonsense to me. READ from an internal file could only be compared to sscanf. There is so much overhead in an I/O procedure - format interpretation etc.

2 Likes

Below is a simple enough Julia script in Fortran style that does about the same as the Fortran code in the original post - note the timing measurement is restricted to parsing:

using Parsers

const N = 10_000_000
n = rand(-500_000:499_999, N)
str = string.(n, base=10)

@time n_parsed = Parsers.parse.(Int, str)

@assert n_parsed == n
println(minimum(n,dims=1))
println(maximum(n,dims=1))

On a given workstation the response by Julia is as follows:

C:\temp>Julia parse.jl
  0.190575 seconds (160.08 k allocations: 85.959 MiB, 28.57% compilation time)
[-500000]
[499999]

C:\temp>Julia parse.jl
  0.192144 seconds (160.08 k allocations: 85.959 MiB, 28.82% compilation time)
[-500000]
[499999]

C:\temp>Julia parse.jl
  0.192366 seconds (160.08 k allocations: 85.959 MiB, 28.57% compilation time)
[-500000]
[499999]

Thus it’s around 0.192 seconds. Now note the above script sets up a jagged array of strings such that the leading and trailing blanks are removed.

On the same workstation, using gfortran and doing the same i.e., parsing strings which are left and right adjusted, the timing results are around 0.121 seconds, about 60% faster.

Click to see the Fortran code used with gfortran
! main program
program p

   use timer_m
   use num_m 

   integer, parameter :: N = 10**7
   type(num_t), allocatable :: nums(:)
   integer, allocatable :: ints(:)
   type(timer_t) :: timer
   integer :: i

   allocate( nums(N) )
   allocate( ints(N) )
   call generate( nums )

   call timer%start()
   do i = 1, N
      ints(i) = parse_int( nums(i)%digits )
   end do
   call timer%stop()
   print "(g0,f0.4,g0)", "Parsing time: ", timer%t(), " seconds."

   call assert( nums, ints )
   print *, "First: ", ints(1), nums(1)%n
   print *, "Last:  ", ints(N), nums(N)%n
   print *, "Min:   ", minval(ints, dim=1), nums(minloc(ints, dim=1))%n
   print *, "Max:   ", maxval(ints, dim=1), nums(maxloc(ints, dim=1))%n

end program

! Helper module
module num_m

   type :: num_t
      integer :: n
      character(len=:), allocatable :: digits
   end type

contains

   impure elemental subroutine generate( num )
      type(num_t), intent(inout) :: num
      real :: x
      call random_number( x )
      num%n = 10**6*(x - 0.5 ) ! random integer
      num%digits = repeat(' ', ncopies=10)
      write( num%digits, fmt="(i0)" ) num%n
      num%digits = adjustl( trim(num%digits) )
   end subroutine

   impure elemental subroutine assert( num, n )
      type(num_t), intent(in) :: num
      integer, intent(in)     :: n
      if ( n /= num%n ) then
         print *, num%n, n
         error stop "num /= n" 
      end if
   end subroutine

   ! Simple variant for base 10 of function by @lkedward toward Julia microbenchmark
   elemental function parse_int(s) result(n)

      ! Argument list
      character(len=*), intent(in) :: s
      ! Function result
      integer :: n

      ! Local variables
      integer :: i, d, sign

      n = 0
      sign = 1
      loop_digits: do i = 1, len(s)
         d = ichar(s(i:i))
         select case ( d )
            case ( ichar('0'):ichar('9') )
               d = d - ichar('0')
            case ( ichar('-') )
               sign = -1
               cycle loop_digits
            case ( ichar('+'), ichar(' ') )
               cycle loop_digits
            case default
               error stop "parse error"
         end select
         n = n*10 + d
      end do loop_digits
      n = sign*n

   end function

end module

! Timer module
module kinds_m
   use, intrinsic :: iso_fortran_env, only : I8 => int64
   integer, parameter :: P12 = selected_real_kind( p=12 )
end module
module timer_m
   use kinds_m, only : I8, WP => P12
   private
   real(WP), parameter :: ZERO = 0.0_wp
   type, public :: timer_t
      private
      integer(I8) :: start_tick = 0
      integer(I8) :: end_tick = 0
      integer(I8) :: rate = 0
   contains
      procedure, pass(this) :: start => start_time
      procedure, pass(this) :: stop => end_time
      procedure, pass(this) :: t => get_time
   end type
contains
   impure elemental subroutine start_time( this )
      class(timer_t), intent(inout) :: this
      call system_clock( this%start_tick, this%rate ) 
   end subroutine
   impure elemental subroutine end_time( this )
      class(timer_t), intent(inout) :: this 
      call system_clock( this%end_tick ) 
   end subroutine
   elemental function get_time( this ) result(time)
      class(timer_t), intent(in) :: this
      real(WP) :: time
      time = real(this%end_tick-this%start_tick, kind=kind(time) ) /    &
             real(this%rate, kind=kind(time) ) 
   end function
end module
C:\temp>gfortran -O3 -funroll-loops -march=native p.f90 -o p.exe

C:\temp>p.exe
Parsing time: .1207 seconds.
 First:      -256765     -256765
 Last:        488620      488620
 Min:        -500000     -500000
 Max:         499999      499999

C:\temp>p.exe
Parsing time: .1225 seconds.
 First:       461273      461273
 Last:       -258333     -258333
 Min:        -499999     -499999
 Max:         499999      499999

C:\temp>p.exe
Parsing time: .1212 seconds.
 First:      -179684     -179684
 Last:       -327573     -327573
 Min:        -499999     -499999
 Max:         499999      499999

Hi @FortranFan, (just before Julia users will “chime in”…)

I think the above timing of Julia code includes compilation time, so probably not “apple-to-apple” comparison (although in some cases it may be important to compare the “total” timing for a given task). To confirm this, we can write this line twice:

@time n_parsed = Parsers.parse.(Int, str)
@time n_parsed = Parsers.parse.(Int, str)

then it gives on my old mac-mini 2012 (not M1 yet…):

  0.326672 seconds (168.87 k allocations: 86.603 MiB, 27.18% compilation time)
  0.231162 seconds (5 allocations: 76.294 MiB)

(Here, running the same script twice is not the same as the above two lines, because each run compiles the program again.)
To make it even safer for comparison, I think it would be nice to include all test codes into a function (to avoid pitfalls like global variables, as suggested in their “Performance tips” page)

(function version)
using Parsers

function test()
    N = 10_000_000
    n = rand(-500_000:499_999, N)
    str = string.(n, base=10)

    @time n_parsed = Parsers.parse.(Int, str)

    @assert n_parsed == n
    println(minimum(n,dims=1))
    println(maximum(n,dims=1))
end

test()

Indeed, I think it is sometimes tricky to compare the timing of different languages, particularly Julia (because of JIT compilation etc) and also Fortran (because of performance dependence on compilers/options/platforms/etc)…

1 Like

It is not quite the same. The Julia’s Parsers.parse does error checking on the string being parsed while the Fortran version silently ignores pluses, minuses and spaces inside the string (i.e. between digits).

1 Like

Upon actually trying it out, the error checking for invalid tokens between the digits hardly makes any noticeable timing difference to the gfortran result in my previous comment. I’ll defer posting an edit to the code until someone calls for it.