Rojff 1.0 Release (A JSON Library)

everythingfunctional · September 27, 2022, 7:15pm

The 1.0 release of rojff is out. It boasts:

- An additional error check: no duplicate keys in objects
- Simplified error handling: construct your data structure the same way you would without error handling, and check it at the very end
- A full, complementary set of construction methods:
  - functional style vs. move_alloc
  - with error checking vs. without
- A test that demonstrates constructing the same data structure with all four methods, to make them easier to compare and contrast
- Some improvements to the documentation

The documentation, examples and tutorials can always use improvement, so if anybody would be interested in contributing, I’d love the help and would be willing to give a thorough walk-through of the library.

everythingfunctional · September 27, 2022, 7:48pm

Also, I’m looking for a high-performance replacement for

read(number_string, *, iostat=stat) number_d  ! not efficient

that still catches invalid numbers. I previously tried,

interface
    function strtod( str, endptr ) result(d) bind(C, name="strtod" )
        ! <stdlib.h> :: double strtod(const char *str, char **endptr)
        import
        character(kind=c_char,len=1),dimension(*),intent(in) :: str
        type(c_ptr), intent(inout) :: endptr
        real(c_double) :: d
    end function strtod
end interface

endptr = c_null_ptr
number_d = strtod(number_string // c_null_char, endptr)
if ((.not. number_d < 0.0d0) .and. (.not. number_d > 0.0d0)) then
    read(number_string, *, iostat=stat) number_d  ! not efficient - might really be 0.0
else
    stat = 0
end if

But I found that feeding it something like 1.0.e.1 (clearly not a valid number) did not produce an error: strtod simply returned the value 1.0, while the plain read did catch the error.
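For context on that result: strtod converts the longest valid prefix of its input and reports where it stopped through endptr, so for 1.0.e.1 it converts 1.0 and leaves endptr at the second decimal point. The snippet above never inspects endptr, which is why the error goes unnoticed. Below is a minimal sketch of rejecting partial matches by checking that the whole string was consumed. The module name strtod_full_match, the routine parse_double and the stat convention are hypothetical and not part of rojff. It assumes the C library's strtod is available at link time, that the caller has already trimmed the string, and it does nothing about the things strtod accepts that JSON does not (leading whitespace, hexadecimal floats, inf, nan).

```fortran
module strtod_full_match
    use, intrinsic :: iso_c_binding, only: &
        c_char, c_double, c_ptr, c_null_char, c_loc, c_intptr_t
    implicit none
    private
    public :: parse_double

    interface
        ! <stdlib.h> :: double strtod(const char *str, char **endptr)
        function c_strtod(str, endptr) result(d) bind(C, name="strtod")
            import :: c_ptr, c_double
            type(c_ptr), value :: str
            type(c_ptr), intent(out) :: endptr
            real(c_double) :: d
        end function c_strtod
    end interface
contains
    ! Hypothetical helper: stat = 0 only if strtod consumed every character.
    subroutine parse_double(string, value, stat)
        character(len=*), intent(in) :: string
        real(c_double), intent(out) :: value
        integer, intent(out) :: stat

        character(kind=c_char), allocatable, target :: buffer(:)
        type(c_ptr) :: endptr
        integer(c_intptr_t) :: consumed
        integer :: i

        ! Copy into a NUL-terminated buffer whose address we control,
        ! so that endptr can be compared against the start of the buffer.
        allocate(buffer(len(string) + 1))
        do i = 1, len(string)
            buffer(i) = string(i:i)
        end do
        buffer(len(string) + 1) = c_null_char

        value = c_strtod(c_loc(buffer), endptr)

        ! Number of characters strtod actually converted.
        consumed = transfer(endptr, consumed) - transfer(c_loc(buffer), consumed)

        if (len(string) > 0 .and. consumed == len(string)) then
            stat = 0
        else
            stat = 1   ! empty input, or trailing characters were left over
        end if
    end subroutine parse_double
end module strtod_full_match
```

With this check a genuine 0.0 no longer needs the fallback read, since success is decided by how much of the string was consumed rather than by the returned value.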

This one line presents an opportunity for huge improvements in performance for large, mainly numeric files.

awvwgk · September 27, 2022, 8:00pm

I usually do the verification separately from extracting the value, especially when reading real numbers, where some compilers accept more than the serialization format allows (1.0d0, for example).

Note that strtod is affected by the locale settings and might handle decimal separators differently depending on them; strtod_l with a C locale should be used instead. Unfortunately, there is no strtod_n that would allow passing the length of the character variable rather than relying on a trailing NUL.

everythingfunctional · September 27, 2022, 8:37pm

Even more reason to desire a more performant, pure Fortran version. Then the semantics are much more easily defined.

By the time I get to that line, I’ve verified that the string contains only numeric characters, but not necessarily that it’s a valid number (which is why 1.0.e.1 gets there). I suppose I could implement a stricter parser here to ensure it is in fact a valid number, but I’m not sure it would perform any better, and I would still need an efficient way of turning the string into a number.
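To make the idea of a stricter check concrete: the JSON number grammar (an optional minus sign, an integer part without leading zeros, an optional fraction, an optional exponent) can be validated in a single left-to-right pass. The sketch below is only an illustration of that grammar. The names json_number_check and is_json_number are hypothetical and not taken from rojff or any other library in this thread, and the routine only validates; turning the string into a number is still a separate step.

```fortran
module json_number_check
    implicit none
    private
    public :: is_json_number
contains
    ! Hypothetical validator for: -? (0 | [1-9][0-9]*) (. [0-9]+)? ([eE] [+-]? [0-9]+)?
    pure function is_json_number(s) result(valid)
        character(len=*), intent(in) :: s
        logical :: valid
        integer :: pos, n, start

        valid = .false.
        n = len(s)
        pos = 1

        ! Optional leading minus sign.
        if (pos <= n) then
            if (s(pos:pos) == "-") pos = pos + 1
        end if

        ! Integer part: a single 0, or a nonzero digit followed by more digits.
        if (pos > n) return
        if (s(pos:pos) == "0") then
            pos = pos + 1
        else if (is_digit(s(pos:pos))) then
            call skip_digits(s, pos)
        else
            return
        end if

        ! Optional fraction: a decimal point followed by at least one digit.
        if (pos <= n) then
            if (s(pos:pos) == ".") then
                pos = pos + 1
                start = pos
                call skip_digits(s, pos)
                if (pos == start) return
            end if
        end if

        ! Optional exponent: e or E, optional sign, at least one digit.
        if (pos <= n) then
            if (s(pos:pos) == "e" .or. s(pos:pos) == "E") then
                pos = pos + 1
                if (pos <= n) then
                    if (s(pos:pos) == "+" .or. s(pos:pos) == "-") pos = pos + 1
                end if
                start = pos
                call skip_digits(s, pos)
                if (pos == start) return
            end if
        end if

        ! Valid only if nothing is left over.
        valid = (pos == n + 1)
    end function is_json_number

    ! Advance pos past any run of decimal digits.
    pure subroutine skip_digits(s, pos)
        character(len=*), intent(in) :: s
        integer, intent(inout) :: pos
        do while (pos <= len(s))
            if (.not. is_digit(s(pos:pos))) exit
            pos = pos + 1
        end do
    end subroutine skip_digits

    pure function is_digit(c) result(digit)
        character(len=1), intent(in) :: c
        logical :: digit
        digit = index("0123456789", c) > 0
    end function is_digit
end module json_number_check
```

Under this check, 1.0.e.1 is rejected at the second decimal point and 1.0d0 at the d, while inputs such as -0.5e+10 pass. Whether it is cheaper overall than letting read detect the error would need to be measured.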

awvwgk · September 27, 2022, 8:48pm

In TOML Fortran’s JSON frontend the verification is implemented in the lexing step:

https://github.com/toml-f/jonquil/blob/9335479/src/jonquil/lexer.f90#L266-L327

          !> Process next number token, can produce either integer or floats
          subroutine next_number(lexer, token)
             !> Instance of the lexer
             type(json_lexer), intent(inout) :: lexer
             !> Current token
             type(toml_token), intent(inout) :: token
          
             integer :: prev, pos, point, expo
             logical :: minus, okay, zero, first
             character(1, tfc) :: ch
             integer, parameter :: offset(*) = [0, 1, 2]
          
             prev = lexer%pos
             pos = lexer%pos
             token = toml_token(token_kind%invalid, prev, pos)
          
             point = 0
             expo = 0
             zero = .false.
             first = .true.

This file has been truncated.

An integer or float token produced by the lexer is guaranteed to be a valid number. Fortunately, in JSON this token can be read directly via internal IO. In TOML this is actually not the case, see https://github.com/toml-f/toml-f/blob/e5f04f9/src/tomlf/de/lexer.f90#L1276-L1361 for the required cleanup (there is also a naive and imperfect attempt to parse a real value directly).

everythingfunctional · September 27, 2022, 9:23pm

Those are good examples for some inspiration. Thanks.

everythingfunctional · September 27, 2022, 9:26pm

For those interested, the parser for numbers in rojff is here:

jacobwilliams · September 27, 2022, 10:57pm

See here for previous discussions: Faster string to double

I have the same issue with JSON-Fortran. String to double using read is a huge bottleneck.

everythingfunctional · September 28, 2022, 12:52am

Thanks @jacobwilliams. I did vaguely recall that discussion, and it was my impetus to try using strtod, but as mentioned, it wasn’t sufficient for detecting errors. I will have to reread that thread for other ideas.

ivanpribec · September 28, 2022, 1:02am

Have you tried fast_float from Daniel Lemire? It’s a notch faster than older strtod implementations:

Apparently it is included in GCC 12 as the C++ standard library std::from_chars function.

Sideboard · September 28, 2022, 9:29am

Maybe add to the title and description (of this thread) that it is a JSON library. I didn’t know that from the name.
