Rojff 1.0 Release (A JSON Library)

The 1.0 release of rojff is out. It boasts

  • An additional error check: no duplicate keys in objects
  • Simplified error handling: construct your data structure the same way you would without error handling and check it at the very end
  • A full, complementary set of construction methods:
    • functional style vs move_alloc
    • With error check vs without
  • A test that demonstrates constructing the same data structure with all four methods, to make them easier to compare and contrast
  • Some improvements to the documentation

The documentation, examples and tutorials can always use improvement, so if anybody would be interested in contributing, I’d love the help and would be willing to give a thorough walk-through of the library.

Also, I’m looking for a high-performance replacement for

read(number_string, *, iostat=stat) number_d  ! not efficient

that still catches invalid numbers. I previously tried:

interface
    function strtod( str, endptr ) result(d) bind(C, name="strtod" )
        ! <stdlib.h> :: double strtod(const char *str, char **endptr)
        import
        character(kind=c_char,len=1),dimension(*),intent(in) :: str
        type(c_ptr), intent(inout) :: endptr
        real(c_double) :: d
    end function strtod
end interface

endptr = c_null_ptr
number_d = strtod(number_string // c_null_char, endptr)
if ((.not. number_d < 0.0d0) .and. (.not. number_d > 0.0d0)) then
    read(number_string, *, iostat=stat) number_d  ! not efficient - might really be 0.0
else
    stat = 0
end if

But I found that feeding it something like 1.0.e.1 (clearly not a valid number) did not trigger an error: strtod returned the value 1.0, while the plain read did catch the error.
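
For what it's worth, strtod converts the longest valid prefix and reports where it stopped through endptr, so 1.0.e.1 yields 1.0 with endptr left pointing at the second '.'. Here is a minimal sketch that treats any unconsumed characters as an error instead of testing the value against zero. The names strtod_checked_m and parse_double are hypothetical, and it assumes the string was already screened for things strtod accepts but JSON does not (leading whitespace, inf, nan, hex floats). It does nothing about locale.

```fortran
module strtod_checked_m
    use, intrinsic :: iso_c_binding, only: &
            c_char, c_double, c_intptr_t, c_loc, c_null_char, c_ptr
    implicit none
    private
    public :: parse_double

    interface
        function strtod(str, endptr) result(d) bind(C, name="strtod")
            ! <stdlib.h> :: double strtod(const char *str, char **endptr)
            import :: c_char, c_double, c_ptr
            character(kind=c_char, len=1), dimension(*), intent(in) :: str
            type(c_ptr), intent(out) :: endptr
            real(c_double) :: d
        end function strtod
    end interface
contains
    subroutine parse_double(string, value, stat)
        ! stat = 0 only if strtod consumed every character of string
        character(len=*), intent(in) :: string
        real(c_double), intent(out) :: value
        integer, intent(out) :: stat

        character(kind=c_char, len=1), allocatable, target :: buffer(:)
        type(c_ptr) :: endptr
        integer(c_intptr_t) :: consumed
        integer :: i

        ! NUL-terminated copy whose address we can compare against endptr
        allocate(buffer(len(string) + 1))
        do i = 1, len(string)
            buffer(i) = string(i:i)
        end do
        buffer(len(string) + 1) = c_null_char

        value = strtod(buffer, endptr)

        ! number of characters strtod actually used
        consumed = transfer(endptr, consumed) - transfer(c_loc(buffer), consumed)
        if (len(string) > 0 .and. consumed == len(string)) then
            stat = 0
        else
            stat = 1
        end if
    end subroutine parse_double
end module strtod_checked_m
```

With this, call parse_double("1.0.e.1", number_d, stat) should come back with a nonzero stat, and a genuine 0.0 no longer needs the fallback read.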

This one line presents an opportunity for huge improvements in performance for large, mainly numeric files.

I usually do the verification separately from extracting the value, especially for reading real numbers, where some compilers are more permissive than the serialization format allows (accepting 1.0d0, for example).
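
To illustrate what a separate verification step can look like for JSON: the grammar is an optional minus, an integer part that is either 0 or starts with a nonzero digit, an optional fraction, and an optional e/E exponent. A minimal sketch under that grammar follows; is_json_number and the module name are hypothetical and not taken from any of the libraries in this thread.

```fortran
module json_number_check_m
    implicit none
    private
    public :: is_json_number
contains
    pure function is_json_number(string) result(valid)
        ! Accepts: -? (0 | [1-9][0-9]*) (. [0-9]+)? ([eE] [+-]? [0-9]+)?
        character(len=*), intent(in) :: string
        logical :: valid
        integer :: pos, n, start

        valid = .false.
        n = len(string)
        pos = 1
        if (n == 0) return

        ! optional leading minus
        if (string(pos:pos) == "-") pos = pos + 1
        if (pos > n) return

        ! integer part: a lone 0, or a run of digits not starting with 0
        if (string(pos:pos) == "0") then
            pos = pos + 1
        else if (is_digit(string(pos:pos))) then
            call skip_digits(string, pos)
        else
            return
        end if

        ! optional fraction: '.' followed by at least one digit
        if (pos <= n) then
            if (string(pos:pos) == ".") then
                pos = pos + 1
                start = pos
                call skip_digits(string, pos)
                if (pos == start) return
            end if
        end if

        ! optional exponent: e or E, optional sign, at least one digit
        if (pos <= n) then
            if (string(pos:pos) == "e" .or. string(pos:pos) == "E") then
                pos = pos + 1
                if (pos <= n) then
                    if (string(pos:pos) == "+" .or. string(pos:pos) == "-") pos = pos + 1
                end if
                start = pos
                call skip_digits(string, pos)
                if (pos == start) return
            end if
        end if

        ! valid only if nothing is left over
        valid = pos == n + 1
    end function is_json_number

    pure function is_digit(c) result(digit)
        character(len=1), intent(in) :: c
        logical :: digit
        digit = index("0123456789", c) > 0
    end function is_digit

    pure subroutine skip_digits(string, pos)
        ! advance pos past any run of digits starting at pos
        character(len=*), intent(in) :: string
        integer, intent(inout) :: pos
        do while (pos <= len(string))
            if (.not. is_digit(string(pos:pos))) exit
            pos = pos + 1
        end do
    end subroutine skip_digits
end module json_number_check_m
```

A check like this rejects both 1.0d0 and 1.0.e.1, so whatever does the conversion afterwards only ever sees well-formed input.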

Note that strtod is affected by the locale settings and might handle decimal separators differently depending on them; strtod_l should be used with a C locale instead. Unfortunately, there is no strtod_n that would allow passing the length of the character variable rather than relying on a trailing NUL.

Even more reason to desire a more performant, pure Fortran version. Then the semantics are much more easily defined.

By the time I get to that line, I’ve verified the string contains only numeric characters, but not necessarily that it’s a valid number (which is why 1.0.e.1 gets there). I suppose I could implement a stricter parser here, to ensure it is in fact a valid number, but I’m not sure it would necessarily perform better, and I still have the problem of needing an efficient way of turning the string into a number.
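
On the conversion itself, one well-known shortcut is the exact fast path usually credited to Clinger: when the decimal significand fits in 53 bits and the power of ten is at most 10^22, both are exactly representable as doubles, so a single multiply or divide gives the correctly rounded result. Below is a rough sketch, assuming IEEE binary64 arithmetic with round-to-nearest and input that has already been validated as a JSON number. The name fast_path_to_double is hypothetical, and anything outside the fast path returns stat = 1 so the caller can fall back to read.

```fortran
module fast_path_m
    use, intrinsic :: iso_fortran_env, only: int64, real64
    implicit none
    private
    public :: fast_path_to_double

    ! powers of ten that are exactly representable in binary64
    real(real64), parameter :: pow10(0:22) = [ &
        1.0e0_real64, 1.0e1_real64, 1.0e2_real64, 1.0e3_real64, &
        1.0e4_real64, 1.0e5_real64, 1.0e6_real64, 1.0e7_real64, &
        1.0e8_real64, 1.0e9_real64, 1.0e10_real64, 1.0e11_real64, &
        1.0e12_real64, 1.0e13_real64, 1.0e14_real64, 1.0e15_real64, &
        1.0e16_real64, 1.0e17_real64, 1.0e18_real64, 1.0e19_real64, &
        1.0e20_real64, 1.0e21_real64, 1.0e22_real64]
    integer(int64), parameter :: max_exact = 9007199254740992_int64 ! 2**53
contains
    subroutine fast_path_to_double(string, value, stat)
        ! stat = 0: value is set; stat = 1: not handled, caller should fall back
        character(len=*), intent(in) :: string
        real(real64), intent(out) :: value
        integer, intent(out) :: stat

        integer(int64) :: mantissa
        integer :: pos, n, digit, ndigits, exp10, exp_part, exp_sign
        logical :: negative, in_fraction

        value = 0.0_real64
        stat = 1
        n = len(string)
        if (n == 0) return
        negative = string(1:1) == "-"
        pos = 1
        if (negative) pos = 2

        ! accumulate all significand digits into one integer,
        ! counting how many of them came after the decimal point
        mantissa = 0
        ndigits = 0
        exp10 = 0
        in_fraction = .false.
        do while (pos <= n)
            digit = index("0123456789", string(pos:pos)) - 1
            if (digit >= 0) then
                mantissa = mantissa*10 + digit
                if (mantissa > max_exact) return ! too many significant digits
                ndigits = ndigits + 1
                if (in_fraction) exp10 = exp10 - 1
            else if (string(pos:pos) == "." .and. .not. in_fraction) then
                in_fraction = .true.
            else
                exit
            end if
            pos = pos + 1
        end do
        if (ndigits == 0) return

        ! optional exponent part
        exp_part = 0
        exp_sign = 1
        if (pos <= n) then
            if (string(pos:pos) /= "e" .and. string(pos:pos) /= "E") return
            pos = pos + 1
            if (pos > n) return
            if (string(pos:pos) == "-") then
                exp_sign = -1
                pos = pos + 1
            else if (string(pos:pos) == "+") then
                pos = pos + 1
            end if
            if (pos > n) return
            do while (pos <= n)
                digit = index("0123456789", string(pos:pos)) - 1
                if (digit < 0) return
                exp_part = exp_part*10 + digit
                if (exp_part > 1000) return ! far outside the fast path
                pos = pos + 1
            end do
        end if

        exp10 = exp10 + exp_sign*exp_part
        if (abs(exp10) > 22) return

        ! both operands are exact, so one operation rounds correctly
        value = real(mantissa, real64)
        if (exp10 < 0) then
            value = value / pow10(-exp10)
        else
            value = value * pow10(exp10)
        end if
        if (negative) value = -value
        stat = 0
    end subroutine fast_path_to_double
end module fast_path_m
```

The intended use is to try this first and keep read(number_string, *, iostat=stat) only for the inputs it declines. Whether that actually wins depends on how many numbers in a given file have short significands and small exponents, which I have not measured.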

In TOML Fortran’s JSON frontend the verification is implemented in the lexing step:

An integer or float token produced by the lexer is guaranteed to be a valid number. Fortunately, in JSON this token can be read directly via internal IO. In TOML this is actually not the case, see https://github.com/toml-f/toml-f/blob/e5f04f9/src/tomlf/de/lexer.f90#L1276-L1361 for the required cleanup (there is also a naive and imperfect attempt to parse a real value directly).

Those are good examples for some inspiration. Thanks.

For those interested, the parser for numbers in rojff is here:

See here for previous discussions: Faster string to double

I have the same issue with JSON-Fortran. String to double using read is a huge bottleneck.

Thanks @jacobwilliams. I did vaguely recall that discussion, and it was my impetus to try using strtod, but as mentioned, it wasn’t sufficient at detecting errors. I will have to reread that thread for other ideas.

Have you tried fast_float from Daniel Lemire? It’s a notch faster than older strtod implementations:

Apparently it is included in GCC 12 as the C++ standard library std::from_chars function.

Maybe add to the title and description (of this thread) that it is a JSON library. I didn’t know that from the name.
