An additional error check: no duplicate keys in objects
Simplified error handling: construct your data structure the same way you would without error handling and check it at the very end
A full, complementary set of construction methods:
functional style vs move_alloc
With error check vs without
A test that demonstrates constructing the same data structure with all four methods, to make them easier to compare and contrast.
Some improvements to the documentation
The documentation, examples and tutorials can always use improvement, so if anybody would be interested in contributing, I’d love the help and would be willing to give a thorough walk-through of the library.
Also, I’m looking for a high-performance replacement for
read(number_string, *, iostat=stat) number_d ! not efficient
that still catches invalid numbers. I previously tried:
interface
    function strtod( str, endptr ) result(d) bind(C, name="strtod" )
        ! <stdlib.h> :: double strtod(const char *str, char **endptr)
        import
        character(kind=c_char,len=1),dimension(*),intent(in) :: str
        type(c_ptr), intent(inout) :: endptr
        real(c_double) :: d
    end function strtod
end interface
endptr = c_null_ptr
number_d = strtod(number_string // c_null_char, endptr)
if ((.not. number_d < 0.0d0) .and. (.not. number_d > 0.0d0)) then
    read(number_string, *, iostat=stat) number_d ! not efficient - might really be 0.0
else
    stat = 0
end if
But I found that feeding it something like 1.0.e.1 (clearly not a valid number) was not caught: strtod returned the value 1.0, while the plain read did catch the error.
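For what it's worth, strtod parses the longest valid prefix and stores the address of the first unparsed character in endptr, so 1.0.e.1 yields 1.0 with endptr left pointing at the second dot. A rough sketch of checking that the whole string was consumed is below. The module name `strtod_check` and the routine `to_double` are made up for illustration, and the sketch assumes the default character kind is the same as `c_char` and that the compiler passes the contiguous buffer without making a copy (both true on the common compilers, as far as I know).

```fortran
module strtod_check
    use, intrinsic :: iso_c_binding, only: c_char, c_double, c_ptr, c_loc, &
        c_null_char, c_intptr_t
    implicit none
    private
    public :: to_double   ! hypothetical helper, not part of any library

    interface
        function strtod(str, endptr) result(d) bind(C, name="strtod")
            ! <stdlib.h> :: double strtod(const char *str, char **endptr)
            import :: c_char, c_double, c_ptr
            character(kind=c_char, len=1), dimension(*), intent(in) :: str
            type(c_ptr), intent(out) :: endptr
            real(c_double) :: d
        end function strtod
    end interface
contains
    subroutine to_double(string, value, stat)
        character(len=*), intent(in) :: string
        real(c_double), intent(out) :: value
        integer, intent(out) :: stat   ! 0 on success, 1 otherwise

        character(kind=c_char, len=1), allocatable, target :: buffer(:)
        type(c_ptr) :: endptr
        integer(c_intptr_t) :: consumed
        integer :: i, n

        ! Copy into a NUL-terminated buffer whose address we can take.
        n = len(string)
        allocate(buffer(n + 1))
        do i = 1, n
            buffer(i) = string(i:i)
        end do
        buffer(n + 1) = c_null_char

        value = strtod(buffer, endptr)

        ! Number of characters strtod consumed: endptr minus the start address.
        ! Assumes buffer was passed by address, not via a temporary copy.
        consumed = transfer(endptr, consumed) - transfer(c_loc(buffer), consumed)

        if (n > 0 .and. consumed == n) then
            stat = 0
        else
            stat = 1   ! nothing parsed, or trailing characters left over
        end if
    end subroutine to_double
end module strtod_check
```

With this, `call to_double("1.0.e.1", x, stat)` should come back with a nonzero `stat`, since only the first three characters are consumed. Note that strtod is still more permissive than JSON: it accepts leading whitespace, hexadecimal floats, and `inf`/`nan`, so this check only makes sense after the string has already been restricted to numeric characters.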
This one line presents an opportunity for huge improvements in performance for large, mainly numeric files.
I usually do the verification separately from extracting the value, especially when reading real numbers, where some compilers accept forms that the serialization format does not allow (1.0d0, for example).
Note that strtod is affected by the locale settings and might handle decimal separators differently depending on them; strtod_l with a C locale should be used instead. Unfortunately, there is no strtod_n that would allow passing the length of the character variable rather than relying on a trailing NUL.
Even more reason to desire a more performant, pure Fortran version. Then the semantics are much more easily defined.
By the time I get to that line, I’ve verified that the string contains only numeric characters, but not necessarily that it’s a valid number (which is why 1.0.e.1 gets there). I suppose I could implement a stricter parser here to ensure it is in fact a valid number, but I’m not sure it would necessarily perform better, and I would still need an efficient way of turning the string into a number.
In TOML Fortran’s JSON frontend the verification is implemented in the lexing step:
An integer or float token produced by the lexer is guaranteed to be a valid number. Fortunately, in JSON this token can be read directly via internal IO. In TOML this is actually not the case, see https://github.com/toml-f/toml-f/blob/e5f04f9/src/tomlf/de/lexer.f90#L1276-L1361 for the required cleanup (there is also a naive and imperfect attempt to parse a real value directly).
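To make the "verify first, then read" idea concrete, here is a minimal sketch of a strict check against the JSON number grammar (optional minus, integer part without leading zeros, optional fraction, optional exponent). It is not TOML Fortran's actual lexer code; the names `json_number_check`, `is_json_number` and `skip_digits` are invented for this example.

```fortran
module json_number_check
    implicit none
    private
    public :: is_json_number   ! hypothetical helper for illustration
contains
    !> True if s is exactly one number according to the JSON grammar:
    !> -? (0 | [1-9][0-9]*) (. [0-9]+)? ([eE] [+-]? [0-9]+)?
    pure function is_json_number(s) result(valid)
        character(len=*), intent(in) :: s
        logical :: valid
        integer :: pos, n, start

        valid = .false.
        n = len(s)
        pos = 1

        ! Optional leading minus sign (a leading plus is not allowed in JSON)
        if (pos <= n) then
            if (s(pos:pos) == "-") pos = pos + 1
        end if

        ! Integer part: a single 0, or one or more digits
        if (pos > n) return
        if (s(pos:pos) == "0") then
            pos = pos + 1
        else
            start = pos
            call skip_digits(s, pos)
            if (pos == start) return
        end if

        ! Optional fraction: a dot followed by at least one digit
        if (pos <= n) then
            if (s(pos:pos) == ".") then
                pos = pos + 1
                start = pos
                call skip_digits(s, pos)
                if (pos == start) return
            end if
        end if

        ! Optional exponent: e or E, optional sign, at least one digit
        if (pos <= n) then
            if (s(pos:pos) == "e" .or. s(pos:pos) == "E") then
                pos = pos + 1
                if (pos <= n) then
                    if (s(pos:pos) == "+" .or. s(pos:pos) == "-") pos = pos + 1
                end if
                start = pos
                call skip_digits(s, pos)
                if (pos == start) return
            end if
        end if

        ! Valid only if every character was consumed
        valid = (pos == n + 1)
    end function is_json_number

    !> Advance pos past any run of decimal digits starting at pos.
    pure subroutine skip_digits(s, pos)
        character(len=*), intent(in) :: s
        integer, intent(inout) :: pos
        do while (pos <= len(s))
            if (index("0123456789", s(pos:pos)) == 0) exit
            pos = pos + 1
        end do
    end subroutine skip_digits
end module json_number_check
```

This rejects both 1.0.e.1 and 1.0d0, so whatever passes could then be handed to `read(number_string, *, iostat=stat) number_d` or to strtod purely for the conversion. It is a single pass over the string, but whether it actually helps overall performance would need to be measured.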
Thanks @jacobwilliams. I did vaguely recall that discussion, and it was my impetus to try using strtod, but as mentioned, it wasn’t sufficient for detecting errors. I will have to reread that thread for other ideas.