A New JSON Library

:+1:

I found the gprof(1) output from a gfortran(1) build interesting:

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 87.51      0.07     0.07  2473280     0.00     0.00  __json_value_module_MOD_json_value_reverse
 12.50      0.08     0.01   167178     0.00     0.00  __json_value_module_MOD_json_value_add_member
  0.00      0.08     0.00  2473280     0.00     0.00  __json_value_module_MOD_pop_char
  0.00      0.08     0.00   222252     0.00     0.00  __json_value_module_MOD_push_char
  0.00      0.08     0.00   167179     0.00     0.00  __json_value_module_MOD_parse_value
  0.00      0.08     0.00   167178     0.00     0.00  __json_value_module_MOD_json_info
  0.00      0.08     0.00   111126     0.00     0.00  __json_value_module_MOD_parse_number
  0.00      0.08     0.00   111080     0.00     0.00  __json_string_utilities_MOD_string_to_real
  0.00      0.08     0.00   111080     0.00     0.00  __json_value_module_MOD_string_to_dble
  0.00      0.08     0.00   111080     0.00     0.00  __json_value_module_MOD_to_real
  0.00      0.08     0.00    56045     0.00     0.00  __json_value_module_MOD_to_array
  0.00      0.08     0.00       46     0.00     0.00  __json_string_utilities_MOD_string_to_integer
  0.00      0.08     0.00       46     0.00     0.00  __json_value_module_MOD_string_to_int
  0.00      0.08     0.00       46     0.00     0.00  __json_value_module_MOD_to_integer
  0.00      0.08     0.00       12     0.00     0.00  __json_string_utilities_MOD_unescape_string
  0.00      0.08     0.00       12     0.00     0.00  __json_value_module_MOD_parse_string
  0.00      0.08     0.00        4     0.00     0.00  __json_value_module_MOD_to_object
  0.00      0.08     0.00        4     0.00     0.00  __json_value_module_MOD_to_string
  0.00      0.08     0.00        2     0.00    40.00  __json_value_module_MOD_parse_array
  0.00      0.08     0.00        2     0.00     0.00  __json_value_module_MOD_parse_object
  0.00      0.08     0.00        1     0.00     0.00  __json_file_module_MOD_json_file_failed
  0.00      0.08     0.00        1     0.00    80.01  __json_file_module_MOD_json_file_load
  0.00      0.08     0.00        1     0.00     0.00  __json_value_module_MOD_json_clear_exceptions
  0.00      0.08     0.00        1     0.00     0.00  __json_value_module_MOD_json_failed
  0.00      0.08     0.00        1     0.00     0.00  __json_value_module_MOD_json_initialize
  0.00      0.08     0.00        1     0.00     0.00  __json_value_module_MOD_json_parse_end
  0.00      0.08     0.00        1     0.00    80.01  __json_value_module_MOD_json_parse_file
  0.00      0.08     0.00        1     0.00     0.00  __json_value_module_MOD_json_prepare_parser

Wait, what code did you run that generated this? json_value_reverse shouldn’t be called at all for just parsing a file.

I was running various app codes for a quick view of where time was spent, and got called off onto something else. I was also seeing a bug that seems to have crept into fpm and shows up with your code (using the latest version, which I just rebuilt): if I use “fpm run” I just see “app app app app”. So now I am back, and I see you probably wanted me to run something like

MYBUILD='--profile release --flag -p'
fpm build $MYBUILD
fpm run json_fortran_test $MYBUILD
gprof $(fpm run json_fortran_test $MYBUILD --runner) >gprof.out
(more||less) <gprof.out
exit

which I still think is more on target now that I have taken a bit of time to look at the new version. Not sure what platform you have or if you use gprof(1), which is a bit of an art as well as a bit of science, but if not, give that a try. I will do that a bit more rigorously if you find the results useful.

I started an fpm plug-in that I had not finished that I might use this code to polish off:

NAME
  fpm-time(1) - call fpm(1) with gprof(1) to generate a flat timing profile
SYNOPSIS
  fpm-time [subcommand] [--target] targets
DESCRIPTION
  Run the fpm(1) command with the gfortran(1) compiler and compiler flags
  required to build instrumented programs which will generate gprof(1)
  output files. Run the program and then run a basic gprof(1) command
  on each output.

  IMPORTANT: ONE target program should be selected if multiple targets exist.

  NOTE: 2021-03-21

     This is a prototype plug-in for fpm(1), which is currently in alpha
     release. It may require changes at any time as a result.

OPTIONS
   subcommand  fpm(1) subcommand used to run a program (test,run). If
               no options are specified the default is "test".
               The name "example" will be converted to "run --example"
               internally.
   --targets   which targets to run. The default is "*". ONE target should
               be tested
   --flag      ADDITIONAL flags to add to the compile
   --repeat,R  number of times to execute the program. Typically, this helps
               reduce the effects of I/O buffering and other factors that can
               skew results. Defaults to one execution.
   --help      display this help and exit
   --version   output version information and exit

EXAMPLE
   # in the parent directory of the fpm(1) project
   # (where "fpm.toml" resides).

    fpm-time
    fpm-time run demo1 demo2

SEE ALSO
    gprof(1), gcov(1)

I started that in March. Maybe time to finish it :blush:

Once it is finished, if your default test is in the test directory, you just run

fpm time

and get a profile run of your test. I started the same thing for gcov(1) too. I also want to extend it to valgrind(1) and other tools supplied with compilers.


Ah interesting. Yes, I can duplicate this. Thanks!

Something is definitely wrong in the Gprof results. The reverse routine isn’t called for parsing. When I just comment it out completely and rerun Gprof, then it says some other uncalled routine is at the top. So, it is getting confused somehow… Is it a bug?

I noticed that this canada.json file is mostly real numbers. It seems most of the time is spent converting the strings to reals. I haven’t checked jsonff, but in JSON-Fortran, I’m just using:

read(str,fmt=*,iostat=ierr) rval

I notice when I just replace this with

rval = 0.0_RK
ierr = 0

then the parse time goes down to about 0.05 seconds. So clearly there is room for improvement here. Is there a faster string-to-real parser out there for Fortran? Hmmm… maybe I’ll make a new post about this so as not to hijack this thread any more.
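One way to put a number on that cost is to time the conversion in isolation. The following is only a minimal sketch, not code from JSON-Fortran: the module name `str2real_sketch` and the helper `time_conversions` are made up for illustration, and the only thing taken from above is the list-directed internal read.

```fortran
module str2real_sketch
   use, intrinsic :: iso_fortran_env, only: real64, int64
   implicit none
   private
   public :: string_to_real, time_conversions

contains

   !> Convert a string to a real with a list-directed internal read.
   !> ierr is zero on success and nonzero if the read failed.
   subroutine string_to_real(str, rval, ierr)
      character(len=*), intent(in)  :: str
      real(real64),     intent(out) :: rval
      integer,          intent(out) :: ierr

      read(str, fmt=*, iostat=ierr) rval
      if (ierr /= 0) rval = 0.0_real64   ! leave a defined value on failure
   end subroutine string_to_real

   !> Hypothetical helper: convert the same string n times and report
   !> the elapsed wall-clock seconds. The checksum is returned so the
   !> compiler cannot discard the loop as unused work.
   subroutine time_conversions(str, n, seconds, checksum)
      character(len=*), intent(in)  :: str
      integer,          intent(in)  :: n
      real(real64),     intent(out) :: seconds
      real(real64),     intent(out) :: checksum
      integer(int64) :: t0, t1, rate
      real(real64)   :: rval
      integer        :: i, ierr

      checksum = 0.0_real64
      call system_clock(count=t0, count_rate=rate)
      do i = 1, n
         call string_to_real(str, rval, ierr)
         checksum = checksum + rval
      end do
      call system_clock(count=t1)
      seconds = real(t1 - t0, real64) / real(rate, real64)
   end subroutine time_conversions

end module str2real_sketch
```

Calling `time_conversions` with a typical coordinate string from canada.json and the count from the profile above (about 111,000 conversions) should give a rough idea of how much of the parse time the read statement alone accounts for on a given compiler.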


Awesome, yes, we might need to write our own string to real converter.

Another benchmark. For the 6.5 MB file big.json that isn’t just real numbers (e.g., it has a lot of string data):

Fortran:

rojff        : 1.5498  seconds
fson         : 0.9193  seconds
json_fortran : 0.2063  seconds

Python:

rapidjson    : 0.045112584 seconds
json         : 0.033147166 seconds
ujson        : 0.021337875 seconds

Thanks to all of you for taking a look and running some benchmarks. For some reason I thought I was a bit closer performance-wise. Guess we’ve got some work to do.

As for ideas about where the bottlenecks might be: from what I’ve heard, and to some extent experienced myself, the Fortran library code for reading/writing numeric data is… shall we say not the fastest. And since canada.json is mostly numeric data, I suspect that is taking a lot of the time.

Another thought: my file_cursor_t reads the file one character at a time. Perhaps implementing some sort of buffering would yield some improvements?

As for the overhead due to error handling, I’d think the branch prediction on modern processors ought to alleviate a lot of that. I’d be curious to know if anybody would know of any way to confirm or deny that though.

I’m happy to take contributions if anybody would be interested. I’d be interested to hear thoughts on the API as well.

It’s very interesting that the built-in json parser in Python can beat rapidjson. My experience has been that rapidjson is one of the fastest.

I think we should experiment with writing a JSON parser that assumes valid JSON and just parses it as quickly as it can. I wouldn’t even worry about representing it at first (nor error handling), just parse it, and perhaps just count how many {} pairs there are. And see if we can get competitive. Then we can add error handling and representing it in Fortran.
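To make the idea concrete, here is a rough sketch of such a scanner. It is an illustration only: the module name `brace_count_sketch` and the function `count_objects` are hypothetical, and the code assumes the whole document is already in memory as one string and is valid JSON (so every `{` outside a string has a matching `}`).

```fortran
module brace_count_sketch
   implicit none
   private
   public :: count_objects

   ! achar(92) is the backslash; spelled this way so the source does not
   ! depend on whether the compiler treats backslash as an escape
   character(len=1), parameter :: backslash = achar(92)

contains

   !> Count the {} pairs in text that is ASSUMED to be valid JSON.
   !> No validation and no error handling: braces inside string values
   !> are skipped, everything else is ignored.
   pure function count_objects(text) result(n)
      character(len=*), intent(in) :: text
      integer :: n
      integer :: i
      logical :: in_string

      n = 0
      in_string = .false.
      i = 1
      do while (i <= len(text))
         select case (text(i:i))
         case ('"')
            in_string = .not. in_string
         case (backslash)
            if (in_string) i = i + 1   ! skip the escaped character
         case ('{')
            if (.not. in_string) n = n + 1
         end select
         i = i + 1
      end do
   end function count_objects

end module brace_count_sketch
```

Timing something like this over canada.json or big.json would give a lower bound for a single character-by-character pass in Fortran, before any number conversion or data structure is added.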

Yep, take a look at JSON-Fortran. There is some stuff in there to make the file read go faster (e.g., using STREAM, and also reading it in chunks rather than one character at a time). But, I have to ask: why do you not just use JSON-Fortran? :slight_smile:
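For anyone who has not used stream access, a minimal sketch follows. Note that it is not how JSON-Fortran does it: for brevity it reads the whole file in a single read rather than in chunks, and the module name `stream_read_sketch` and the routine `read_whole_file` are invented for this example. It also assumes the file size fits in a default integer.

```fortran
module stream_read_sketch
   implicit none
   private
   public :: read_whole_file

contains

   !> Read an entire file into one character string using unformatted
   !> stream access. ierr is zero on success and nonzero otherwise.
   subroutine read_whole_file(filename, contents, ierr)
      character(len=*),              intent(in)  :: filename
      character(len=:), allocatable, intent(out) :: contents
      integer,                       intent(out) :: ierr
      integer :: lun, nbytes

      open(newunit=lun, file=filename, access='stream', form='unformatted', &
           status='old', action='read', iostat=ierr)
      if (ierr /= 0) then
         contents = ''
         return
      end if

      ! size is reported in file storage units (bytes on common
      ! platforms), or -1 if it cannot be determined
      inquire(unit=lun, size=nbytes)
      if (nbytes < 0) then
         contents = ''
         ierr = 1
         close(lun)
         return
      end if

      allocate(character(len=nbytes) :: contents)
      read(lun, iostat=ierr) contents   ! one read for the whole file
      close(lun)
   end subroutine read_whole_file

end module stream_read_sketch
```

Once the text is in memory, a cursor type can step through the string by index instead of issuing one read per character.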


I didn’t look real hard at it, but my first impressions were that the API didn’t seem that friendly, and the documentation/tutorial wasn’t that illuminating. I tried reading through the source code a bit, because I was curious how you implemented the parser, but had a hard time finding my way around. I never did find where the actual logic for the parser started. So I’ll admit that to some extent my library was born out of Not Invented Here, but I was more interested in the usability aspect than performance, at least initially.

And I will say that rojff is fast enough to be usable, if not necessarily the fastest.

Why not provide Fortran “bindings” for UltraJSON aka ujson and “call it a day”?!

After all, “UltraJSON is an ultra fast JSON encoder and decoder written in pure C”. That it has Python bindings is beside the point.


The Python interface (it appears, at least) returns and accepts native Python data types (i.e. dict, list, string, float, bool). What types should be accepted and returned in Fortran? We don’t have an intrinsic dictionary type, and can’t put different types in an array. Parsing JSON data really fast is great, but once I’ve parsed it, I need to be able to do something useful with it, and that shouldn’t require jumping through hoops or circumventing the type system.

Also, Python has a garbage collector. Presumably C is allocating some memory; how and when do you deallocate that on the Fortran side? Hopefully you’re not making it easy for the user to forget to do that, or requiring them to do it manually at all. That’s how memory leaks happen.

You skipped the first part:

Fast is better than slow
Slow is better than unmaintainable

So the main recommended JSON library in Fortran must be fast, which is better than if it was slow, which would be better than unmaintainable.


Surely it isn’t that bad! :slight_smile: The parser starts in json_parse_file. It’s a recursive parser, mostly inherited from fson with some updates by me to make it faster. The underlying structure is a linked list of pointers.

But, yes, probably the documentation could be improved. The code is well commented, and it generates nice Ford docs: json_file – JSON-Fortran. (I think I expose public and private methods, so that might be part of the problem). Also probably some more tutorials on how to do certain things would be beneficial. I also am happy to accept contributions. JSON-Fortran is definitely production code and we use it not only for reading JSON files, but also for creating and manipulating the data in memory and writing files, as well as for data exchange among tools (e.g., Python and Fortran).


I’ll look at ujson. My experiments with strtod are very promising… so stay tuned…
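For readers who want to try the same thing, this is roughly what calling the C library's strtod from Fortran can look like. It is a sketch under stated assumptions, not the code from those experiments: the module name `strtod_sketch` and the wrapper `string_to_real_c` are hypothetical, only "nothing was converted" is treated as an error (trailing characters are not checked), and strtod honours the C locale's decimal point.

```fortran
module strtod_sketch
   use, intrinsic :: iso_c_binding, only: c_char, c_double, c_ptr, &
                                          c_null_char, c_loc, c_associated
   implicit none
   private
   public :: string_to_real_c

   interface
      ! C prototype: double strtod(const char *nptr, char **endptr);
      function c_strtod(nptr, endptr) bind(C, name='strtod') result(val)
         import :: c_char, c_double, c_ptr
         character(kind=c_char), intent(in)  :: nptr(*)
         type(c_ptr),            intent(out) :: endptr
         real(c_double) :: val
      end function c_strtod
   end interface

contains

   !> Convert a string to a real by calling strtod from the C library.
   !> ierr is 0 if strtod consumed at least one character, 1 otherwise.
   subroutine string_to_real_c(str, rval, ierr)
      character(len=*), intent(in)  :: str
      real(c_double),   intent(out) :: rval
      integer,          intent(out) :: ierr
      character(kind=c_char), target :: buf(len(str) + 1)
      type(c_ptr) :: endptr
      integer :: i

      ! copy into a NUL-terminated buffer, as C expects
      do i = 1, len(str)
         buf(i) = str(i:i)
      end do
      buf(len(str) + 1) = c_null_char

      rval = c_strtod(buf, endptr)

      ! strtod sets endptr to the start of the input if nothing was parsed
      if (c_associated(endptr, c_loc(buf(1)))) then
         ierr = 1
      else
         ierr = 0
      end if
   end subroutine string_to_real_c

end module strtod_sketch
```

The copy into `buf` is there only to add the terminating NUL; a parser that already holds the file in a NUL-terminated buffer could pass a position in that buffer directly and avoid it.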

This is true. I just follow

Make it work
Make it right
Make it fast

I’m in the middle of the “Make it fast” part. And I wasn’t that far off of fast :stuck_out_tongue:


I’m sure it’s not that bad. But I partly wanted the excuse to go through the fun exercise of writing my own anyway, so I didn’t try as hard as I could have. I’d be interested to see some example usage.
