A New JSON Library

Interesting! It will be nice if someone knowledgeable of both Python libraries as well as JSON-Fortran can complete a thorough investigation and provide a summary of the root-cause(s) of the nearly 8X slowness compared to Python ujson.

I would really like to be proven wrong but my hypotheses is 3 reasons as to the slowness of such libraries developed in pure Fortran:

  1. The language standard of Fortran itself needs significant improvements to help enable a vital aspect of scientific and technical computing which is pre and post-processing of data, now there are massive amounts of it. The core number-crunching is important but processing of all the input and program data to get to the number-crunching stage and once crunched, process the results again for all the stakeholders is paramount. The utility in question here, a JSON library for Fortran, is but one part of this. However, circa 2021-22, it’s rather difficult to build a performant library in Fortran compared to the alternatives. The language itself needs to offer a set of facilities to enable such library authoring, I’ve listed my suggestions here. Add the computer science concepts of move semantics and rule of 7 to the list, for this is relevant to how libraries such as JSON-Fortran and rjoff tend to be architected.
  2. Fortran compilers need to really up the game on optimization though it’s a very difficult battle. The other paradigms, especially C++, Python, and Julia, attract the sharpest minds and have tons and tons of them to optimize and optimize their language processors. Fortran needs a lot of catching up here.
  3. Library authors themselves will need to put in tons more effort to further optimize their libraries and eke out every ounce of performance, if they are intent on remaining competitive with other alternatives. This may include perhaps replacing critical sections of their “pure Fortran” code with optimized C and/or assembler pieces; this may apply to Fortran stdlib as well.

Or I may be completely off and may be it is just one or two low-hanging fruits in the Fortran code for these JSON libraries that affect the performance and once those fruits are grabbed and the code improved, the Fortran equivalent becomes similarly fast. As I wrote above, I wouldn’t mind at all if I am wrong on this even as my experience thus far has informed me otherwise.

2 Likes