Exactly. stdlib is barely a year and a half old. It is well known that writing fast code in Fortran is relatively easy, but it is unreasonable to expect it to outperform specialised, mature libraries in such a short time and with such a disparity of resources. All routines in stdlib are marked as experimental, so users also know what to expect.
I agree that performance benchmarking should be done and that it is important, but I disagree that performance should be a show-stopper for the growth and development of stdlib at this point in time. We need to get the ball rolling. Well-designed, much-needed functionality like CSV file handling with a nice UI is exactly what we need in stdlib now (and Jacob’s implementation delivers exactly that). We can improve later: we’re in experimental mode. “Perfect is the enemy of good.” A practical success story in the same vein is, once again, @jacobwilliams’ JSON-Fortran library. From the beginning it offered a very nice, modern UI (which IMO contributed strongly to its success), and its performance was improved as it matured (obviously, Jacob could comment on this better). I think that’s the way to go.
Speed is not everything (although it is undeniably by far one of Fortran’s current strong points). For example, to my dismay, I often see jobs written in Python, MATLAB or R (not exactly lightning-fast languages) in the queues of the HPC facilities at my university. IMO this shows that there is an opportunity, and that we need to nurture a feature-rich (in its area of specialisation) and user-friendly environment for Fortran users, because Fortran’s simple yet effective characteristics can make it the best choice for science and engineering (not only HPC). Compare the (verbose) simplicity of Fortran with the complexity of C++. I applaud all the efforts of the Fortran-Lang community with fpm, stdlib, LFortran, etc., which all go in that direction.
I know that some people in the Fortran community might disagree with the point above about targeting Fortran outside HPC applications as well, but I believe there is demand for that. I had even planned to present these thoughts at FortranCon but, unfortunately, had to shelve the idea due to my current workload.
I certainly disagree with that. Speed is everything … for FORTRAN. It is the main, maybe even the sole, selling point. What this community wants is to “revive” a language which was once top-notch and is now “almost” forgotten. There are at least 5 languages out there with packages, libraries and APIs that are orders of magnitude better than fortran’s. How do you want to explain to potential users that you are slowly catching up, but that unfortunately the catch-up facilities are so slow that they are annoying to use …?
What makes developing in fortran such a pain is the almost total absence of “housekeeping” facilities.
For stdlib I would recommend focusing on only a few projects that implement facilities almost everybody needs when starting a project:
read and write facilities
sort and order algorithms
linked lists and trees
sparse matrix containers
Using stdlib++ as a template would be a start. If these facilities were really high-performing, every project built on top of them would be high-performing by default.
Fortran does not currently have a reputation as the language in which it is easiest and fastest to get data in and out. So if stdlib has a CSV reader, that at least helps with the easy part. Arguably, a more serious problem than reading CSV files slowly in Fortran is the greater difficulty of accessing databases, although a few projects address that.
Maybe, in addition to pure Fortran solutions in stdlib, hybrid solutions using C++, R, and Python could be created and documented. One can read a large CSV (or other) file in another language, write it in a binary format, and then read that from Fortran (or, ideally, call the other language from Fortran). In my own programs that read large CSV files, I have an option to write an input file foo.csv to foo.bin as an unformatted stream, which is then read in future runs, so that I only pay the cost of parsing a large CSV once.
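A minimal sketch of the foo.csv to foo.bin caching idea above, written in Python rather than Fortran for brevity. The function name load_table, the 16-byte shape header and the modification-time check are illustrative assumptions, not the poster's actual code; it also assumes a header row followed by a rectangular block of purely numeric data. In Fortran, the cache file would be opened with something like open(newunit=u, file='foo.bin', access='stream', form='unformatted').

```python
import array
import csv
import os
import struct

# Hypothetical cache layout: two native int64 values (nrows, ncols)
# followed by nrows*ncols native float64 values, row by row.
HEADER = struct.Struct("=qq")


def load_table(csv_path):
    """Return (nrows, ncols, values) for a numeric CSV, caching it as binary.

    The first call parses the CSV and writes a .bin file next to it;
    later calls read the .bin directly, so parsing is paid only once.
    """
    bin_path = os.path.splitext(csv_path)[0] + ".bin"

    # Reuse the cache only if it is at least as new as the CSV.
    if os.path.exists(bin_path) and os.path.getmtime(bin_path) >= os.path.getmtime(csv_path):
        with open(bin_path, "rb") as f:
            nrows, ncols = HEADER.unpack(f.read(HEADER.size))
            values = array.array("d")
            values.fromfile(f, nrows * ncols)  # one bulk binary read
        return nrows, ncols, values

    # Slow path: parse the text file once.
    values = array.array("d")
    nrows = 0
    ncols = 0
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        for row in reader:
            if nrows == 0:
                ncols = len(row)
            elif len(row) != ncols:
                raise ValueError(f"row {nrows + 1} has {len(row)} fields, expected {ncols}")
            values.extend(float(x) for x in row)
            nrows += 1

    # Write the shape header, then the raw values, for future runs.
    with open(bin_path, "wb") as f:
        f.write(HEADER.pack(nrows, ncols))
        values.tofile(f)
    return nrows, ncols, values
```

Calling load_table("foo.csv") twice parses the text only on the first call; the second call just reads the binary file. The binary layout is native-endian, so the cache is meant to be reused on the same machine, not shared across platforms.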
I think it’s intended that stdlib will eventually have the functionality of LAPACK and the other classic Fortran libraries. I would be more concerned about performance regressions there, since that’s where Fortran has a reputation for speed that it should not lose.
No need to explain (or justify) anything: just read the documentation and the README. We are in experimental mode and the project is very young. Manage your expectations. The vast majority of people understand this very easily.
That is a legitimate speculation on your part, but the facts might tell a different story. One example was given to you in a previous comment: a less performant Fortran JSON library became very popular and was only later (as I think is reasonable to expect) tweaked to become faster.
No need to be sorry: such a benchmark is as useful as measuring how much faster an adult is at math than a 2-year-old.
No need to become offensive @epagone … the timing is the timing … as in many other threads in this forum … just search for them and you’ll be overwhelmed by how often posters have wondered about being outperformed by Julia etc. And if you are in experimental mode, I recommend more experiments … fread has just set the bar.
And I would really like to know how many people have decided not to learn Julia and to go for fortran instead just because a Fortran JSON library exists. I think indulging in these isolated examples blurs the vision … the future is not in making life easier for existing fortranners (“thanks to xyz I don’t have to write this parser myself”); the future is in having a valid argument for beginners, and currently that is just not there, as discussed here.
Didn’t realise that it could be interpreted as offensive: it was not meant to be a personal attack (I’ve edited my post to make it 100% clear). I just wanted to show vividly how unreasonable and IMO useless (at this moment) that benchmark is.
WRT uselessness: I originally started with the 80-million-record file … and read_csv killed my laptop (with 64GB RAM!!), whereas fread reads it in a breeze. Even without looking at milliseconds, that is a scenario no user wants to experience.
In response to a suggestion to @rcs to call it Fortran, following @rcs’s earlier comment “it might be just another nail in FORTARN’s coffin …”:
Re: “start somewhere,” wouldn’t it be basic etiquette to get a name right and refer to it accurately? In these days of a “Code of Conduct” that is primarily predicated on respect for identities and against microaggressions, how acceptable would it be to insist on referring to Julia as Julie instead, or Python as hon, or Zig as zib?
Ah, good catch, I will fix that. But that code isn’t used in the parser, so fixing it won’t make parsing faster. The slowness is likely due to all the reallocations that happen during line parsing/tokenizing. There are ways to make that faster.
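One common way to avoid per-token reallocation, sketched here in Python as an assumption (it is not stdlib's actual tokenizer, and split_offsets is a hypothetical name): scan the line once to count separators, allocate the output arrays once at their final size, then fill them with the start/end offsets of each field. In Fortran, the analogue would be a single allocate(starts(nfields), ends(nfields)) instead of growing an array with a repeated x = [x, item] pattern. Quoted fields that contain the separator are deliberately not handled.

```python
def split_offsets(line, sep=","):
    """Return (starts, ends) so that line[starts[k]:ends[k]] is field k.

    Pass 1 counts separators so both output lists are sized exactly once;
    pass 2 fills them in without ever growing them.
    Simplifying assumption: no quoting or escaping of `sep`.
    """
    nfields = line.count(sep) + 1
    starts = [0] * nfields  # allocated once at final size
    ends = [0] * nfields
    k = 0
    start = 0
    for i, ch in enumerate(line):
        if ch == sep:
            starts[k] = start
            ends[k] = i
            k += 1
            start = i + 1
    # The last field runs to the end of the line.
    starts[k] = start
    ends[k] = len(line)
    return starts, ends
```

Fields can then be sliced lazily, e.g. [line[a:b] for a, b in zip(*split_offsets(line))], so substrings are only created for the columns that are actually converted.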
Here are my personal thoughts on @rcs’s suggestions regarding performance:
I 100% agree that performance is critical, and we should not compromise on it at all.
The goal (for me) is to have the fastest implementations of things in stdlib.
Similarly, LFortran currently has two versions of intrinsic math routines (like sin, tan, …): a version from libc and a pure Fortran version. We will make it possible for the user to select a version on the command line, and we can add other implementations too. The goal of the pure Fortran version is to be the fastest implementation. I think Fortran is the perfect language for writing such fast numerical implementations.
To get there, we have to start. We are focusing mainly on the API. So far, nothing in stdlib seems to have an API that would prevent good performance. But if something does, we should change it. Everything is marked as experimental.
I agree with @epagone that most people understand that stdlib is young and the routines are experimental.
I use data.table with datasets larger than 10 GB on a daily basis. data.table has been highly optimized. It is unrealistic to expect the stdlib CSV reader to be as fast as fread at such an early stage.
I typically read datasets into R and then call Fortran subroutines on them. By doing that, I can also use R’s great data visualization features. For extending R with Fortran, see my tutorial: