What goes in stdlib and what is separate?

I know that … and once again sorry for having done a benchmark which doesn’t favor read_csv.

Nothing to be sorry about! I totally agree that we should be benchmarking everything. The goal should be that the Fortran version of anything is second to no other implementation.

@rcs, apologies for my nit here, but “another nail in FORTARN’s coffin” might be a rather good thing. But with Fortran as its glorious resurrection of course!!

As you may know, starting with the Fortran 90 revision of the standard, the official spelling has been Fortran.

I certainly disagree with that. Speed is everything … for FORTRAN. It is the main, maybe even the sole, selling point. What this community wants is to “revive” a language which was once top notch and is now “almost” forgotten. There are at least 5 languages out there whose packages, libraries and APIs are orders of magnitude better than fortran’s. How do you wanna explain to potential users that you are slowly catching up, but unfortunately the implemented catch-up facilities are so non-performant that it becomes annoying to use them …

What makes developing in fortran such a pain is the almost total absence of “housekeeping” facilities.

For stdlib I would recommend to focus on only a few projects which implement facilities almost everybody needs when starting a project:

  • read and write facilities
  • sort and order algorithms
  • linked list and trees
  • hash tables
  • sparse matrix containers
  • etc

Using stdlib++ as a template would be a start. If these facilities were really high-performing, every project building on top of them would be high-performing by default.

I really don’t care … deriving from my programming style I would even prefer frn

I can’t, simply because A) it is commercial and B) it is an assembly of many other bits … so it’s not a 5-liner.

What I can do is explain how it works.

Fortran does not currently have the reputation as the language in which it is easiest and fastest to get data in and out. So if stdlib has a CSV reader, that at least helps with the easy part. Arguably a more serious problem than reading CSV files slowly in Fortran is the greater difficulty in accessing databases, although a few projects address that.

Maybe in addition to pure Fortran solutions in stdlib, hybrid solutions using C++, R, and Python can be created and documented. One can read a large CSV or other file in another language, write it in a binary format, and then read that from Fortran. (Or ideally call another language from Fortran). In my own programs reading large CSV files I have an option where an input file foo.csv is written to foo.bin using an unformatted stream, which is read in future runs, so that I only pay the cost of reading a large CSV once.
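A minimal sketch of the hybrid route described above, in Python with only the standard library. The function name `csv_to_stream_binary` and the file layout (two little-endian 32-bit integers for the shape, then the values as 64-bit reals in column-major order) are assumptions made for illustration, not an existing stdlib or data.table format. It also assumes an all-numeric CSV with one header line.

```python
import csv
import struct


def csv_to_stream_binary(csv_path, bin_path, has_header=True):
    """Convert an all-numeric CSV file to a raw binary file.

    Layout (an assumption of this sketch, not a standard):
    two little-endian int32 values (nrows, ncols), followed by
    nrows*ncols float64 values in column-major order.
    """
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        if has_header:
            next(reader, None)  # skip the column names
        rows = [[float(x) for x in row] for row in reader if row]

    nrows = len(rows)
    ncols = len(rows[0]) if rows else 0

    # Column-major order matches how Fortran stores a(nrows, ncols).
    values = [rows[i][j] for j in range(ncols) for i in range(nrows)]

    with open(bin_path, "wb") as f:
        f.write(struct.pack("<2i", nrows, ncols))
        f.write(struct.pack(f"<{nrows * ncols}d", *values))
    return nrows, ncols
```

On the Fortran side, such a file could then be opened with `access="stream", form="unformatted"`, the two `integer(int32)` shape values read first, a `real(real64)` array allocated as `a(nrows, ncols)`, and the whole array read with a single `read` statement. This assumes a little-endian machine; the expensive text parsing then happens only once per CSV file.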

I think it’s intended that stdlib will eventually have the functionality of Lapack and the other classic Fortran libraries. I would be more concerned with performance regressions there, since that’s where Fortran has a reputation for speed that it should not lose.

No need to explain (or justify) anything, just read the documentation, look at the README; we are in experimental mode, the project is very young. Manage your expectations. The vast majority of people understand this very easily.

This can be your legitimate speculation. Facts might tell a different story. One example was given to you in a previous comment, where a less performant Fortran JSON library became very popular and only later (as I think it’s reasonable to expect) was tweaked to become faster.

No need to be sorry: such a benchmark is as useful as measuring how much faster an adult is at math than a 2-year-old.

No need to become offensive @epagone … the timing is the timing … as in many other threads in this forum … just search for them and you’ll be overwhelmed by how often posters have wondered about being outperformed by Julia etc. And if you are in experimental mode I recommend more experiments … fread has just set the bar.

And please, I would like to know how many people have finally decided not to learn Julia and go for fortran instead just because of the existence of Fortran json 🙂. I think self-indulging in these isolated examples blurs the vision … the future is not in making life easier for existing fortranners (“thanks to xyz I don’t have to write this parser myself”), the future is in having a valid argument for beginners, and currently that is just not there as discussed here

Didn’t realise that it could be interpreted as offensive: it was not meant to be a personal attack (I’ve edited my post to make it 100% clear). I just wanted to show vividly how unreasonable and IMO useless (at this moment) that benchmark is.

No worries.

With regard to uselessness: I originally started with the 80-million-record file … and read_csv killed my laptop (with 64GB RAM!!), whereas fread reads it like a breeze. Even without looking at milliseconds, that is a scenario no user wants to experience.

Best

Agreed. But to get there we need to start somewhere and (I still believe that) “Perfect is the enemy of good”.

In response to a suggestion to @rcs to call it Fortran following the earlier comment by @rcs, “it might be just another nail in FORTARN’s coffin …”:

Re: “start somewhere,” wouldn’t it help to have basic etiquette with a name and be accurate in referring to it? In the days of “Code of Conduct” that is primarily predicated on respect of identities and against microaggressions, how acceptable would it be to insist on referring to Julia as Julie instead, or Python as hon, or Zig as zib?

Yes, and we do, but everybody gets their choice to use the name they prefer, at the risk of being misunderstood. 🙂

Ah good catch, I will fix that. But that code isn’t used in the parser, so that’s not going to make the parsing faster. Likely, the slowness is due to all the reallocations that happen in the line parsing/tokenize process. There are ways to make that faster.
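One common way to avoid repeated reallocations while tokenizing is to count the delimiters first and allocate the result once. The sketch below only illustrates that general idea in Python; it is not the stdlib parser, the name `split_fields_preallocated` is hypothetical, and it deliberately ignores quoted fields.

```python
def split_fields_preallocated(line, delimiter=","):
    """Split one line into fields with a single up-front allocation."""
    # Pass 1: count delimiters so the number of fields is known.
    nfields = line.count(delimiter) + 1
    fields = [None] * nfields  # allocated once, never grown

    # Pass 2: walk the line and fill each slot in place.
    start = 0
    for i in range(nfields - 1):
        end = line.index(delimiter, start)
        fields[i] = line[start:end]
        start = end + len(delimiter)
    fields[nfields - 1] = line[start:]
    return fields
```

In Fortran the same pattern would mean scanning the line for delimiters, allocating the token array once with the known size, and then filling it, rather than growing the array token by token.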

P.S. This thread is officially off the rails. 🙂

I will write my personal thoughts on @rcs’s suggestions regarding performance:

  • I 100% agree that performance is critical, and we should not compromise on it at all
  • The goal (for me) is to have the fastest implementations of things in stdlib.
  • Similarly, LFortran currently has two versions of intrinsic math routines (like sin, tan, …): a version from libc and a pure Fortran version. We will make it so that the user can select a version on the command line, and we can add other implementations also. But the goal of the pure Fortran version is to have the fastest implementation. I think Fortran is the perfect language to write such fast numerical implementations.
  • In order to get there, we have to start. We focus mainly on the API. So far what is in stdlib does not seem to have an API that would prevent performance. But if it does, we should change it. Everything is marked as experimental.
  • I agree with @epagone that most people understand that stdlib is young and the routines are experimental.
  • @rcs you mentioned that you would like sparse matrix functionality. Would you have time to help me with this PR: Initial implementation of COO / CSR sparse format by certik · Pull Request #189 · fortran-lang/stdlib · GitHub? You could help with the benchmarking part if you want.
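For readers unfamiliar with the two sparse formats named in that PR, here is a small illustrative sketch in Python. It is not the API of the PR; the function names are hypothetical, indices are 0-based (a Fortran version would normally be 1-based), and duplicate entries are kept rather than summed. COO stores one (row, column, value) triplet per nonzero, while CSR stores a row-pointer array plus column indices and values grouped by row.

```python
def coo_to_csr(nrows, rows, cols, vals):
    """Convert COO triplets (0-based indices) to CSR arrays."""
    nnz = len(vals)

    # Count the entries in each row, then prefix-sum to get row pointers.
    row_ptr = [0] * (nrows + 1)
    for r in rows:
        row_ptr[r + 1] += 1
    for i in range(nrows):
        row_ptr[i + 1] += row_ptr[i]

    # Scatter each entry into the next free slot of its row.
    next_slot = row_ptr[:-1]  # slicing copies, so row_ptr is untouched
    col_idx = [0] * nnz
    data = [0.0] * nnz
    for r, c, v in zip(rows, cols, vals):
        k = next_slot[r]
        col_idx[k] = c
        data[k] = v
        next_slot[r] += 1
    return row_ptr, col_idx, data


def csr_matvec(row_ptr, col_idx, data, x):
    """Compute y = A @ x for a matrix A stored in CSR form."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += data[k] * x[col_idx[k]]
    return y
```

A matrix-vector product like `csr_matvec` is a natural first candidate for the benchmarking mentioned above, since it only touches the stored nonzeros.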

I use data.table with datasets larger than 10G on a daily basis. data.table has been highly optimized. It is unrealistic to expect the Fortran std csv reader to be as fast as fread at such an early stage.

I typically read datasets into R to use Fortran subroutines. By doing that, I can also utilize R’s great data visualization features. For extending R with Fortran, see my tutorial:

Extend R with Fortran