What goes in stdlib and what is separate?

rcs · August 27, 2021, 1:01pm

I know that … and once again sorry for having done a benchmark which doesn’t favor read_csv.

jacobwilliams · August 27, 2021, 1:02pm

Nothing to be sorry about! I totally agree that we should be benchmarking everything. The goal should be that the Fortran version of anything is second to no other implementation.

FortranFan · August 27, 2021, 1:05pm

@rcs, apologies for my nit here, but “another nail in FORTARN’s coffin” might be a rather good thing. But with Fortran as its glorious resurrection of course!!

As you may know, starting with the Fortran 90 revision of the standard, the official name has been Fortran.

rcs · August 27, 2021, 1:14pm

I certainly disagree with that. Speed is everything … for FORTRAN. It is the main, maybe even the sole, selling point. What this community wants is to “revive” a language which was once top notch and is now “almost” forgotten. There are at least 5 languages out there which have a package, library or API that is orders of magnitude better than what fortran offers. How do you want to explain to potential users that you are slowly catching up, but that unfortunately the catch-up facilities are so non-performant that they become annoying to use …

What makes developing in fortran such a pain is the almost total absence of “housekeeping” facilities.

For stdlib I would recommend focusing on only a few projects which implement facilities almost everybody needs when starting a project:

read and write facilities
sort and order algorithms
linked lists and trees
hash tables
sparse matrix containers
etc.

Using stdlib++ as a template would be a start. If these facilities were really high-performing, every project building on top of them would be high-performing by default.

rcs · August 27, 2021, 1:16pm

I really don’t care … given my programming style, I would even prefer frn.

rcs · August 27, 2021, 1:21pm

I can’t, simply because A) it is commercial and B) it is an assembly of many other bits … so it’s not a 5-liner.

What I can do is explain how it works.

Beliavsky · August 27, 2021, 1:30pm

Fortran does not currently have the reputation as the language in which it is easiest and fastest to get data in and out. So if stdlib has a CSV reader, that at least helps with the easy part. Arguably a more serious problem than reading CSV files slowly in Fortran is the greater difficulty in accessing databases, although a few projects address that.

Maybe in addition to pure Fortran solutions in stdlib, hybrid solutions using C++, R, and Python can be created and documented. One can read a large CSV or other file in another language, write it in a binary format, and then read that from Fortran. (Or ideally call another language from Fortran). In my own programs reading large CSV files I have an option where an input file foo.csv is written to foo.bin using an unformatted stream, which is read in future runs, so that I only pay the cost of reading a large CSV once.
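A minimal Fortran sketch of the "pay the CSV cost once" approach described above, assuming a headerless, all-numeric CSV whose lines fit in 4096 characters. The module and procedure names (csv_cache, load_matrix) are hypothetical; this is not Beliavsky's code or a stdlib API.

```fortran
! Hypothetical sketch: cache a parsed CSV as an unformatted stream file.
! Assumptions: no header row, all fields numeric, lines < 4096 characters.
module csv_cache
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
  private
  public :: load_matrix

contains

  ! If bin_file exists, read it directly (fast path); otherwise parse
  ! csv_file once and write the result to bin_file for future runs.
  subroutine load_matrix(csv_file, bin_file, x)
    character(*), intent(in) :: csv_file, bin_file
    real(real64), allocatable, intent(out) :: x(:,:)
    logical :: cached

    inquire(file=bin_file, exist=cached)
    if (cached) then
      call read_bin(bin_file, x)
    else
      call read_csv(csv_file, x)
      call write_bin(bin_file, x)
    end if
  end subroutine load_matrix

  ! Slow path: count rows and columns, then list-directed read row by row.
  subroutine read_csv(fname, x)
    character(*), intent(in) :: fname
    real(real64), allocatable, intent(out) :: x(:,:)
    character(len=4096) :: line
    integer :: u, ios, nrow, ncol, i

    open(newunit=u, file=fname, status='old', action='read')
    nrow = 0
    ncol = 0
    do
      read(u, '(a)', iostat=ios) line
      if (ios /= 0) exit
      if (len_trim(line) == 0) cycle
      nrow = nrow + 1
      ! Column count = number of commas on the first data line + 1
      if (nrow == 1) ncol = count([(line(i:i) == ',', i = 1, len_trim(line))]) + 1
    end do
    allocate(x(nrow, ncol))
    rewind(u)
    do i = 1, nrow
      read(u, *) x(i, :)   ! list-directed input treats commas as separators
    end do
    close(u)
  end subroutine read_csv

  ! Cache layout: two default integers (shape), then the raw array bytes.
  subroutine write_bin(fname, x)
    character(*), intent(in) :: fname
    real(real64), intent(in) :: x(:,:)
    integer :: u

    open(newunit=u, file=fname, access='stream', form='unformatted', &
         status='replace', action='write')
    write(u) size(x, 1), size(x, 2)
    write(u) x
    close(u)
  end subroutine write_bin

  subroutine read_bin(fname, x)
    character(*), intent(in) :: fname
    real(real64), allocatable, intent(out) :: x(:,:)
    integer :: u, nrow, ncol

    open(newunit=u, file=fname, access='stream', form='unformatted', &
         status='old', action='read')
    read(u) nrow, ncol
    allocate(x(nrow, ncol))
    read(u) x
    close(u)
  end subroutine read_bin

end module csv_cache
```

A call like `call load_matrix('foo.csv', 'foo.bin', x)` parses the CSV on the first run and reads `foo.bin` afterwards. The stream file holds raw bytes, so it is only portable between builds that agree on endianness and integer/real kinds; deleting `foo.bin` forces a re-parse when the CSV changes.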

I think it’s intended that stdlib will eventually have the functionality of Lapack and the other classic Fortran libraries. I would be more concerned with performance regressions there, since that’s where Fortran has a reputation for speed that it should not lose.

epagone · August 27, 2021, 1:54pm

No need to explain (or justify) anything, just read the documentation and the README: we are in experimental mode, and the project is very young. Manage your expectations. The vast majority of people understand this very easily.

This may be your legitimate speculation, but the facts might tell a different story. One example was given to you in an earlier comment: a less performant Fortran JSON library became very popular and only later (as I think is reasonable to expect) was tweaked to become faster.

No need to be sorry: such a benchmark is about as useful as measuring how much faster an adult is at math than a 2-year-old.

rcs · August 27, 2021, 2:28pm

No need to become offensive @epagone … the timing is the timing … as in many other threads in this forum … just search for them and you’ll be overwhelmed by how often posters have wondered about being outperformed by Julia etc. And if you are in experimental mode, I recommend more experiments … fread has just set the bar.

And please, I would like to know how many people have finally decided not to learn Julia and to go for fortran instead just because of the existence of Fortran json. I think self-indulging in these isolated examples blurs the vision … the future is not in making life easier for existing fortranners (“thanks to xyz I don’t have to write this parser myself”), the future is in having a valid argument for beginners, and currently that is just not there, as discussed here.

epagone · August 27, 2021, 3:33pm

Didn’t realise that it could be interpreted as offensive: it was not meant to be a personal attack (I’ve edited my post to make it 100% clear). I just wanted to show vividly how unreasonable and IMO useless (at this moment) that benchmark is.

rcs · August 27, 2021, 3:42pm

No worries.

WRT uselessness: I originally started with the 80-million-record file … and read_csv killed my laptop (with 64GB RAM!!), whereas fread reads it in a breath. Even without looking at milliseconds, that is a scenario no user wants to experience.

Best

epagone · August 27, 2021, 3:47pm

Agreed. But to get there we need to start somewhere and (I still believe that) “Perfect is the enemy of good”.

FortranFan · August 27, 2021, 4:01pm

In response to a suggestion to @rcs to call it Fortran following the earlier comment by @rcs, “it might be just another nail in FORTARN’s coffin …”:

Re: “start somewhere,” won’t it help to have basic etiquette with a name and to be accurate in referring to it? In these days of Codes of Conduct that are primarily predicated on respect for identities and against microaggressions, how acceptable would it be to insist on referring to Julia as Julie instead, or Python as hon, or Zig as zib?

milancurcic · August 27, 2021, 4:14pm

Yes, and we do, but everybody gets their choice to use the name they prefer, at the risk of being misunderstood.

jacobwilliams · August 27, 2021, 4:40pm

Ah good catch, I will fix that. But that code isn’t used in the parser, so that’s not going to make the parsing faster. Likely, the slowness is due to all the reallocations that happen in the line parsing/tokenize process. There are ways to make that faster.
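One common way to avoid repeated reallocation while tokenizing a line, sketched here as a hypothetical example (not the parser discussed in this thread; the names fast_split and split_fields are made up), is to count separators in a first pass so the output arrays are allocated exactly once, and to return each field as a start/end index pair instead of copying substrings. It assumes no quoted fields containing embedded commas.

```fortran
! Hypothetical sketch: split one CSV line with a single allocation.
module fast_split
  implicit none
  private
  public :: split_fields

contains

  ! Fields are returned as index ranges line(istart(k):iend(k)).
  ! An empty field has iend(k) = istart(k) - 1 (a zero-length substring).
  subroutine split_fields(line, istart, iend)
    character(*), intent(in) :: line
    integer, allocatable, intent(out) :: istart(:), iend(:)
    integer :: i, n, k

    ! Pass 1: count separators so the arrays never need to grow.
    n = 1
    do i = 1, len(line)
      if (line(i:i) == ',') n = n + 1
    end do
    allocate(istart(n), iend(n))

    ! Pass 2: record where each field starts and ends.
    k = 1
    istart(1) = 1
    do i = 1, len(line)
      if (line(i:i) == ',') then
        iend(k) = i - 1
        k = k + 1
        istart(k) = i + 1
      end if
    end do
    iend(n) = len(line)
  end subroutine split_fields

end module fast_split
```

A caller would pass `trim(line)` and convert each field in place with an internal read, e.g. `read(line(istart(k):iend(k)), *) value`, so no temporary token strings are created.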

P.S. This thread is officially off the rails.

certik · August 27, 2021, 6:13pm

I will write my personal thoughts on @rcs’s suggestions regarding performance:

I 100% agree that performance is critical, and we should not compromise on it at all
The goal (for me) is to have the fastest implementations of things in stdlib.
Similarly, LFortran currently has two versions of intrinsic math routines (like sin, tan, …): a version from libc and a pure Fortran version. We will make it so that the user can select a version on the command line, and we can add other implementations also. But the goal of the pure Fortran version is to have the fastest implementation. I think Fortran is the perfect language to write such fast numerical implementations.
In order to get there, we have to start. We focus mainly on the API. So far what is in stdlib does not seem to have an API that would prevent performance. But if it does, we should change it. Everything is marked as experimental.
I agree with @epagone that most people understand that stdlib is young and the routines are experimental.
@rcs, you mentioned that you would like sparse matrix functionality. Would you have time to help me with this PR: Initial implementation of COO / CSR sparse format by certik · Pull Request #189 · fortran-lang/stdlib · GitHub? You could help with the benchmarking part if you want.

fortran4r · January 6, 2022, 4:44pm

I use data.table with datasets larger than 10 GB on a daily basis. data.table has been highly optimized. It is unrealistic to expect the Fortran stdlib CSV reader to be as fast as fread at such an early stage.

I typically read datasets into R to use Fortran subroutines. By doing that, I can also utilize R’s great data visualization features. For extending R with Fortran, see my tutorial:

Extend R with Fortran
