On the beginner-friendliness of scientific libraries

The title is a bit provocative, and I’m not as upset as it might sound. I enjoy lurking on this forum and others to see what cool things people make and do. However, I almost invariably run into a particular stumbling block.

The description for the cool library almost always starts with:

LibTimeTwister solves the Flux Capacitor Equations using the Doc Brown Numerical Integration Scheme, which is a variant of the Spock “Take A Guess” Half Stepping Integration Method. It is robust and optimized to run on GPUs with a user-friendly interface. Also supported are the Wibbly-Wobbly Timey-Whiney TARDIS temporal mechanics equations on an unstructured temporal-spatial grid.

After that come instructions on how to build it, maybe a few examples, a link to the reference material, and a license.

Sounds cool, right? Unfortunately for me, I have no background in temporal mechanics. This is the first I’ve heard of it. I want to learn more! Well, the Wikipedia article is either a stub or an overcomplicated morass of mathematical hieroglyphs that I cannot understand. Maybe there’s a blog post here or there. I find a $200 textbook, but I don’t have the time or money for that.

So my plea to libraries that implement cool stuff is to try, to some degree, to explain what the cool stuff is. I understand we can’t all write textbooks to go with our libraries. We have to start by assuming our audience has some level of understanding. I’m just asking for that level to be a little lower than “expert mode”.

As a specific example, I ran into this this week while looking at the recently posted rklib. I’m not picking on it. Obviously a lot of time and effort went into a great and comprehensive library. As I said, I run into this all the time, with all sorts of software projects.

My background with Runge-Kutta is the classic 4th-order, 4-stage scheme that I learned in school. I see that one in rklib. Cool! But what are all these other ones? I want to integrate the Generalized Lotka-Volterra Equations for a toy project. Would a different one be better? What do these acronyms mean? The only one I recognize is TVD (and even then I’m not really sure I know what it means). Fixed step or variable step? What does it mean to integrate to an event? What is an event? Why do we need to find roots? Roots of what? How do I choose one method over another?
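
For anyone else in the same boat, here is a minimal sketch of that classic RK4 scheme applied to the generalized Lotka-Volterra equations, dx_i/dt = x_i (r_i + sum_j A_ij x_j), with a fixed step size. It is written in Python purely for illustration and is not RKLIB’s (Fortran) API; the function names `glv_rhs`, `rk4_step`, `integrate_fixed` and the predator-prey parameter values are hypothetical choices made for this example.

```python
import math
from typing import Callable, List

Vector = List[float]
RHS = Callable[[float, Vector], Vector]


def glv_rhs(r: Vector, A: List[Vector]) -> RHS:
    """Right-hand side of the generalized Lotka-Volterra system
    dx_i/dt = x_i * (r_i + sum_j A[i][j] * x_j)."""
    def f(t: float, x: Vector) -> Vector:
        n = len(x)
        return [x[i] * (r[i] + sum(A[i][j] * x[j] for j in range(n)))
                for i in range(n)]
    return f


def rk4_step(f: RHS, t: float, x: Vector, h: float) -> Vector:
    """One step of the classic 4-stage, 4th-order Runge-Kutta method."""
    k1 = f(t, x)
    k2 = f(t + h / 2, [xi + h / 2 * k for xi, k in zip(x, k1)])
    k3 = f(t + h / 2, [xi + h / 2 * k for xi, k in zip(x, k2)])
    k4 = f(t + h, [xi + h * k for xi, k in zip(x, k3)])
    return [xi + h / 6 * (a + 2 * b + 2 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]


def integrate_fixed(f: RHS, t0: float, x0: Vector, t_end: float, n_steps: int) -> Vector:
    """Integrate from t0 to t_end with n_steps equal RK4 steps (fixed step)."""
    h = (t_end - t0) / n_steps
    t, x = t0, list(x0)
    for _ in range(n_steps):
        x = rk4_step(f, t, x, h)
        t += h
    return x


if __name__ == "__main__":
    # Hypothetical predator-prey case: species 0 (prey) grows on its own and is
    # eaten by species 1 (predator), which dies out without prey.
    alpha, beta, gamma, delta = 1.0, 0.5, 0.75, 0.25
    f = glv_rhs(r=[alpha, -gamma], A=[[0.0, -beta], [delta, 0.0]])
    x0 = [2.0, 1.0]
    x1 = integrate_fixed(f, 0.0, x0, 10.0, 1000)

    def invariant(x: Vector) -> float:
        """Quantity conserved exactly by the two-species Lotka-Volterra system."""
        return (delta * x[0] - gamma * math.log(x[0])
                + beta * x[1] - alpha * math.log(x[1]))

    # A tiny change in the invariant is a cheap sanity check on the step size.
    print(x1, invariant(x1) - invariant(x0))
```

With only two species and no self-interaction terms, the system reduces to the textbook predator-prey model, which is convenient because it has a known conserved quantity to check the integrator against.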

This is an amazing project, but I don’t know how to learn what I need to know in order to use it well. Part of that is just who I am: I don’t like to use something I don’t comprehend to a large degree. I can’t copy-paste something off Stack Overflow without understanding what it does. At some point I have to give up and do the best I can.

Just food for thought: please think about the little people who think what you people do is awesome.

I understand your frustration, but there are really good reasons for the situation you describe.

The first is limited resources. Scientific libraries are developed by scientists who are under pressure to secure their next (and hopefully permanent) position. They are overloaded with teaching and administrative duties. Funding is given for breakthrough discoveries, not for maintenance and writing documentation.

Given this fact, it becomes obvious that only a limited set of absolutely essential libraries is supported: LAPACK, ScaLAPACK, FFTW, GSL. If you want a well-documented RK solver, you should probably start with GSL.

Concerning your questions like “Fixed step or variable step?”: this is technical experience that takes time to acquire. If you are working on a scientific project, this is part of the work. It is actually your duty to study it, not the job of library developers to give popular lectures on it.

In an ideal world where people had 36-hour days, documentation following the Diátaxis framework would provide everything you need in four quadrants: tutorials, how-to guides, technical reference, and explanation.

The world is not ideal, and we can just try to put ourselves at the beginning of the asymptote…

Many scientific codes attack very specific problems which appeal to a small community and inherently require some mastery. Of course, trying to refer the user to the right resources is the first step, but said resources usually involve, as you said:

  • A lengthy $200 monograph.
  • A collection of 5-10 papers on the topic.
  • In a few cases, an introductory level textbook with explanations.

This, combined with researchers’ heavy workloads, makes it extremely hard to write libraries that are functional, user-friendly, and beginner-friendly.

The Diátaxis framework is very interesting! I knew I couldn’t be the only one thinking about this stuff. I’ll take a deeper look at it. At $DAYJOB I have written some documentation, and yes, it’s really hard. I’ve really felt the quote:

“If I had more time, I would have written a shorter letter.” – attribution unknown

I don’t blame the authors of said scientific libraries or other software projects in the slightest. My frustration stems mostly from, “How can I be as amazing as these people?” (especially now that I’m not in school anymore). Hopefully this post brings those in my situation to mind, just a little. :grinning:

Exactly why I didn’t try to get a job in academia…but I still think it’s cool! Hats off to all those who do it. I’m not enough of a salesman for that job. But maybe as a full-time lecturer…I could do that.

The GSL seems to be a very comprehensive library! Thanks for pointing it out.

I think it may have been a post by @melissawm that introduced it to the Fortran-Lang community.

It is what matplotlib follows if you look at the icons on the front page: https://matplotlib.org/

On its workflow page, you will find an interesting process for evolving your documentation:

Ouch!

Why use C code that has been converted from Fortran code, when you can just use Fortran code? :slight_smile:

GSL has a handful of RK solvers, RKLIB has dozens, and can be trivially incorporated into an existing Fortran project by using FPM.

RKLIB author here.

I can’t really answer the questions of why use one vs another. That depends on your problem. RKLIB is intended to provide as many RK methods as possible, so if you need one it is there. If you only need RK45 and that works for you, then use that.
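
One concrete axis along which the methods differ is fixed versus variable step size, one of the questions in the original post. A variable-step method evaluates two formulas of different order in the same step, uses their difference as an error estimate, and shrinks or grows the step to keep that estimate under a tolerance. The sketch below uses the tiny Heun-Euler (2nd/1st order) embedded pair instead of an RK45 pair to keep it short; the names are hypothetical, and the controller constants (0.9 safety factor, step changes limited to between 0.2x and 2x) are common textbook choices assumed here, not taken from RKLIB.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
RHS = Callable[[float, Vector], Vector]


def heun_euler_step(f: RHS, t: float, x: Vector, h: float) -> Tuple[Vector, float]:
    """One step of the embedded Heun-Euler pair. Returns the 2nd-order (Heun)
    solution and an error estimate: its distance from the 1st-order (Euler) one."""
    k1 = f(t, x)
    k2 = f(t + h, [xi + h * a for xi, a in zip(x, k1)])
    x_high = [xi + h / 2 * (a + b) for xi, a, b in zip(x, k1, k2)]
    x_low = [xi + h * a for xi, a in zip(x, k1)]
    err = max(abs(p - q) for p, q in zip(x_high, x_low))
    return x_high, err


def integrate_adaptive(f: RHS, t0: float, x0: Vector, t_end: float,
                       tol: float = 1e-6, h: float = 1e-2) -> Tuple[Vector, int, int]:
    """Variable-step integration: accept a step when the error estimate is at
    most tol, and rescale h after every attempt, accepted or not."""
    t, x = t0, list(x0)
    n_accepted = n_rejected = 0
    while t < t_end:
        h = min(h, t_end - t)  # do not step past the end point
        x_new, err = heun_euler_step(f, t, x, h)
        if err <= tol:
            t, x = t + h, x_new
            n_accepted += 1
        else:
            n_rejected += 1
        # The estimate behaves like C*h**2, so the ideal step scales as sqrt(tol/err).
        factor = 0.9 * math.sqrt(tol / err) if err > 0 else 2.0
        h *= min(2.0, max(0.2, factor))
    return x, n_accepted, n_rejected


if __name__ == "__main__":
    # dx/dt = -x, exact solution x(t) = exp(-t)
    x, n_acc, n_rej = integrate_adaptive(lambda t, x: [-x[0]], 0.0, [1.0], 5.0)
    print(x[0], math.exp(-5.0), n_acc, n_rej)
```

Tightening `tol` makes the controller take more, smaller steps. For a decaying solution like this one, the step size also grows as the solution flattens out, which is where a variable-step method saves work compared with a fixed step sized for the worst part of the interval.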

Event finding means to integrate the equations until some condition is reached. This is fundamentally a root finding problem. For example, integrate the equations of motion of a falling object until it hits the ground. You have to locate the event (say, the altitude = 0) and stop the integration there.
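
The falling-object example can be sketched in a few lines. This is a minimal illustration, not RKLIB’s implementation: it takes fixed RK4 steps, watches the sign of an event function (here the altitude), and when the sign changes within a step it bisects on the length of that step to locate the crossing, which is the root-finding part. The names `integrate_to_event` and `rk4_step` and the setup (a 100 m drop, constant g = 9.81 m/s², no drag) are assumptions for this example; a real library would more likely use a faster bracketing root finder (Brent’s method, for example) than plain bisection.

```python
import math
from typing import Callable, List, Optional, Tuple

Vector = List[float]
RHS = Callable[[float, Vector], Vector]
Event = Callable[[float, Vector], float]


def rk4_step(f: RHS, t: float, x: Vector, h: float) -> Vector:
    """One classic RK4 step of size h."""
    k1 = f(t, x)
    k2 = f(t + h / 2, [xi + h / 2 * k for xi, k in zip(x, k1)])
    k3 = f(t + h / 2, [xi + h / 2 * k for xi, k in zip(x, k2)])
    k4 = f(t + h, [xi + h * k for xi, k in zip(x, k3)])
    return [xi + h / 6 * (a + 2 * b + 2 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]


def integrate_to_event(f: RHS, event: Event, t0: float, x0: Vector, h: float,
                       t_max: float, tol: float = 1e-12) -> Optional[Tuple[float, Vector]]:
    """Take fixed RK4 steps until event(t, x) goes from positive to <= 0, then
    bisect on the length of that last step to locate the crossing.
    Returns (t_event, x_event), or None if no event occurs before t_max."""
    t, x = t0, list(x0)
    g_prev = event(t, x)
    while t < t_max:
        x_new = rk4_step(f, t, x, h)
        g_new = event(t + h, x_new)
        if g_prev > 0 >= g_new:
            # Bracket: event > 0 at t + lo and <= 0 at t + hi.
            lo, hi = 0.0, h
            while hi - lo > tol:
                mid = 0.5 * (lo + hi)
                if event(t + mid, rk4_step(f, t, x, mid)) > 0:
                    lo = mid
                else:
                    hi = mid
            return t + hi, rk4_step(f, t, x, hi)
        t, x, g_prev = t + h, x_new, g_new
    return None


if __name__ == "__main__":
    G = 9.81  # m/s^2, assumed constant gravity, no drag

    def fall(t: float, x: Vector) -> Vector:
        return [x[1], -G]  # state x = [altitude, velocity]

    def altitude(t: float, x: Vector) -> float:
        return x[0]  # the event is "altitude reaches 0"

    result = integrate_to_event(fall, altitude, 0.0, [100.0, 0.0], 0.01, 10.0)
    print(result, math.sqrt(2 * 100.0 / G))  # compare with the analytic impact time
```

Because RK4 integrates constant-acceleration motion exactly, the located impact time should match the analytic value sqrt(2 * 100 / g) up to rounding and the bisection tolerance.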

@jacobwilliams thanks for the answers! My questions were more rhetorical in this context, but they are questions that came to me as I read the documentation. RKLIB is extremely thorough, and I very much appreciate the effort you put into it.

Your description of an event makes sense. It’s not a situation I’ve ever encountered myself, but I can understand why someone would want that.

I can attest it was trivially easy to add to my toy project with FPM.

It is hard because we don’t understand what the user needs, what structure a doc should have, or how to start. Diátaxis answers those questions. And once you have understood them, writing a doc is more fun!

Most of the time, you will find libraries where the “Reference” quadrant is overwhelming, as it can be (relatively) easily automatically generated by tools like FORD or Doxygen. And the poor beginner feels overwhelmed… (three big letters are falling on his/her head: A P I :boom:)

Writing documentation is also often undervalued, with coding considered more important. But not always: in the FreeBSD community, for example, people writing documentation are considered as important as those writing code, and that is why the FreeBSD documentation is very good: