The counter-intuitive rise of Python in scientific computing

do not tell me about that… every time I am amazed there are still people in the audience who do not understand that… and usually I teach PhD students…

1 Like

I don’t find that surprising at all. How would beginners to Fortran (and most likely, programming in general) know that before you teach it to them? For a language that works mostly like math, integer division is an exception.

4 Likes

If you’ve only ever had a math class, why would you have any reason to doubt that 1/2 equals 0.5? It’s one of those intuitions/assumptions that people just have without realizing that computers don’t. Most people’s intuition would tell them that the type of the result of the expression 4/2 is an integer (2), but the result of the expression 1/2 is a real (0.5), without realizing that’s an intuition the computer doesn’t have.
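The surprise is easy to demonstrate. Python 3 happens to split the two meanings of division into separate operators, which makes the contrast with Fortran’s integer division visible:

```python
# Integer vs. real division: the intuition gap described above.
# In Fortran, 1/2 is integer division and yields 0; Python 3 uses two
# distinct operators, so the distinction is explicit in the source.

print(1 / 2)    # true division  -> 0.5 (what a math class leads you to expect)
print(1 // 2)   # floor division -> 0   (what Fortran's 1/2 gives)
print(4 // 2)   # -> 2; same operator, so the result type never silently flips
```

Fortran instead overloads `/` and decides by operand type, which is exactly the rule beginners have to be taught.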

Fun fact: the doubling time of the number of computer programmers in the world is roughly 5 years. That means that roughly half of all programmers have less than 5 years of experience. And that is likely to remain the case for at least another doubling or two (5 to 10 more years).
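The arithmetic behind that claim, as a tiny sketch (the 5-year doubling time is the figure quoted above, and the sketch assumes nobody leaves the profession):

```python
# If the programmer population doubles every T years, then the people who
# joined during the last T years make up half of the current total -- so
# roughly half of all programmers have under T years of experience.

T = 5                      # doubling time in years (figure quoted above)
now = 1.0                  # normalized current population
earlier = now / 2          # population T years ago (one doubling back)
newcomers = now - earlier  # everyone who joined in the last T years

print(newcomers / now)     # -> 0.5
```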

2 Likes

The Quest for Speed - An Indicative Survey in Improving Performance implements the elastic-net regularized regression method (a form of variable selection used when there are many predictors) in NumPy, C++, and Fortran. GitHub code

Overall, a speedup of roughly 2-3x is seen for the largest of cases with the use of Fortran or C++, which is not insignificant. A similar runtime pattern is seen across all languages, showing a linear improvement in performance with their use. A closer look at the speedup sub-plots via the interactive link is recommended, as they show that even with the highest number of observations and dimensions, Fortran and C++ still show speedups of 2.8x and 1.95x, respectively.

Summary

Rather significant improvements of between 2x and 3x have been shown through the use of Fortran and C++ for even the largest of cases, with the worst being around 1.5x. The standard operations available within Fortran that help in scientific computing, its higher performance in this study, and its limited additional requirements show that Fortran should be considered for accelerating EQ , especially for the more loop-heavy parts of the code, where statically typed languages such as Fortran and C++ can flex their performance advantages.
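For readers unfamiliar with the method: the inner loop of elastic-net coordinate descent is dominated by a soft-thresholding (shrinkage) step, which is the kind of loop-heavy kernel the survey benchmarks. A minimal NumPy sketch of that operator (the function name is mine, not taken from the linked repo):

```python
import numpy as np

def soft_threshold(z, gamma):
    """Shrinkage operator: S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

z = np.array([-2.0, 0.3, 2.0])
# elements within the +/-0.5 band collapse to zero; the rest shrink by 0.5
print(soft_threshold(z, 0.5))
```

Applied coordinate-by-coordinate inside a loop over predictors, this is exactly the kind of code where a compiled language pulls ahead of the vectorized-NumPy version.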

1 Like

Below is a 2019 preprint with a similar theme. Can one write a Fortran 2018 linear algebra library wrapping LAPACK that is as high-level and convenient as Matlab etc., without sacrificing speed? LAPACK95 has short calling sequences but cryptic subroutine names, for example

  • CALL LA_GESV ( A, B, IPIV=ipiv, INFO=info )
    Solves a general system of linear equations AX = B.

and one could define an operator and write

X = B .div. A

The Matran project did this but has not been updated recently. The Matran page says pointer arrays will be replaced with allocatables “when the allocatable array extensions to Fortran 95 are widely and correctly implemented.” That happened years ago.
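For comparison, this is essentially the level of convenience NumPy already provides on top of LAPACK: `numpy.linalg.solve` dispatches to the same `*gesv` family that `LA_GESV` wraps.

```python
import numpy as np

# Solve A X = B via LAPACK's *gesv -- the routine LA_GESV wraps.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))   # general (almost surely nonsingular) matrix
B = rng.standard_normal((4, 2))   # two right-hand sides

X = np.linalg.solve(A, B)         # the "X = B .div. A" of the post above
print(np.allclose(A @ X, B))      # True
```

A Fortran `.div.` operator wrapping `LA_GESV` would give exactly this one-line usage, with the compiler doing the dispatch by type and kind instead of an interpreter.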

The Linear Algebra Mapping Problem

Christos Psarras, Henrik Barthels, Paolo Bientinesi

We observe a disconnect between the developers and the end users of linear algebra libraries. On the one hand, the numerical linear algebra and the high-performance communities invest significant effort in the development and optimization of highly sophisticated numerical kernels and libraries, aiming at the maximum exploitation of both the properties of the input matrices, and the architectural features of the target computing platform. On the other hand, end users are progressively less likely to go through the error-prone and time consuming process of directly using said libraries by writing their code in C or Fortran; instead, languages and libraries such as Matlab, Julia, Eigen and Armadillo, which offer a higher level of abstraction, are becoming more and more popular. Users are given the opportunity to code matrix computations with a syntax that closely resembles the mathematical description; it is then a compiler or an interpreter that internally maps the input program to lower level kernels, as provided by libraries such as BLAS and LAPACK. Unfortunately, our experience suggests that in terms of performance, this translation is typically vastly suboptimal.
In this paper, we first introduce the Linear Algebra Mapping Problem, and then investigate how effectively a benchmark of test problems is solved by popular high-level programming languages. Specifically, we consider Matlab, Octave, Julia, R, Armadillo (C++), Eigen (C++), and NumPy (Python); the benchmark is meant to test both standard compiler optimizations such as common subexpression elimination and loop-invariant code motion, as well as linear algebra specific optimizations such as optimal parenthesization of a matrix product and kernel selection for matrices with properties. The aim of this study is to give concrete guidelines for the development of languages and libraries that support linear algebra computations.
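One of the linear-algebra-specific optimizations the paper tests, optimal parenthesization of a matrix product, is easy to see from a high-level language: the two bracketings below compute the same vector at very different cost, and a naive left-to-right evaluation picks the expensive one.

```python
import numpy as np

n = 300
rng = np.random.default_rng(1)
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
v = rng.standard_normal(n)

left = (A @ B) @ v    # O(n^3): forms the n-by-n product A*B first
right = A @ (B @ v)   # O(n^2): two matrix-vector products

print(np.allclose(left, right))   # True: same result, ~n/2 times fewer flops on the right
```

Whether a language or library performs this re-bracketing automatically is precisely the kind of question the benchmark in the paper probes.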

2 Likes

That’s not in the spirit of the original FORmula TRANslating system… Hence indeed a weakness.

instead, languages and libraries such as Matlab, Julia, Eigen and Armadillo, which offer a higher level of abstraction, are becoming more and more popular. Users are given the opportunity to code matrix computations with a syntax that closely resembles the mathematical description;

They are in the spirit of Backus’s original Fortran…

2 Likes

I see that the linalg module of fortran-utils by @certik does this. It starts as follows:

module linalg
  use types, only: dp
  use lapack, only: dsyevd, dsygvd, ilaenv, zgetri, zgetrf, zheevd, &
       dgeev, zgeev, zhegvd, dgesv, zgesv, dgetrf, dgetri, dgelsy, zgelsy, &
       dgesvd, zgesvd, dgeqrf, dorgqr, dpotrf, dtrtrs
  use utils, only: stop_error, assert
  use constants, only: i_
  implicit none
  private
  public eig, eigvals, eigh, inv, solve, eye, det, lstsq, diag, trace, &
       svdvals, svd, qr_fact, cholesky, solve_triangular

  ! eigenvalue/-vector problem for general matrices:
  interface eig
     module procedure deig
     module procedure zeig
  end interface eig

  ! eigenvalue/-vector problem for real symmetric/complex hermitian matrices:
  interface eigh
     module procedure deigh_generalized
     module procedure deigh_generalized_values
     module procedure deigh_simple
     module procedure zeigh_generalized
     module procedure zeigh_simple
  end interface eigh

  ! eigenvalues for general matrices:
  interface eigvals
     module procedure deigvals
     module procedure zeigvals
  end interface eigvals

  ! matrix inversion for real/complex matrices:
  interface inv
     module procedure dinv
     module procedure zinv
  end interface inv

  ! solution to linear systems of equation with real/complex coefficients:
  interface solve
     module procedure dsolve
     module procedure zsolve
  end interface solve

  interface solve_triangular
     module procedure dsolve_triangular
  end interface solve_triangular

  ! determinants of real/complex square matrices:
  interface det
     module procedure ddet
     module procedure zdet
  end interface det

  ! least-squares solutions to real/complex systems of equations of possibly non-square shape:
  interface lstsq
     module procedure dlstsq
     module procedure zlstsq
  end interface lstsq

  ! construction of square matrices from the diagonal elements:
  interface diag
     module procedure ddiag
     module procedure zdiag
  end interface diag

  ! trace of real/complex matrices:
  interface trace
     module procedure dtrace
     module procedure ztrace
  end interface trace

  ! singular values of real/complex matrices:
  interface svdvals
     module procedure dsvdvals
     module procedure zsvdvals
  end interface svdvals

  ! singular value decomposition of real/complex matrices:
  interface svd
     module procedure dsvd
     module procedure zsvd
  end interface svd

  ! Cholesky decomposition
  interface cholesky
     module procedure dcholesky
  end interface cholesky

  ! assert shape of matrices:
  interface assert_shape
     module procedure dassert_shape
     module procedure zassert_shape
  end interface assert_shape

contains

Bingo,
The Foreign Function Interfaces that many/all scripting languages offer mean that one can handle data-flow management/IO/visualization/database interfacing with something easier to work with, and let Fortran do what it does best.
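As a minimal sketch of such an FFI, here is Python’s `ctypes` calling into the C math library; a Fortran routine compiled with `bind(c)` is reached the same way (the library lookup assumes a Unix-like system):

```python
import ctypes
from ctypes.util import find_library

# Load the C math library; a bind(c) Fortran routine in a shared library
# would be loaded and declared identically.
libm = ctypes.CDLL(find_library("m") or "libm.so.6")

libm.sqrt.restype = ctypes.c_double      # declare the interface, much like
libm.sqrt.argtypes = [ctypes.c_double]   # an `interface` block in Fortran

print(libm.sqrt(2.0))   # -> 1.4142135623730951
```

The scripting language handles the glue; the compiled code does the arithmetic.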

Correct, but then again, why does such an infrastructure of libraries even exist? You can find Python libraries for literally everything, and in most cases it just doesn’t make sense. The exact same thing could be done much faster with a language that at least has a compiler. Don’t ask me why this is happening; it is beyond me. There is no logical explanation. I can only quote the well-known aphorism attributed to Einstein.

There is no debate here, or at least there shouldn’t be any. Python is very good for scripting, and that’s it. Otherwise it’s one of the worst choices for scientific computing out there, period. End of story.
I dislike C, and even more C++, which are obviously not designed for scientific computing. But if I only had two choices, either C/C++ or Python for that kind of computing, I wouldn’t think for a fraction of a second: I would pick C/C++ immediately - or literally anything else but Python. For obvious reasons. How those reasons seem not to be obvious to many others, again, is beyond me.

1 Like

I agree. I also think that Julia or even Matlab are much better than Python for scientific computing. A good compromise is to use Julia or Matlab as a user-friendly interface and do the heavy numerical calculations in Fortran. That said, Julia is reasonably fast in itself; Python is much slower.

I think there’s a bit of a conflation here. The resulting code would execute “much faster” with a compiled language, but actually developing the code in the first place would not be “much faster”.

When I was first learning software development I loved Python because the feedback loop could be so much quicker. I could play around with a concept in the REPL, to figure out how I could/should write a little piece of code, write it, then load the module it was in and again just play around with it. No need to go through a write-compile-execute loop, no need for a testing framework, just “play” with the code. For data science type stuff being able to “play” with the data is also hugely beneficial.

I can absolutely see why Python became popular, but I see no reason other languages, even Fortran, couldn’t obtain the same level of interactivity with the right tooling. Haskell is a compiled language that also has an interactive REPL, and it was also hugely helpful when I was learning that language. LFortran has a REPL so it’s certainly possible. I think we could gain a lot of Python converts if we got things working well in this space.

8 Likes

The whole point of Julia is that Julia will function as a user-friendly interface while doing the heavy numeric calculations just as fast as Fortran (or faster if the expressive power gives you time to meta-program a super fancy algorithm).

1 Like

What people see in practice matters, not mere goals, wishes, and exaggerations.
https://www.reddit.com/r/Julia/comments/18plt1h/machine_learning_frameworks_feel_sluggish_why_is/

2 Likes

That’s correct; however, I still think the first “much faster” outbalances the second. Admittedly, I never used Python for a really large project (because, as far as I know, I am still sane), but I did play with it enough to realize things. Additionally, I played a lot with Lua (I actually wrote a numerical-methods package for it years ago) and I certainly played much more with Scilab (which is more “high level” than Python).

Now, my experience says that, for serious projects at least, it always goes like this: you start with one of those (Scilab or Python or something similar) and you do indeed have the tools to proceed faster… in the beginning - in the very beginning. If you want to try a new idea, you can quickly have a small working code to play around with. You have a plethora of ready-made libraries to use, and you don’t even need to save the code if you are using the REPL. The beginning is full of roses, basically the definition of convenience. So you have a code to use in, let’s say, 1-2 hours, sometimes even in just a few minutes. Then you start to play around and see if your idea works. Yes, it might be slower, but who cares - all you want is some results to confirm the idea works. Usually your idea doesn’t work right away, and the code needs some modifications. And even if it does work right away, it definitely needs polishing - lots of it… Either way, this is when the reality of doing business in such a language strikes.
The code polishing becomes more and more demanding, needing more and more trial and error. And each try is much slower than it could be. Granted, it might still be “fast enough” generally speaking, but even a one-second running time (which is typical for small projects) becomes tedious when you have to repeat it again and again. You realize you could do the trial-and-error phase literally orders of magnitude faster if you used a compiled language. This is when you start thinking “ok, the idea works but the code itself still needs a lot of work, and I definitely won’t do that in this environment, let’s switch to Fortran”. At least that’s what I did several times.

Of course, if we are talking about small projects - or projects whose computational part is not very demanding anyway - you can leave it in Python or whatever other language you are using. I myself have lots of code in Maxima that could run much faster if it were written in Fortran, but it’s good enough as it is and I don’t need to use it often anyway. So I never felt the need to rewrite the whole thing in Fortran. However, that’s the exception rather than the rule.

For large projects, you will typically need a compiled language sooner or later - and pray it is sooner rather than later, because the more you delay it, the harder it becomes. So my conclusion is: “you know what, I will need Fortran anyway, so I’ll start with it right away”. After all, using Fortran from the very beginning is not that much slower. Its array features are unbeatable, and I have a pack of libraries backing me up (some written by me, some taken from one of the countless “FOOPACK”s). The foundation of the new code is already there; it’s not like I am starting from scratch. This is where LFortran’s REPL can help as well, and this is one of the many reasons I like LFortran.

If what you are going to do seems like a serious project, the following inequality for total time needed holds:

time(Fortran right away)  <<  time(whatever interpreted but convenient)

The former has a somewhat slower start, but it pays off soon. The latter usually starts great, but you hit a “stall wall” pretty quickly. Of course, many people will disagree. I know people who did their whole Master’s in… Mathematica or Matlab, for Devil’s sake - and it included a hell of a lot of numerical computations. I tried to ask… “why”, and I quickly realized there is no way to talk them out of it. The typical answer, disarming in its naivety, was something like “today’s computers are fast enough, no need to bother with Fortran, C/C++, whatever”. Essentially they just avoid learning a real programming language and rely on a convenient but interpreted and slow environment instead. There is no way to contradict such “logic”, so nowadays I just nod in comprehension and walk away.

4 Likes

I don’t agree with the “Julia as fast as Fortran” statement. There is a paper that compares the performance of different programming languages and finds that Julia is about 50% slower than Fortran or C.
The paper is written by two economists who solve a basic real-business-cycle model (a workhorse model in macroeconomics) using Fortran, C, Matlab, Julia, and other programming languages. It’s quite interesting: they find that Fortran and C are the fastest (no significant difference between the two), followed by Julia and then Matlab. Python is much slower. I attach the table with the run-time comparison from the paper.
Note: the paper is from 2018, so its results might be a bit outdated.

Link to the codes to replicate the paper: GitHub - jesusfv/Comparison-Programming-Languages-Economics: A Comparison of Programming Languages in Economics


3 Likes

See the discussion on the Julia Discourse from 2016 about this benchmark: A Comparison of Programming Languages in Economics - #58 by Elrod - Community - Julia Programming Language. The TL;DR is that there were a number of pretty simple changes that give minor speedups and bring Julia on par with (or slightly ahead of?) C++.

Any individual benchmark of similar algorithms will show a small benefit to either Julia, Fortran, or C++ (in the same way that any individual benchmark will show differences between GCC and Clang). These differences are very frequently ~50% and depend heavily on minute differences in the code, random chance (different compilations can randomly shuffle things around, leading to cache differences), or how the compiler is feeling on that specific day. What’s more important than which language wins a specific benchmark is that you can get top-notch performance if you try, and get “good” (e.g. within 2x of optimal) performance without heroics.

1 Like

I agree with all that, and it’s part of the reason I ended up sticking with Fortran, but

it takes experience to understand this. Most programmers have less than 5 years of experience, so most programmers don’t realize they’re going to hit that wall, and they go with what’s easier to start with.

1 Like

In the late 80s, computers were so slow that once you discovered Turbo Pascal, that excellent Borland compiler, you would not want to go back to BASIC (although some BASICs of the same period were good and structured, like GFA Basic, and there were even compiled BASICs like Turbo Basic). The difference between interpreted and compiled languages was evident to everybody.

With today’s computers, you don’t see the difference for small computations. And interpreted languages, especially Python, are often the first language learned. I learned Python around 2010 and really enjoyed it for “scripts” (which is just a name for “small” programs) because it is concise, has structures like lists, and comes with an easy-to-use standard library.

But if you want to go to the other side of the Earth, it’s faster to take a plane than a train… A tool for each application.

Everybody liked Turbo Pascal, myself included. But B…B… BASIC? I learned assembly for the MOS 6510 precisely because of BASIC. Even the good versions of it (BBC BASIC, that is) were abysmally slow.

I would argue here that I don’t see any reason for Python to be the first language to learn. Why not Pascal, for example, as it used to be? Everybody has forgotten that one. It’s not harder to learn; it doesn’t lack features; it has excellent modern versions and compilers. Why not Lua, as another example, if you insist on interpreted languages? And I could mention many others here.
Why did Python specifically manage to become so popular that you will find libraries written in it for everything, even for tasks Python is blatantly not suitable for? But that’s not the real problem anymore.

The real problem nowadays is that many newcomers learn Python and stay with Python. It’s not even “learn this one as an introduction, then switch to C/C++/whatever at some point in the future” anymore. It’s just Python… Python for everything, forever. I know people who do that and are ready to defend it with a passion. I’m not even sure they really know (or believe, or care) that they could run their programs 100-1000 times faster if they used a decent programming language that has a compiler. The real problem is that Python sits at #1 (in the TIOBE index, for example). An interpreted language is steadily #1. The BASIC of the modern era, I guess… Well, my reaction is “no, thank you”, even though some will look at me as if they were staring at an exotic, weird species instead of a normal person.

I learned programming with BASIC on a Sharp PC-1211. No smartphones for kids in those days… :grin: The PC-1211 had ~1.5 kB of RAM and a 4-bit microprocessor at 256 kHz. I respect BASIC as my first step into programming: it was the Beginner’s All-purpose Symbolic Instruction Code, I was an absolute beginner (not yet a teen), and it did the job, introducing me to the computer world. In 1981, Kraftwerk was singing in its Computer World album:

I program my home computer
Beam myself into the future
Kraftwerk – Home Computer Lyrics | Genius Lyrics
https://www.youtube.com/watch?v=Ad-8gOMM9yk

BASIC did beam me into the future. Well, it also quickly attracted me toward assembly language… :yum: (it was easy to learn on those old small microprocessors, and still more basic than BASIC).

Python was probably taught as a first language because it is interactive and has a simple and powerful syntax (a Python program can typically be 2x or 3x smaller than a C program). And there are even now pocket calculators with Python inside. The problem is that, now that it is #1, it will be hard to dislodge before the next decade…

When I was a student, Turbo Pascal was taught in France as a first language. Then it mysteriously disappeared, despite Delphi (a great visual Pascal), which was quite successful - though probably less so than MS Visual Basic.

Probably Python became #1 due to a combination of causes, some of them accidental (just as mammals replaced dinosaurs…).

1 Like