Fortranlang: a Python package?

I have an idea to create a fortranlang python package (name TBD). The idea would to be to make high performance Python wrappers to some Fortran libraries, including Minpack, ODEPACK, and QUADPACK.

You might be thinking… "scipy already wraps these libraries???". Yes, Scipy does, but an issue with Scipy is that their wrappers are pretty slow, and can not be called within numba compiled functions (numba is a python compiler). It is possible to make much faster python wrappers using ctypes (although these wrappers are less flexible). These wrappers can be called from within numba compiled functions. Therefore, you can get massive speedups over scipy.

For example, I’ve put together the numbalsoda python package, which wraps the LSODA routine in ODEPACK. Right now the package uses a C++ version of LSODA because ODEPACK isn’t modernized quite yet (see progress by @jacobwilliams here). But I’d like to switch to a Fortran once a modernized ODEPACK is available. numbalsoda is up to ~100x faster than scipy’s wrappers of LSODA (benchmark here).

I think this would be (1) a very useful and popular Python package and (2) raise awareness in the scientific community for how useful Fortran can be.

Wondering people’s thoughts! All brainstorming welcome.

7 Likes

I tend to prefer smaller, well focused libraries. But perhaps that’s not a major concern in Python? And perhaps the *PACK libraries aren’t all that well focused to begin with? Mostly just food for thought, but I wouldn’t initially jump to combining multiple Fortran libraries into a single Python package.

1 Like

Very nice!

Eh, I mean, are you saying like for small ODE system, like just 3 ODEs or so, the C version LSODA is faster than the Fortran’s ODEPACK version of LSODA by 100x?
I mean if both with -O3 -xHost optimization flag, C version and Fortran version seems should not have 100x difference, LOL.
Or, is the 100x difference is mainly caused by Python?

In your opinion, in terms of speed, the new modernized ODEPACK .vs. current F77 version of ODEPACK, how much speedup can the modernized ODEPACK expect to get?

I used LSODA in the F77 ODEPACK for small system like 3 ODE system, its speed is comparable (about 30% slower than FLINT or so, seems not to bad) than some modern solver like FLINT by @prince_mahajan , GitHub - princemahajan/FLINT: Fortran Library for numerical INTegration of differential equations
If the modern version ODEPACK can be significantly faster than F77 version it would be great!

The Fortran, C and C++ versions of lsoda are all almost identical in speed. The modernized version of ODEPACK will not change the speed at all.

I’m much in favor to provide language bindings for libraries we maintain under Fortran-lang. If done right it is not even noticeable that Fortran is used in Python script (apart from the speed, of course). I wouldn’t call the Python package fortranlang however.

For minpack there is an open issue for discussing Python bindings

Usually I put Python bindings in a subdirectory python/ in my projects setup in a way that they can either be built directly with the main project or as a standalone against an existing installation.


My main issue is with ctypes, the handling is quite low-level and requires a lot of repetition, while you don’t get any useful error checking on the Python side. I usually go for wrapper generators like CFFI or SWIG or something which has more direct access to C like Cython.

1 Like

The reason why I have used ctypes is because it is numba compatible. All Python code can be compiled and be very speedy. This sets it apart from Scipy. Cython is not numba compatible. CFFI is numba compatible, but I’ve never used it. Look very similar to ctypes? What is its advantage over ctypes?

CFFI has multiple modes, the ABI modes are similar to ctypes, but a bit more powerful because you have typedefs available while you only get c_void_p for all opaque handles in ctypes. Also, the dlopen step in ctypes is pretty fragile, because you must know where the library is you want to load and the Python module might not be installed in the same prefix as the library wrapped.

I’m mostly using the out-of-line API mode in CFFI, which allows to generate a fully functional extension module directly from a C header, meaning you don’t have to duplicate the header definition in the Python source code. With some extra effort you can even link the library you depend on statically into the extension module and are almost dependency free (useful for producing wheels).

1 Like

Just added a python wrapper to @jacobwilliams dop853 ODE integrator, to numbalsoda.

This Python wrapper beats Julia’s best explicit ODE solvers on my test problem (benchmarks here).

Have you measured the actual error of the different methods? When benchmarking different solvers, it’s generally a mistake to assume that specifying the same rtol and atol lead to comparable error.

I have not done proper “work precision diagrams”. So ya, its possible that Julia is actually performing better.

Thanks for the benchmark though. I think there might be actual performance to gain. Almost half the time spent is spent on saving the output which is too high.

Ya i think performance gains could be made to the contd8 interpolation routine in dop853. contd8 considers a single variable, but it could consider the interpolation of all variables at once. I think this would permit the compiler to make some additional optimizations.

FYI: See here: GitHub - princemahajan/FLINT: Fortran Library for numerical INTegration of differential equations

For some other benchmarks that were done with this.

1 Like

These benchmarks make it look like FLINT outperforms @jacobwilliams version of dop853? This has not been my experience (see here). Although I haven’t done that much testing.

What makes FLINT faster than Julia’s differential equations? I thought they have some symbolic code compile time generation that makes it faster.

Most of the especially fancy stuff is for large or stiff systems. When you have small nonstiff equations there’s not much you can do to speed up (even python’s Jax can compete for stuff like this). It’s also not really correct that FLINT is in general faster here. It is faster in the case with the specific type of callback used because DifferentialEquations.jl currently doesn’t yet have a way to specify a callback that you only want to hit once. For the one without callbacks, the algorithm Julia selects by default for this case (Vern8) is as fast any of the FLINT solvers.

2 Likes