Numeta: A JIT Compiler for Python that Generates Fortran Code

Hey everyone!

I’ve been having a lot of fun working on a side project called Numeta. It’s a super simple Just-In-Time (JIT) compiler for Python that focuses on metaprogramming. Think of it as a dumb and lightweight version of Numba: instead of doing all the fancy bytecode analysis, I keep things simple by not reading the AST at all and generating good old Fortran code (because real programmers write Fortran in any language... no, actually because it’s very easy to translate NumPy code to Fortran).

The idea is to use Python’s type hints to tell apart what is compiled from what is evaluated at code-generation time: hinted arguments become runtime arguments of the compiled function, while unhinted ones (like n in the example below) are consumed while the code is generated. So far, it’s been really fun seeing this thing take shape! It’s in a very alpha stage (it currently depends on gcc and gfortran, but there’s no inherent blocker to using other compilers, provided they support iso_c_binding). Code generation and JITting work, and I’m currently trying to figure out the next steps for the project.

Right now, I’m considering some possible directions:

  1. General improvements (e.g., adding support for return types is a must).
  2. Expanding support for more NumPy functionality. This might require linking against a LAPACK library, which introduces challenges in managing LAPACK as a dependency.
  3. Making it possible to run the code without the decorator (as in Numba), mainly to facilitate debugging; a sketch of what I mean follows this list.
  4. Exposing more of Fortran’s functionality directly, though this could complicate the previous point.
  5. Changing the backend to translate directly to a compiler intermediate representation (preferably a high-level one). While this would involve significant work, it could remove the need to generate source code, making the JIT process more efficient and seamless.
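To give a concrete picture of point 3, here’s a minimal sketch of what I have in mind, loosely modeled on Numba’s NUMBA_DISABLE_JIT environment variable (everything below is hypothetical; Numeta has no such switch yet):

import os

def jit(func):
    # Hypothetical pass-through mode: if NUMETA_DISABLE_JIT=1 is set,
    # skip code generation entirely and return the original Python
    # function, so ordinary debuggers and print statements just work.
    if os.environ.get("NUMETA_DISABLE_JIT") == "1":
        return func
    return _compile_and_wrap(func)  # stand-in for the real trace-and-compile path

def _compile_and_wrap(func):
    raise NotImplementedError("placeholder for Numeta's actual JIT pipeline")

Constructs like nm.frange would also need a pure-Python fallback (degrading to a plain range) so the same function body can run in both modes.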

In the meantime, here’s a quick taste of what it does:

import numeta as nm

@nm.jit
def mixed_loops(n, array: nm.f8[:, :]) -> None:
    for i in range(n):
        for j in nm.frange(n):
            array[j, i] = i + j

This generates the following Fortran code (for n=3), which is then compiled and executed:

subroutine mixed_loops(fc_n1, array) bind(C)
    use iso_c_binding, only: c_int64_t
    use iso_c_binding, only: c_size_t
    use iso_c_binding, only: c_double
    implicit none
    integer(c_size_t), dimension(0:1), intent(in) :: fc_n1
    real(c_double), dimension(0:(fc_n1(1)+(-1_c_int64_t)), 0:(fc_n1(0)+(-1_c_int64_t))), intent(inout) :: array
    integer(c_int64_t) :: fc_i1
    integer(c_int64_t) :: fc_i2
    integer(c_int64_t) :: fc_i3
    do fc_i1 = 0_c_int64_t, 2_c_int64_t
        array(0, fc_i1) = (0_c_int64_t + fc_i1)
    end do
    do fc_i2 = 0_c_int64_t, 2_c_int64_t
        array(1, fc_i2) = (1_c_int64_t + fc_i2)
    end do
    do fc_i3 = 0_c_int64_t, 2_c_int64_t
        array(2, fc_i3) = (2_c_int64_t + fc_i3)
    end do
end subroutine mixed_loops
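For completeness, calling it looks like this (a sketch based on the example above; since n has no type hint, it is consumed at code-generation time, so each distinct n triggers a fresh specialization):

import numpy as np

array = np.zeros((3, 3))
mixed_loops(3, array)  # n is untyped, so n=3 is baked into the generated code
print(array)           # [[0. 1. 2.], [1. 2. 3.], [2. 3. 4.]]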

If anyone’s curious or has suggestions (especially if you know a reasonable way to ship or link LAPACK dependencies, or have insights on the directions I’ve mentioned), I’d love to hear your thoughts. I’ve just started adding more features and would appreciate any feedback.

Feel free to check it out here: GitLab Repo

11 Likes

Welcome to the forum, and thanks for informing us of your interesting project. Are you aware of pyccel, which “accelerate[s] Python functions by converting them to Fortran or C functions”?

The developers wrote about it in the Journal of Open Source Software (“Pyccel: a Python-to-X transpiler for scientific high-performance computing”), claiming orders-of-magnitude speedups.

2 Likes

Thanks! I wasn’t aware of it, but it sounds cool. I’ll check it out and try to replicate their benchmarks when I can. For now, I did a quick comparison of Numeta vs. Numba (just because I had it installed) using matrix multiplication to get a rough idea:

import numpy as np
import numeta as nm
import numba as nb
import time

n = 500
iterations = 1000 

a = np.random.rand(n, n)
b = np.random.rand(n, n)
c_nm = np.zeros_like(a)
c_np = np.zeros_like(a)

@nm.jit
def matmul(a: nm.f8[:, :], b: nm.f8[:, :], c: nm.f8[:, :]):
    c[:] = 0.0
    # nm.frange becomes a Fortran do loop; the slice update compiles to an array operation
    for i in nm.frange(a.shape[0]):
        for k in nm.frange(a.shape[1]):
            c[i, :] += a[i, k] * b[k, :]

start = time.time()
matmul(a, b, c_nm)
numeta_compile_time = time.time() - start
print('Numeta compilation and first execution time:', numeta_compile_time)

start = time.time()
for _ in range(iterations):
    matmul(a, b, c_nm)
numeta_exec_time = (time.time() - start) / iterations
print(f'Numeta execution time for {iterations} iterations:', numeta_exec_time)

@nb.jit
def matmul_np(a, b, c):
    c[:] = 0.0
    for i in range(a.shape[0]):
        for k in range(a.shape[1]):
            # c[i, :] += a[i, k] * b[k, :]  (array form removed: it degraded performance)
            for j in range(b.shape[1]):
                c[i, j] += a[i, k] * b[k, j]

start = time.time()
matmul_np(a, b, c_np)
numba_compile_time = time.time() - start
print('Numba compilation and first execution time:', numba_compile_time)

start = time.time()
for _ in range(iterations):
    matmul_np(a, b, c_np)
numba_exec_time = (time.time() - start) / iterations
print(f'Numba execution time for {iterations} iterations:', numba_exec_time)

np.testing.assert_allclose(c_nm, c_np)
print("Results match between Numeta and Numba implementations.")

And I got very competitive results (I had to remove the array operation in the Numba implementation because it was degrading the performance too much):

Numeta compilation and first execution time: 0.15936017036437988
Numeta execution time for 1000 iterations: 0.015630727052688597
Numba compilation and first execution time: 0.33567142486572266
Numba execution time for 1000 iterations: 0.019384422063827515
Results match between Numeta and Numba implementations.

(Even though I think it is more of a gcc vs. LLVM comparison.)
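For an extra sanity check, both results can also be compared against NumPy’s own BLAS-backed matmul:

c_ref = a @ b
np.testing.assert_allclose(c_nm, c_ref)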

2 Likes

Welcome @andrea, you may want to check out our modern Fortran implementation of LAPACK in the Fortran Standard Library. It’s packed into a few source files, and you can easily integrate it with either fpm or CMake.

3 Likes

Sounds very cool. Maybe I could add fpm as a dependency, and if there’s a way to access the interface from each package, I could make them callable from Python (though it wouldn’t be easy, especially since I’d need to add support for derived types). I’m still quite new to this—I’m just a chemist, not a software engineer, and I’m not very familiar with fpm, so I might be off on some details. Do you know if each package includes an interface file that I could parse to create the Python wrapper?
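To make the bridging idea concrete, here is roughly what I imagine on the Python side, assuming a routine is compiled with bind(C) (and value-passed scalars) into a shared library; the routine and library names below are made up for illustration:

import ctypes
import numpy as np

# Hypothetical: libstdlib_lapack.so exports
#   subroutine daxpy_c(n, alpha, x, y) bind(C, name="daxpy_c")
# with iso_c_binding kinds and `value` scalars, so it loads like a C function.
lib = ctypes.CDLL("./libstdlib_lapack.so")
lib.daxpy_c.argtypes = [
    ctypes.c_int64,                                            # n (by value)
    ctypes.c_double,                                           # alpha (by value)
    np.ctypeslib.ndpointer(np.float64, flags="C_CONTIGUOUS"),  # x
    np.ctypeslib.ndpointer(np.float64, flags="C_CONTIGUOUS"),  # y, updated in place
]
lib.daxpy_c.restype = None

x = np.arange(4.0)
y = np.ones(4)
lib.daxpy_c(4, 2.0, x, y)  # y := 2*x + y

Plain arrays and scalars map cleanly through iso_c_binding like this; derived types are the part I’d really have to work out.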

1 Like

There is a fairly complete list of Python compilers at the bottom of https://lpython.org/. I need to add yours in. :slight_smile:

1 Like