 # Interact with python -- f2py or ctypes?

I just discovered here recently, very glad that there is a community to discuss about Fortran.

Most of time I use python for visualization or some simple calculation. Of course I sometimes encounter performance problems and I just wonder if I can call Fortran subroutines or functions in python to solve this problem.

I have some experience in using `f2py` or `ctypes` to call Fortran subroutines/functions in python. But I’m not sure which one is faster.
I’ve written some codes and compared them, I found that the performances are similar but `f2py` is slightly better. But this is only one case, I’m not sure that if using `f2py` is always better than using `ctypes`.

Does anyone has some experience for it? Or maybe knowing the other tool which is better than `f2py` and `ctypes`.
I decided to use `f2py` and `ctypes` just because I didn’t need to install anything else.

p.s.
If needed, here is the code what I compare between `f2py` and `ctypes`. They calculate the vertical component of vorticity.
BTW, pure `numpy` solution is the fastest in my test… I am not sure if the cost in calling external library slowdown the performance of calling fortran subroutine?

p.s.2
I’m not sure if this topic is suitable in here. If not I’ll delete this post as soon as possible.

``````! f2py -c solve_f2py.f90 -m solvef90f2py

subroutine vorticity_f90loop(nx, ny, u, v, dx, dy, vort)
implicit none
integer, intent(in) :: nx, ny
real*8, intent(in) :: u(ny,nx), v(ny,nx)
real*8, intent(in) :: dx, dy
real*8, intent(out) :: vort(ny,nx)

real*8 :: dx2, dy2
integer :: i, j
! ---------------------------------------------

dx2 = 2. * dx
dy2 = 2. * dy
vort(:,:) = 0.

do i = 2, nx-1
do j = 2, ny-1
vort(j,i) = (v(j,i+1)-v(j,i-1)) / dx2 - (u(j+1,i)-u(j-1,i)) / dy2
end do
end do

return
end subroutine vorticity_f90loop
``````
``````! gfortran -shared -fPIC solve_ctypes.f90 -o solvef90ctypes.so

subroutine vorticity_f90loop(nx, ny, u, v, dx, dy, vort) bind(c, name='vorticity_f90loop')
use iso_c_binding, only : c_double, c_int
implicit none

integer(c_int), intent(in) :: nx, ny
real(c_double), intent(in) :: u(ny,nx), v(ny,nx)
real(c_double), intent(in) :: dx, dy
real(c_double) :: vort(ny,nx)

real(c_double) :: dx2, dy2
integer(c_int) :: i, j
! ---------------------------------------------

dx2 = 2. * dx
dy2 = 2. * dy
vort(:,:) = 0.

do i = 2, nx-1
do j = 2, ny-1
vort(j,i) = (v(j,i+1)-v(j,i-1)) / dx2 - (u(j+1,i)-u(j-1,i)) / dy2
end do
end do

return
end subroutine vorticity_f90loop
``````
``````# solve.py

import time
import ctypes as ct
import numpy as np
import solvef90f2py as f90f2py

def vorticity_py(u, v, dx, dy):
ny = len(u)
nx = len(u)
vort =[[0 for _ in range(nx)] for _ in range(ny)]

dx2 = dx * 2
dy2 = dy * 2

for j in range(1, ny-1):
for i in range(1, nx-1):
vort[j][i] = (v[j][i+1] - v[j][i-1]) / dx2 - (u[j+1][i] - u[j-1][i]) / dy2

return vort

def vorticity_np(u, v, dx, dy):
ny, nx = u.shape
vort = np.zeros((ny, nx))
vort[1:-1,1:-1] = (v[1:-1,2:] - v[1:-1,:-2]) / (2*dx) - (u[2:,1:-1] - u[:-2,1:-1]) / (2*dy)
return vort

def solve_f90ctypes_loop(nx, ny, u, v, dx, dy):
f90lib = ct.CDLL('solvef90ctypes.so')
f90lib.vorticity_f90loop.argtypes = [
ct.POINTER(ct.c_int),
ct.POINTER(ct.c_int),
np.ctypeslib.ndpointer(dtype=np.float64, ndim=2, shape=(ny, nx)),
np.ctypeslib.ndpointer(dtype=np.float64, ndim=2, shape=(ny, nx)),
ct.POINTER(ct.c_double),
ct.POINTER(ct.c_double),
np.ctypeslib.ndpointer(dtype=np.float64, ndim=2, shape=(ny, nx))
]
f90lib.vorticity_f90loop.restype = None

u = np.asfortranarray(u)
v = np.asfortranarray(v)
vort = np.empty_like(u, dtype=np.float64, order='F')

f90lib.vorticity_f90loop(
ct.byref(ct.c_int(nx)),
ct.byref(ct.c_int(ny)),
u,
v,
ct.byref(ct.c_double(dx)),
ct.byref(ct.c_double(dy)),
vort
)
return vort

if __name__ == '__main__':
# prepare data
dx = 0.5
dy = 0.5
ny = 1000
nx = 1000

u = np.random.rand(ny, nx)
v = np.random.rand(ny, nx)
u_lst = u.tolist()
v_lst = v.tolist()

# pure python
t1 = time.time()
vort_py = vorticity_py(u_lst, v_lst, dx, dy)
t2 = time.time()
print(f'(python)          time = {t2-t1:.8f}')

# numpy
t1 = time.time()
vort_np = vorticity_np(u, v, dx, dy)
t2 = time.time()
print(f'(numpy)           time = {t2-t1:.8f}')

# f2py - loop
t1 = time.time()
vort_f90loop = f90f2py.vorticity_f90loop(u, v, dx, dy)
t2 = time.time()
print(f'(f90+f2py, loop)  time = {t2-t1:.8f}')

# ctypes - loop
t1 = time.time()
vort_ctloop = solve_f90ctypes_loop(nx, ny, u, v, dx, dy)
t2 = time.time()
print(f'(f90+ct, loop)    time = {t2-t1:.8f}')

# check result
print()
print(vort_py[:5])
print(vort_np[2,:5])
print(vort_f90loop[2,:5])
print(vort_ctloop[2,:5])
``````

Execute result:

``````\$ python solve.py
(python)          time = 0.32275057
(numpy)           time = 0.01257682
(f90+f2py, loop)  time = 0.01558447
(f90+ct, loop)    time = 0.01573086

[0, -0.5746908151001381, -0.4502448148151549, 0.4887173001423759, 0.17991335715166268]
[ 0.         -0.57469082 -0.45024481  0.4887173   0.17991336]
[ 0.         -0.57469082 -0.45024481  0.4887173   0.17991336]
[ 0.         -0.57469082 -0.45024481  0.4887173   0.17991336]
``````
2 Likes

Thanks for the question @CYehLu and welcome to the forum! Yes, your question is on topic here, I am glad you asked. The f2py approach I think should be faster. Another fast approach is using Cython or the C/API directly, see here how to do it:

As long as you do not copy the arrays and access their data directly, the performance overhead is in converting the array descriptor and interface. Creating a shared library (f2py or Cython or C/API approaches) is faster, as you just import the module in Python. The `ctypes` approach has to execute the Python code first to create the Python interface (but it has the advantage that you don’t have to compile anything).

3 Likes

I have no experience in using C/API to call C function in python. I found this example, is it the method to interact with fortran subroutines/functions through C/API?

Also I’m not fully understand the explanation you mentioned about. You mentioned that “`The ctypes approach has to execute the Python code first to create the Python interface (but it has the advantage that you don’t have to compile anything).`
But I think that compiling fortran code into shared library (`*.so` files) is necessary both for `f2py` and `ctypes`. That is, I still need

``````\$ gfortran -shared -fPIC test.f90 -o test.so
``````

to compile my fortran code before using `ctypes` in python. Is anything I misunderstood?

I think so. I personally prefer to use Cython, that essentially generates such a C/API file for you.

Yes, you still have to call `gfortran` no matter what to compile your Fortran code. But with `ctypes` that is all that you have to compile, while with `Cython` you also have to compile the C file using `gcc` to create an so module that can be imported. Also, if you compile the C module then typically you would compile the Fortran `test.f90` to an object file and just link it with the C wrapper into one shared library that you then `import` from Python.

2 Likes

This thread should answer some of your questions, we started writing up some of the details for the different approaches but there is no final guide yet.

1 Like

Thanks to @certik and @awvwgk !
Very helpful, I think I should continue to use `f2py` as my fortran subroutines/functions wrapper.
Other tools like pyccel seems very interesting and I think I can try to use it.
Thanks again, very glad to come to here.

1 Like

@CYehLu
Did you try numba? I have slightly modified your code ( I assume that using np.arrays instead of lists is allowed in “pure” python version)

``````import numpy as np
from numba import njit

def vorticity_py(u, v, dx, dy):
ny, nx = u.shape
vort = np.zeros((ny, nx))
dx2 = dx * 2
dy2 = dy * 2
for j in range(1, ny-1):
for i in range(1, nx-1):
vort[j,i] = (v[j,i+1] - v[j,i-1]) / dx2 - (u[j+1,i] - u[j-1,i]) / dy2
return vort

def vorticity_np(u, v, dx, dy):
ny, nx = u.shape
vort = np.zeros((ny, nx))
dx2 = dx * 2
dy2 = dy * 2
vort[1:-1,1:-1] = (v[1:-1,2:] - v[1:-1,:-2]) / dx2 - (u[2:,1:-1] - u[:-2,1:-1]) / dy2
return vort

@njit(parallel = True)
def vorticity_jit(u, v, dx, dy):
ny, nx = u.shape
vort = np.zeros((ny, nx))
dx2 = dx * 2
dy2 = dy * 2
for j in range(1, ny-1):
for i in range(1, nx-1):
vort[j,i] = (v[j,i+1] - v[j,i-1]) / dx2 - (u[j+1,i] - u[j-1,i]) / dy2
return vort

# prepare data
dx = 0.5
dy = 0.5
ny = 1000
nx = 1000
u = np.random.rand(ny, nx)
v = np.random.rand(ny, nx)

res = vorticity_py(u,v,dx,dy)
print(res[1,1])
res = vorticity_np(u,v,dx,dy)
print(res[1,1])
res = vorticity_jit(u,v,dx,dy)
print(res[1,1])
``````

Results:
%timeit vorticity_py(u,v,dx,dy)
1.24 s ± 10.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit vorticity_np(u,v,dx,dy)
16.5 ms ± 306 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit vorticity_jit(u,v,dx,dy)
9.47 ms ± 175 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

1 Like

Hi @Dobrodzieju
Yes I have tried `numba` and I sometimes use it to speed up my python code.
But sometimes a function will be executed only once (but cost lot of time), then `numba` is not very suitable because it has to compile when I execute it for the first time.
I just knew that `numba` also offers AOT (ahead-of-time compilation) but I have not tested it.
Also sometimes I really need to speed up more – more than `numba` can offer. So I have to search for some method to call Fortran/C functions in python.

Still appreciate for your `numba` example! BTW, your `numba` version is much faster but I am curious about the performance when `parallel` is turned off?
(I don’t have a computer to test it now…)