Thanks for the question @CYehLu, and welcome to the forum! Yes, your question is on topic here; I am glad you asked. I think the f2py approach should be faster. Another fast approach is to use Cython or the Python C/API directly; see here how to do it:
As long as you do not copy the arrays but access their data directly, the main performance overhead is in converting the array descriptor and setting up the call interface. Building a Python extension module (the f2py, Cython, or C/API approaches) is faster, because you just import the module in Python. The ctypes approach has to execute Python code first to create the Python interface (but it has the advantage that you don't have to compile any wrapper code, only the Fortran library itself).
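To illustrate the "do not copy the arrays" point with ctypes, here is a minimal sketch. It assumes a hypothetical Fortran routine `scale(x, n, a)` exposed with `bind(C)`. So that it runs without compiling anything, a ctypes callback stands in for the real library (in practice you would load it with `ctypes.CDLL` and set `argtypes`/`restype`). `scale_in_place` wraps the existing buffer without copying, so the routine writes straight into the caller's data. `scale_copy` builds a new C array first, so the original is left untouched and you pay for an extra copy.

```python
import ctypes
from array import array

# Hypothetical stand-in for a compiled Fortran routine such as
#   subroutine scale(x, n, a) bind(C, name="scale")
# In real use you would load it with ctypes.CDLL(...) and set argtypes/restype;
# here a ctypes callback plays that role so the sketch runs without compiling.
ScaleFunc = ctypes.CFUNCTYPE(
    None,                             # void return (a Fortran subroutine)
    ctypes.POINTER(ctypes.c_double),  # x(n), passed by reference
    ctypes.c_int,                     # n, passed by value
    ctypes.c_double,                  # a, passed by value
)


def _scale_impl(x, n, a):
    # Multiply each element in place through the raw pointer.
    for i in range(n):
        x[i] *= a


scale = ScaleFunc(_scale_impl)


def scale_in_place(data, factor):
    """Zero-copy call: wrap the existing buffer, so the routine writes into `data`."""
    n = len(data)
    view = (ctypes.c_double * n).from_buffer(data)  # shares memory, no copy
    scale(view, n, factor)


def scale_copy(data, factor):
    """Copying call: build a new C array, so `data` itself is left untouched."""
    n = len(data)
    tmp = (ctypes.c_double * n)(*data)  # element-by-element copy
    scale(tmp, n, factor)
    return list(tmp)
```

For example, calling `scale_in_place(a, 2.0)` on `a = array('d', [1.0, 2.0, 3.0])` changes `a` itself, while `scale_copy` returns new values and leaves its input unchanged. With NumPy arrays the same zero-copy idea is usually expressed with `arr.ctypes.data_as(ctypes.POINTER(ctypes.c_double))` or `numpy.ctypeslib.ndpointer` in `argtypes`.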