RFC: Get rid of linalg.interpolative Fortran code

zaikunzhang · April 27, 2023, 8:59am

It may be a good idea to look at the following recent issue raised on the GitHub repo of SciPy, particularly its comments on Fortran and f2py.

github.com/scipy/scipy

RFC: Get rid of linalg.interpolative Fortran code

opened 08:06PM - 25 Apr 23 UTC

ilayn

scipy.linalg RFC

I have been working on the linalg/src directory for a while now. The `id` librar…y which is written in Fortran and in a de-facto unmaintained status. While I am grateful for the authors' efforts back in the day, we were a bit too liberal in including Fortran code and now we found ourselves in a strange predicament. The code doesn't do much but deals with LinearOperator and sparse arrays. The `id` code is very unorthodox by today's standards and I don't have too much of hope to modernize it through Fortran tweaks. Plus it's randomization machinery is soon going to be noncompliant with NumPy (I just learned about these things very recently so allow me to pretend knowledge here). I can go about my Fortran diatribe for a long thread but TL;DR We have to get Fortran parts out of the way and `f2py` is not cutting it anymore. To give a single example if you want to support ndarrays, F-arrays are useless because you can't slice them in the last dimensions. That is to say `(x, y, :, :)` is not contiguous but `(:, :, x, y)` is for F-ndarrays. OK but then we want to use some obscure Fortran library but still keep the C dominated NumPy operations what gives? Welcome to the world of buffer joggling. In fact just because of the wrappers we tell the users [to start with an F array](https://docs.scipy.org/doc/scipy/reference/linalg.interpolative.html#initializing). I am not writing this to be controversial but we have to get rid of all Fortran libraries and while we are at it all `numpy.matrix` usage at some point. It is not an impossible task but we have to warm up to this idea. After fighting with f2py and all fortran tooling since the lockdown, I can't see any benefit any more other than continuous headache and causing noncompliancy across the board. `f2py` is seriously dear to my heart however it is showing its age and there is no light at the tunnel yet to modernize it and address many performance issues that come with it. Let me emphasize, Fortran is a nice and a very fast language. But that is all there is to say about it. Everything else from our side is just pain to work with it. And hence just to kick start this crusade, I would like to propose that we get rid of this codebase. In fact maybe this code should be moved to `scipy.sparse.linalg` completely.

jacobwilliams · April 27, 2023, 11:55am

I think we have to come to terms with the fact that fortran tooling will not get any better anytime soon.

jacobwilliams · April 27, 2023, 12:19pm

The bottom line is that the Python people need a lot of help from the modern Fortran people. Their ecosystem is being poisoned by all this unmaintained fixed-form FORTRAN code. F2py is kind of an abomination that enables this unhealthy relationship. Can we show them a better way?

gnikit · April 27, 2023, 12:22pm

@certik could LFortran be used to convert the Fixed Form Fortran to Native Python, or potentially Free Form? (No idea what can of worms this might open with regards to performance).

zaikunzhang · April 27, 2023, 12:28pm

Out of curiosity, if the conversion is possible, what would happen to the GOTO? Would it be converted to something that is essentially still GOTO? If that is the case, how much would the Python or free-form Fortran code be better than the original fixed-form Fortran code, assuming the latter has hundreds of GOTO’s?

hkvzjal · April 27, 2023, 12:50pm

It would seem, just like @jacobwilliams said, that the automagical tooling is not necessarily the biggest issue, this is kind of what f2py tried to put forward. Such that one could kind of forget about fortran by letting the tool deal with it.

By reading the post it seems like they are not confortable playing around with Fortran because they might not be using the tools put forward by the Modern Fortran community.

After building a Python API out of pure fortran I was actually frustrated by f2py and went for iso_c_binding + ctypes, the learning curve might be steeper at first but one has much more freedom down the road.

Beliavsky · April 27, 2023, 12:58pm

Even if the FORTRAN code were modernized, would that address this concern?

To give a single example if you want to support ndarrays, F-arrays are useless because you can’t slice them in the last dimensions. That is to say (x, y, :, :) is not contiguous but (:, :, x, y) is for F-ndarrays. OK but then we want to use some obscure Fortran library but still keep the C dominated NumPy operations what gives? Welcome to the world of buffer joggling. In fact just because of the wrappers we tell the users to start with an F array.

jacobwilliams · April 27, 2023, 3:19pm

I didn’t understand this. Does anyone know what they are talking about? Something to do with row-major vs column-major?

jacobwilliams · April 27, 2023, 3:24pm

Right. I don’t use f2py either, I use iso_c_binding + ctypes and have no problems doing anything I need to do. Is this what they should be doing? How does SciPy interact with C code? Do they have a c2py or do they call it directly like they could do with Fortran if they were using iso_c_binding that has been available for 20 years?

certik · April 27, 2023, 4:47pm

Yes, to convert to free-form, just take this fixed-form:

      integerfunctionfunctionf(n)
      integerfunctionfunctionf(n)
      integern
      endfunctionfunctionf

and print it as free-form:

$ lfortran --fixed-form fmt a.f
integer function functionf(n)
integer :: functionfunctionf(n)
integer :: n
end function functionf

To convert to native Python, we are planning to add ASR->Python translation, just like we already have ASR->Julia or ASR->C++.

Right now our priority is to just compile all our codes, including all of SciPy. Doing these code transformations is not difficult at all and we’ll make it nice and smooth.

vmagnin · April 27, 2023, 5:16pm

Impressive! Does other compilers have that kind of feature? (I’ve never heard of)

hkvzjal · April 27, 2023, 6:46pm

I guess so, I think his point is that he wants to keep all of his interfaces C-style row-major.

From my experience I would definitely advise going in this direction. I don’t know how they interact at the source code with the FORTRAN libraries. But if one would like to have arrays being row-major on the python side of things, yet enable interacting with Fortran in a column-major approach it is possible:

let’s say I have a module that I compile to a dll in a file fortran2python.f90

Summary

module wrapper
use iso_c_binding
implicit none
contains
    subroutine say_hello(X,dim,num_points)
    !DEC$ ATTRIBUTES STDCALL,REFERENCE,ALIAS:'say_hello', DLLEXPORT :: say_hello
    ! Variables
    integer(c_int), intent(in):: num_points
    integer(c_int), intent(in):: dim 
    real(c_double), intent(inout) :: X(dim,num_points) !! array of coordinates
    
    integer :: i
    !-------------------------------
    print *, 'hello from fortran'
    
    do i = 1, num_points
        print *, 'by data point', x(:,i)
    end do
    
    x(1,2) = 2*x(1,2)
    
end subroutine say_hello

end module

and a python test.py

Summary

import numpy as np
import ctypes

folderpath = 'D:\your\full\folder\path'
DLLname = 'fortran2python.dll'
api = np.ctypeslib.load_library(DLLname,folderpath)

X = np.array([
    [1.0, 2.0],
    [3.0, 4.0],
    [5.0, 6.0],
])

print('X shape:',X.shape)
print(X[0,0],X[0,1])
print(X[1,0],X[1,1])
print(X[2,0],X[2,1])

api.say_hello(  ctypes.byref(np.ctypeslib.as_ctypes( X )) ,
                ctypes.byref(ctypes.c_int(X.shape[1])) ,
                ctypes.byref(ctypes.c_int(X.shape[0])) 
             )

print('back in python')
print(X[0,0],X[0,1])
print(X[1,0],X[1,1])
print(X[2,0],X[2,1])

This will print

X shape: (3, 2)
1.0 2.0
3.0 4.0
5.0 6.0
 hello from fortran
 by data point   1.00000000000000        2.00000000000000
 by data point   3.00000000000000        4.00000000000000
 by data point   5.00000000000000        6.00000000000000
back in python
1.0 2.0
6.0 4.0
5.0 6.0

So, you can see that on the python side of things you handle the data in a row-major manner, on the Fortran side on a colum-major manner. No need to copy the data or Transpose the array.

It would be instructive to see an actual use-case of their dilemma, but I’m pretty convinced it could be handled.

certik · April 27, 2023, 6:59pm

They also seem to use Cython, so they could use iso_c_binding to make Fortran just behave like any other C function and then they can wrap it like they wrap C and C++ libraries.

There might however be a constrain on Windows in terms of what Fortran compilers they can use and what capabilities they have.

jacobwilliams · April 27, 2023, 7:09pm

Yes, that is likely. They just need some help, it’s an opensource project. These people are typically not Fortran experts and all they have seen is terrible FORTRAN 77 which has poisoned their minds.

I happen to think if you are going to use a compiled language for math, the best language is Fortran. If that’s not true, then we need to fix it. (we meaning the user community and also the Fortran standard committee). Being fast isn’t enough anymore, we need modern tooling, a modern library ecosystem, an engaged community, etc. (these things are happening, but slowly). It may be too late, but I don’t know.

certik · April 27, 2023, 8:35pm

I 100% agree!

It is definitely late for some projects and corporations / labs. But it is not too late for others, and there are many which are borderline, they want to move away today, but if we fix this, they won’t move tomorrow. So we have to hurry!

Also, let me know if you agree with my plan, addressing all the SciPy issues: RFC: Get rid of linalg.interpolative Fortran code · Issue #18367 · scipy/scipy · GitHub

FortranFan · April 28, 2023, 12:57pm

Things need to improve drastically and fast with the functioning of the Fortran standard committee, barring a few exceptions what comes out is overall too little, too late for the practitioners and there is no vision for the language.

Topic		Replies	Views
Numpy f2py tool Help	9	945	May 30, 2020
About f2py, IPC and Wrappers	13	1767	September 15, 2021
The Fortran stdlib project has garnered over 1000 stars on GitHub!	6	641	July 2, 2024
SciPy: "there is interest in adopting this (PRIMA) implementation, but little appetite for taking on more Fortran code"	28	11653	February 2, 2024
Bridging Fortran and Python: A Practical Mini-Course and Resources Tutorials	1	362	November 29, 2024

RFC: Get rid of linalg.interpolative Fortran code

Related topics