FPGA Hardware acceleration

Greetings,

I have an idea to provide FPGA-based hardware acceleration for the excellent LAPACK library. For those unfamiliar with what an FPGA is, it's basically a reprogrammable chip whose logic can be configured to implement an algorithm directly as an electrical circuit. Many purely mathematical functions written in a language such as C or Fortran can be represented as such a circuit. I have access to a tool that can convert C/C++ into VHDL, which in turn is used to generate a circuit design that can be called from C.
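To make the kind of code such C-to-VHDL (HLS) tools typically accept more concrete, here is a minimal sketch (the function name and size are mine, not taken from any particular tool's examples). These tools generally want statically bounded loops, fixed-size arrays, and no dynamic memory or recursion:

```c
/* The kind of purely mathematical kernel an HLS (C-to-VHDL) tool
 * typically handles well: fixed array size, a statically bounded
 * loop, no pointers into dynamic memory, no recursion. */
#include <stddef.h>

#define N 8

float dot_product(const float a[N], const float b[N]) {
    float acc = 0.0f;
    for (size_t i = 0; i < N; ++i) {
        acc += a[i] * b[i];  /* HLS tools can unroll or pipeline this loop */
    }
    return acc;
}
```

A statically bounded loop like this is what lets the tool decide how much hardware parallelism (unrolling, pipelining) to spend on the kernel.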

My idea is to convert portions of LAPACK into C/C++ strictly so I can run the tool that converts them into VHDL. The VHDL is then used to program the FPGA, and the accelerated routines are called from within Fortran through a C extension library.

I am not a Fortran programmer; I only know about LAPACK because it is at the heart of Python's NumPy. My goal would be to create a custom NumPy in which the LAPACK calls go to hardware-accelerated routines, so that NumPy, and every Python application that uses NumPy, gains hardware acceleration. At the same time, anyone who uses Fortran LAPACK directly could leverage the same technology.

It would be a quite complex build/DevOps endeavor, but well worth it.

Python → NumPy → C → Fortran LAPACK → C wrapper for talking to the FPGA board → physical gates on a chip representing the code
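As a rough sketch of the "C wrapper for talking to the FPGA board" link in that chain, here is one way it might look, assuming a hypothetical driver API (`fpga_available` and the commented-out `fpga_dgemm` are made-up names) and using the by-pointer, column-major calling convention that Fortran BLAS/LAPACK routines use:

```c
/* Hedged sketch of the C wrapper layer between Fortran LAPACK/BLAS
 * and an FPGA board. The probe and offload calls are hypothetical
 * placeholders; a real build would route them to the board's driver.
 * The signature follows the Fortran convention: all arguments by
 * pointer, column-major matrix storage, trailing underscore. */
#include <stddef.h>

static int fpga_available(void) {
    return 0;  /* placeholder: probe the board/driver here */
}

/* Simplified dgemm-style kernel: C = alpha*A*B + beta*C,
 * no transpose options, A is m-by-k, B is k-by-n, C is m-by-n. */
void hw_dgemm_(const int *m, const int *n, const int *k,
               const double *alpha, const double *A,
               const double *B, const double *beta, double *C) {
    if (fpga_available()) {
        /* fpga_dgemm(...);  hypothetical offload path */
        return;
    }
    /* Software fallback so the library still works without the board. */
    for (int j = 0; j < *n; ++j) {
        for (int i = 0; i < *m; ++i) {
            double acc = 0.0;
            for (int p = 0; p < *k; ++p)
                acc += A[i + (size_t)p * *m] * B[p + (size_t)j * *k];
            C[i + (size_t)j * *m] =
                *alpha * acc + *beta * C[i + (size_t)j * *m];
        }
    }
}
```

On the Fortran side such a routine could be declared with an interface (e.g. via `ISO_C_BINDING`), and keeping a software fallback means the build degrades gracefully on machines without the board.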

Before I dive into this, does anyone know of existing FPGA-based hardware acceleration for Fortran?

If not, I could be a trailblazer…

Does anyone with LAPACK experience have pointers to some heavily used primitives that would give the biggest bang for the buck if accelerated?


@cloudslicer, welcome to the forum; yours might very well be a trailblazing effort.

I otherwise plead complete ignorance about your effort, but do you have a canonical case involving Python and NumPy using LAPACK on which you can base your initial evaluation of the hardware-acceleration benefits?

By the way, you may know of the following where PLASMA is in C and includes the headers for Python:
https://dl.acm.org/doi/fullHtml/10.1145/3264491

The research done by this company appears to be related: "PUBLICATIONS – Accelogic, LLC".
Here's a paper that looks pretty related as well: "Comparison of High Level FPGA Hardware Design for Solving Tri-diagonal Linear Systems" (ScienceDirect).
They don't mention NumPy, so you could have a great project idea there.
I also found this project that you might be able to use: MAGMA (https://icl.utk.edu/magma).
I think a goal of LFortran is to have a backend that would give you a language you could use: https://lfortran.org

@cloudslicer welcome to the forum!

Indeed, we can have a backend in LFortran that can translate LFortran’s ASR into VHDL or Verilog. LFortran also has a C++ translation backend that you can use.

The best way forward to play with this is to start with a simple example. I am happy to get you started with LFortran if you are interested in pursuing that approach.

The high-level IR of Flang, FIR (Fortran IR), is an MLIR-based dialect. The CIRCT project (https://circt.llvm.org/) provides MLIR dialects that can represent hardware, so in principle one could write a conversion from FIR to one of those dialects. I am interested in such a project.

I believe that Dr. Lenore Mullin has been working on FPGA acceleration of array operations using her "Mathematics of Arrays" formal techniques. Much of her work has been directed at Fortran, but I don't recall whether the FPGA work was.

You might find much of use in her work.