I would like to announce fastGPT, a fast GPT-2 inference written in Fortran:
- Code: GitHub - certik/fastGPT: Fast GPT-2 inference written in Fortran
- Blog post: fastGPT: Faster than PyTorch in 300 lines of Fortran
- Twitter: https://twitter.com/OndrejCertik/status/1635768419307110400
- Hacker News: FastGPT: Faster than PyTorch in 300 lines of Fortran
I recommend reading the blog post above for background and motivation. See the README on GitHub for an example and benchmarks.
It’s pure Fortran, it’s short, readable and, most importantly, fast. On my Apple M1 it looks like it is faster than PyTorch in a fair comparison, and a lot faster if I use optimizations/backends that PyTorch doesn’t use. It also starts immediately. It is a standalone Fortran application; currently we still need Python to encode the input string into tokens, but then fastGPT takes them, generates more tokens and converts them back to text.
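To show the flow concretely, here is a minimal sketch of the token-in / token-out loop. The names are illustrative placeholders, not the actual fastGPT API, and the forward pass is stubbed out:

```fortran
program driver_sketch
    ! Sketch of the workflow: the Python script encodes the prompt into
    ! token IDs, the Fortran side appends new token IDs in a loop, and
    ! Python decodes the result back to text.
    implicit none
    integer, allocatable :: tokens(:)
    integer :: i

    tokens = [15496, 11, 995]        ! illustrative token IDs from the Python encoder
    do i = 1, 20                     ! generate 20 more tokens greedily
        tokens = [tokens, next_token(tokens)]
    end do
    print *, tokens                  ! decoded back to text on the Python side
contains
    integer function next_token(ctx)
        integer, intent(in) :: ctx(:)
        ! placeholder: the real model runs a full forward pass over the context here
        next_token = ctx(size(ctx))
    end function
end program
```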
It is written like any other numerical/computational code. I think Fortran is a perfect fit, at least for GPT-2 inference, and probably for other similar ML/AI models too.
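To give a flavor of what I mean, a typical building block such as the GELU activation (the tanh approximation used by GPT-2) is just a few lines of elemental Fortran. This is a sketch in the spirit of the code, not copied verbatim from the repository:

```fortran
! GELU activation (tanh approximation) as plain elemental Fortran;
! a sketch, the actual fastGPT implementation may differ in details.
elemental real function gelu(x) result(y)
    real, intent(in) :: x
    real, parameter :: pi = 3.14159265
    y = 0.5 * x * (1 + tanh(sqrt(2 / pi) * (x + 0.044715 * x**3)))
end function
```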
fastGPT is currently parallelized only via parallel OpenBLAS. It has great single-core CPU performance, which provides a solid foundation for parallelization and GPU offloading. I am hoping some of you would be interested in helping: we can try MPI, and @rouson can try coarrays. I recommend approaching it like any other physics or numerical code, and let’s see how fast we can make it in parallel. This would also make a great GSoC project, covering both parallelization and making the application more user friendly (such as porting the encoder to Fortran so that we don’t need Python; see the issue tracker for more ideas).
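To give a sense of where the parallelism is: the per-head attention work inside each layer is independent, so in principle it maps directly onto `do concurrent` (or OpenMP, MPI, coarrays). Here is a toy sketch of the pattern, with illustrative names and sizes rather than the actual fastGPT arrays:

```fortran
program parallel_heads_sketch
    ! Toy sketch of the parallelism available inside a transformer layer:
    ! each attention head reads and writes only its own slice, so the loop
    ! over heads can be a `do concurrent` (or an OpenMP loop, or distributed
    ! over images). Names and sizes are illustrative, not the fastGPT code.
    implicit none
    integer, parameter :: n_head = 12, n_seq = 64, d_head = 64
    real :: q(n_seq, d_head, n_head), k(n_seq, d_head, n_head)
    real :: scores(n_seq, n_seq, n_head)
    integer :: h

    call random_number(q)
    call random_number(k)

    do concurrent (h = 1:n_head)
        ! per-head attention scores; the iterations are fully independent
        scores(:, :, h) = matmul(q(:, :, h), transpose(k(:, :, h)))
    end do

    print *, "sample score:", scores(1, 1, 1)
end program
```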