Keep linear algebra in fortran

certik · May 14, 2021, 4:16pm

Is this matrix-matrix multiplication? That is bound by the cost of the multiplications themselves. For large enough matrices you can reach 100% of the theoretical peak performance in Fortran; I did that about 5 years ago and measured it. I believe that is the same speed as OpenBLAS. For small matrices OpenBLAS is faster, because there you have to hide the latency of memory reads and writes, and it gets complicated.

However, if you have many small matrices to multiply, you can hide this cost. I have done that for matrix-vector multiplication: if you have many vectors to multiply by the same matrix, you can vectorize efficiently and get very close to the theoretical peak, in Fortran. But if you have just one matrix and one vector, you have to write very complicated assembly code that carefully balances the latency of the reads against the multiplies and additions. You can look at how OpenBLAS does it; it is not easy. Unfortunately, you can't do that from Fortran or C.
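
To make the many-vectors case concrete, here is a minimal sketch of what such a loop nest could look like. It is not the code I measured; the names `batched_matvec` and `matvec_many` and the array layout are assumptions made for illustration. The idea is to store the vectors so that the vector index is the first (contiguous) array index, which turns the innermost loop into a unit-stride update that a compiler can vectorize.

```fortran
module batched_matvec
    ! Hypothetical module name, used only for this illustration.
    implicit none
    private
    public :: dp, matvec_many
    integer, parameter :: dp = kind(1.0d0)
contains
    ! Computes Y(k,i) = sum_j A(i,j) * X(k,j) for every vector k.
    ! Assumed layout: each vector is a row of X, so the vector index k
    ! runs over contiguous memory (Fortran arrays are column-major).
    subroutine matvec_many(A, X, Y)
        real(dp), intent(in)  :: A(:,:)   ! shape (m, n)
        real(dp), intent(in)  :: X(:,:)   ! shape (nvec, n)
        real(dp), intent(out) :: Y(:,:)   ! shape (nvec, m)
        integer :: i, j, k

        Y = 0.0_dp
        do i = 1, size(A, 1)
            do j = 1, size(A, 2)
                ! A(i,j) is loop-invariant here; the k loop is unit stride
                ! in both X and Y.
                do k = 1, size(X, 1)
                    Y(k,i) = Y(k,i) + A(i,j) * X(k,j)
                end do
            end do
        end do
    end subroutine matvec_many
end module batched_matvec

program demo
    use batched_matvec, only: dp, matvec_many
    implicit none
    integer, parameter :: n = 8, nvec = 64
    real(dp) :: A(n,n), X(nvec,n), Y(nvec,n), Yref(nvec,n)

    call random_number(A)
    call random_number(X)

    call matvec_many(A, X, Y)

    ! Reference result from the intrinsics: Y = X * transpose(A).
    Yref = matmul(X, transpose(A))

    if (maxval(abs(Y - Yref)) > 1.0e-12_dp) error stop "mismatch"
    print *, "max abs difference:", maxval(abs(Y - Yref))
end program demo
```

This only shows the data layout and loop order. How close it gets to peak depends on the compiler, the flags, and the sizes, so it has to be measured on the machine in question.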

Can you post C code that is faster? It's the same issue there; I don't think C has any advantage. Typically one has to go down to assembly to hide the latency cost, if that is the issue. I have done that, and I can show how it is done if there is interest.
