It varies, depending on the structure and the hardware you are trying to target. A rough definition would be "enough that it is worth exploiting". cuSPARSE and MKL offer block sparse formats, where some explicit zeros may be stored, but in exchange you get nice "dense" blocks on which you can run dense kernels. When matrices are symmetric, $A = A^T$, or the algorithm involves matrix powers $y = A^p x$ (matrix polynomials, s-step Krylov methods), there are further tricks that can be exploited.
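To make the block idea concrete, here is a minimal sketch using SciPy's `bsr_matrix` as a stand-in for the cuSPARSE/MKL block formats; the matrix and block size are invented for illustration:

```python
import numpy as np
import scipy.sparse as sp

# A small matrix whose nonzeros cluster into 2x2 blocks.
A = np.array([
    [4.0, 1.0, 0.0, 0.0],
    [1.0, 4.0, 0.0, 0.0],
    [0.0, 0.0, 4.0, 0.0],
    [0.0, 0.0, 1.0, 4.0],
])

A_csr = sp.csr_matrix(A)
A_bsr = sp.bsr_matrix(A, blocksize=(2, 2))  # stores one explicit zero

print("CSR stored values:", A_csr.nnz)        # 7: only true nonzeros
print("BSR stored values:", A_bsr.data.size)  # 8: two dense 2x2 blocks

# Both formats compute the same matvec; BSR does it block by block,
# so each inner kernel is a small dense matrix-vector product.
x = np.ones(4)
assert np.allclose(A_csr @ x, A_bsr @ x)
```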
I’ve seen work from the libCEED and MFEM communities where matrix-free approaches beat assembled matrices by large margins. See for example Fig. 7 in this paper: [2204.01722] Performance Portable Solid Mechanics via Matrix-Free $p$-Multigrid, or other work by Jed Brown. Another person working heavily in this area, specifically on matrix-free discontinuous Galerkin methods, is Martin Kronbichler.
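As a toy illustration of the matrix-free idea (nothing like the high-order libCEED/MFEM kernels, just the principle): apply a 1D Laplacian stencil on the fly via SciPy's `LinearOperator` and hand it to CG, with no assembled matrix anywhere:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

n = 100

def laplacian_matvec(x):
    # y = A x for the tridiagonal [-1, 2, -1] stencil, computed
    # directly from x: no matrix entries are ever stored.
    y = 2.0 * x
    y[:-1] -= x[1:]
    y[1:] -= x[:-1]
    return y

A = LinearOperator((n, n), matvec=laplacian_matvec, dtype=np.float64)

b = np.ones(n)
x, info = cg(A, b)  # Krylov methods only need matvecs, so this "just works"
print("CG info:", info, "residual:", np.linalg.norm(laplacian_matvec(x) - b))
```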
AFAIK, the advantage of these algorithms is that they perform more flops per degree of freedom, which makes them better suited to today’s “imbalanced” architectures, where you can do far more flops than memory accesses in the same amount of time.
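A back-of-the-envelope version of that argument, showing why assembled SpMV is bandwidth-bound; the hardware numbers are illustrative placeholders, not measurements of any specific chip:

```python
# Assembled CSR SpMV: float64 values plus 32-bit column indices,
# ignoring row pointers and vector traffic for simplicity.
flops_per_nnz = 2.0        # one multiply + one add per stored nonzero
bytes_per_nnz = 8.0 + 4.0  # 8-byte value + 4-byte column index

print(f"SpMV intensity: ~{flops_per_nnz / bytes_per_nnz:.2f} flop/byte")

# Illustrative (made-up but representative) modern GPU:
# ~30 Tflop/s of FP64 compute vs ~3 TB/s of memory bandwidth.
machine_balance = 30e12 / 3e12
print(f"machine balance: ~{machine_balance:.0f} flop/byte")

# SpMV sits far below the balance point, so it is memory-bound;
# matrix-free methods instead recompute matrix entries from the (much
# smaller) mesh/coefficient data, trading cheap flops for expensive bytes.
```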
(Image taken from the NHR PerfLab Seminar: A Review of Processor Advances Over the Past Thirty Years)