Location for a new generalized lagtm routine within stdlib

Hi everyone,

I’m planning to work on a generalized lagtm subroutine within stdlib, as proposed by @hkvzjal. This subroutine will be used by the special matrices functionality. The existing lagtm inside BLAS/LAPACK only supports alpha and beta in the set {-1, 0, 1}, so the goal of this subroutine is to let the user perform the matrix operation alpha * A * x + beta * y for arbitrary values of alpha and beta. After discussing briefly with @hkvzjal, we thought it would be best to get wider input from this community before deciding where the new routine should live within the library structure: inside the BLAS/LAPACK directory (alongside the reference implementations) or outside it (within the special matrices module). Since we are currently maintaining a modernized version of BLAS/LAPACK 3.10, one idea is to leave BLAS/LAPACK as it is and put the generalized subroutine outside it. If neither of these options seems ideal, I’d also appreciate suggestions for alternative locations within stdlib.

Looking forward to hearing everyone’s thoughts.

Thanks!

I don’t use stdlib, so maybe my opinion should be discounted a bit, but to me it should be outside of BLAS. When I think of BLAS, I think those routines should be guaranteed better than anything I could code myself, since they are what is supplied by CPU manufacturers (or really good handmade code with some assembly, in the OpenBLAS case). Someone writing code themselves and putting “BLAS” in the name without it actually being BLAS would make me lose trust in the person/library.

Some context here. The routine @Sinan is mentioning computes the matrix-matrix product C = a*A*B + b*C, where A is a tridiagonal matrix. For historical reasons, the LAPACK version only handles a in {-1, 0, 1}, and likewise for b. The code @Sinan is suggesting to add would thus be essentially the same as the original LAPACK routine, except that a and b could be arbitrary real (or complex) numbers.
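To make the discussion concrete, here is a rough Fortran sketch of what such a generalized routine could look like. The name `gtmm` and the exact interface are hypothetical (loosely following LAPACK's dlagtm argument layout), and this is only an illustrative double-precision sketch, not a proposed implementation:

```fortran
! Hypothetical sketch: B = alpha*A*X + beta*B, with A tridiagonal
! (sub-, main-, and super-diagonals in dl, d, du) and *arbitrary*
! real alpha, beta -- unlike dlagtm, which restricts them to {-1, 0, 1}.
subroutine gtmm(n, nrhs, alpha, dl, d, du, x, ldx, beta, b, ldb)
    implicit none
    integer, intent(in) :: n, nrhs, ldx, ldb
    double precision, intent(in) :: alpha, beta
    double precision, intent(in) :: dl(*), d(*), du(*), x(ldx, *)
    double precision, intent(inout) :: b(ldb, *)
    integer :: i, j

    do j = 1, nrhs
        ! Scale the accumulator first; beta = 0 must overwrite B
        ! even if it contains NaNs or uninitialized values.
        if (beta == 0.0d0) then
            b(1:n, j) = 0.0d0
        else
            b(1:n, j) = beta * b(1:n, j)
        end if
        if (n == 1) then
            b(1, j) = b(1, j) + alpha * d(1) * x(1, j)
        else
            b(1, j) = b(1, j) + alpha * (d(1)*x(1, j) + du(1)*x(2, j))
            b(n, j) = b(n, j) + alpha * (dl(n-1)*x(n-1, j) + d(n)*x(n, j))
            do i = 2, n - 1
                b(i, j) = b(i, j) + alpha * &
                    (dl(i-1)*x(i-1, j) + d(i)*x(i, j) + du(i)*x(i+1, j))
            end do
        end if
    end do
end subroutine gtmm
```

As in the reference lagtm, transpose handling, the complex variants, and the precision-generic interfaces would still need to be worked out; the point here is just that the restricted-scalar branches of the original collapse into one general code path.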

I do agree with @Meromorphic, however, that it wouldn’t be “official” BLAS/LAPACK code, and adding it explicitly to the appropriate stdlib_lapack_* module could be potentially misleading. One possibility, though, is to create a new module or submodule, stdlib_extended_lapack, with a clear header indicating that the implementations inside are not official LAPACK but simply extensions of already existing routines that accommodate some of their deficiencies. For info, Intel has such a module for generalized BLAS operations. I’m pretty sure OpenBLAS also has something similar, since they ship axpby, which is not part of the BLAS standard.


Thank you @Meromorphic and @loiseaujc for the inputs.

I completely agree that we should avoid putting anything under lapack or blas if it’s not part of the official modules. This can confuse users about what is standardized and what is an extension.

The idea of introducing extended LAPACK and BLAS modules seems like a clean compromise. It leaves the official modules untouched while providing us room to fill the known gaps. If others agree with this, I can proceed with the implementation.

Happy to hear more thoughts.


I think this is an interesting discussion. I’ve been using BLAS routines since they were published in 1979. Those routines were used extensively in LINPACK, so I followed that as a guide. Maybe I can give some historical perspective.

This was definitely NOT the situation in the 1970s and 1980s. Through that period, many vendors had their own vector and matrix library routines that would outperform the BLAS codes, especially the later level-2 (published in 1988) and level-3 routines (published in 1990). Cray, for example, had a very nice generalized matrix-matrix product routine called MXMA() that allowed arbitrary strides in all its indices and arbitrary transposes of any of the three matrix arguments. It was not only more flexible than the later BLAS DGEMM(), it outperformed it for a long time. Cray finally got around to optimizing the level-3 BLAS on their hardware in the early 1990s, but if someone needed the extra flexibility provided by MXMA, they would continue to use it. Of course, there were performance problems when running on non-Cray hardware, where you had to fall back to a Fortran version. This was a mess to do efficiently, and the programmer had to choose between performance and portability. Other scientific libraries at that time, distributed or marketed by IBM, Boeing, NAG, and others, all put similar portability vs. performance decisions on the programmer.

Nowadays, Intel MKL is in that same category, but by the time it started in the mid-1990s, the BLAS had established themselves, so those routines were included from the beginning.

I don’t see anything wrong with putting extra routines into a library, provided there are no name conflicts with the standard BLAS routines. Of course, such routines should be identified clearly as extensions. Sometimes when sufficiently useful these extensions themselves become standard, and that is a natural evolution process for software. Of course, open source code for such extensions helps to address the portability aspect, but it also places more demands on the programmer.

Another common problem is to introduce a useful extension to a library, and then have the standard BLAS/LAPACK introduce an extension with the same name but different arguments, or maybe even entirely different functionality. This happened to me when I needed a version of the DLARFG() routine with a positive beta. I wrote my own routine and used it for many years. Then, about 10 years ago, it was added to the standard BLAS/LAPACK library, but with a different name, DLARFGP(). I had not used that name originally because it was 7 characters, and at the time I was staying within the 6-character limit for Fortran.


Thanks @RonShepard for your input. I agree with your point that there is nothing inherently wrong with putting non-standard routines inside a library as long as they don’t cause any naming conflicts. That aligns with the suggestion from @loiseaujc about creating a dedicated place for such routines. For the generalized lagtm, the plan would then be to put it outside LAPACK, either as a new module or under a new folder (for example, something like stdlib_extended_lapack; the name is still open for discussion), where we can provide functionality that addresses known limitations of the official routines while remaining clearly separate. This feedback helps a lot; thanks again for sharing the historical context and practical considerations.