LAPACK interfaces

I gave up on MKL LAPACK95 because someone at Intel thought it would be a cool idea to change the signature (i.e., the order of arguments) of some of the subroutines versus what was in the Netlib distribution. Does anyone know why they did that? I've found it just as easy to select the 10 or so things I will ever use from LAPACK and write my own interfaces and wrapper routines that call the standard LAPACK or MKL LAPACK.

Why do you need a LAPACK interface at all? It may help with spotting errors in the passing of arguments, but beyond that it just seems like redundant code to me.

Detecting errors at compile time is really a big help, so I don’t consider it redundant code.

Compare calling LAPACK to compute eigenvalues with numpy's interface. The LAPACK interface itself is horrible, and on top of that it forces the programmer to do work, like passing dimensions, that could be done by the language/compiler.

How is

call zheevd('V','U',n,a,ld,w,work,lwork,rwork,lrwork,iwork,liwork,info)

horrible or even difficult? It's flexible, efficient, and I personally like to see the ranges and leading dimensions of my arrays explicitly. For example, I can diagonalize a block of the matrix just by pointing to the starting element and changing n (see the sketch below), and this will not incur copy-in/copy-out inefficiencies. Dongarra et al. knew what they were doing.
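
A minimal sketch of that sub-block trick (with hypothetical k and m, and the workspace arguments set up as in the call above): the m-by-m block starting at a(k,k) is diagonalized in place, with no copy, because only the starting element and the order change while the leading dimension stays ld.

! Sketch: diagonalize the m-by-m sub-block whose first element is a(k,k).
! No copy is made; ld is still the leading dimension of the full array a.
call zheevd('V', 'U', m, a(k,k), ld, w, work, lwork, rwork, lrwork, iwork, liwork, info)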


If you ask ChatGPT-4 how to call zheevd from scipy, it gives the following code, which is "glance-clear". (I tested it, and it runs.) The Fortran call has many arguments, and it's not clear to the uninitiated which arguments are inputs and which are outputs; the fact that argument a is both an input and an output will trip people up. ChatGPT's Fortran driver (also below) is much longer than the Python one.

Python

import numpy as np
from scipy.linalg import lapack

# Create a complex Hermitian matrix
A = np.array([[1+0j, 2+3j, 4+0j],
              [2-3j, 5+0j, 6+1j],
              [4+0j, 6-1j, 8+0j]])

# Compute eigenvalues and eigenvectors using zheevd
eigenvalues, eigenvectors, info = lapack.zheevd(A)

if info:
    print("info:",info)
print("eigenvalues:", eigenvalues)
print("eigenvectors:", eigenvectors)

Fortran

program zheevd_example
  implicit none

  ! Declare necessary variables and constants
  integer, parameter :: n = 3
  integer :: info, lwork, lrwork, liwork
  integer, allocatable :: iwork(:)
  real(8), allocatable :: rwork(:)
  complex(8), allocatable :: work(:)
  complex(8), dimension(n,n) :: A
  real(8), dimension(n) :: w

  ! Define a complex Hermitian matrix
  data A / (1.0d0, 0.0d0), (2.0d0, 3.0d0), (4.0d0, 0.0d0), &
           (2.0d0, -3.0d0), (5.0d0, 0.0d0), (6.0d0, 1.0d0), &
           (4.0d0, 0.0d0), (6.0d0, -1.0d0), (8.0d0, 0.0d0) /

  ! Workspace query (passing -1 for the sizes requests the optimal values)
  allocate(work(1), rwork(1), iwork(1))
  call zheevd('V', 'U', n, A, n, w, work, -1, rwork, -1, iwork, -1, info)
  lwork = int(real(work(1)))
  lrwork = int(rwork(1))
  liwork = iwork(1)

  ! Allocate work arrays with the optimal sizes
  deallocate(work, rwork, iwork)
  allocate(work(lwork), rwork(lrwork), iwork(liwork))

  ! Compute eigenvalues and eigenvectors
  call zheevd('V', 'U', n, A, n, w, work, lwork, rwork, lrwork, iwork, liwork, info)

  ! Check for success
  if (info == 0) then
     print *, "Eigenvalues: ", w
     print *, "Eigenvectors: "
     print *, A
  else
     print *, "Error in zheevd, info = ", info
  end if

  ! Deallocate work arrays
  deallocate(work, rwork, iwork)

end program zheevd_example

ETA: Calling LAPACK95 looks like this:

program zheev_lapack95_example
  use f95_lapack, only: la_heev
  implicit none

  ! Declare necessary variables
  integer, parameter :: n = 3
  integer :: info
  complex(8), dimension(n,n) :: A
  real(8), dimension(n) :: w

  ! Define a complex Hermitian matrix
  data A / (1.0d0, 0.0d0), (2.0d0, 3.0d0), (4.0d0, 0.0d0), &
           (2.0d0, -3.0d0), (5.0d0, 0.0d0), (6.0d0, 1.0d0), &
           (4.0d0, 0.0d0), (6.0d0, -1.0d0), (8.0d0, 0.0d0) /

  ! Compute eigenvalues and eigenvectors (eigenvectors overwrite A)
  call la_heev(A, w, jobz='V', info=info)

  ! Check for success
  if (info == 0) then
     print *, "Eigenvalues: ", w
     print *, "Eigenvectors: "
     print *, A
  else
     print *, "Error in zheev, info = ", info
  end if

end program zheev_lapack95_example

Yes, I call it horrible:

  • too many arguments
  • cryptic, non-descriptive names

It competes with something as simple as linalg.eigh(a).

In many cases, the few percent of performance that can be gained by people who take the details of a specific computer setup into account and spend time on performance measurements doesn't compensate for the extra work. I would personally rather wait 20 minutes longer for a computation than spend that time on coding.

Dongarra et al. knew what they were doing.

Yes, they were masters of the tools of their time, and the success of LAPACK is impressive. The same holds for Konrad Zuse. Still, I do not want to program a Z3.

I do the same. It would be nice if this work could be done once for everyone in a portable manner.


The conversion from real to integer in the workspace query has always bothered me; there is even a bug related to it. Typically I will just use the minimum value prescribed by the documentation and ignore the workspace query. Does anyone know what the performance gain from the optimal size is?
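
For what it's worth, here is a one-line defensive sketch of that conversion (not anything LAPACK itself prescribes): round up instead of truncating, and never go below the documented minimum, since WORK(1) comes back as a floating-point value and can round down for large sizes in the single-precision routines.

! Sketch: guard the real-to-integer conversion of the workspace query.
! 2*n + n*n is zheevd's documented minimum LWORK for JOBZ='V' and n > 1.
lwork = max(ceiling(real(work(1), kind=8)), 2*n + n*n)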

If you need to perform a partial operation on a sub-block and don't want to make an extra copy, I see no other way than using the original (albeit verbose) interface. It depends on what type of work you are doing and whether storage savings matter to you.

I don’t see anything wrong with having two interfaces, an “expert” one, and a simplified one. So I don’t think your and @jkd2022’s views exclude each other. I attribute part of the success of LAPACK (and other Fortran packages used in SciPy for that matter) to its interfaces. (This doesn’t mean further improvements can’t be made; requirements change, and so must software.)


The workspace allocation is tedious, and it's easy to forget the purpose of the parameters unless you use them often. The fact that Fortran is a compiled language also makes it hard to use in certain settings, which is what led to MATLAB. Quoting Cleve Moler:

In the 1970s and early 1980s, while I was working on the LINPACK and EISPACK projects that I discussed in two previous posts, I was a Professor of Mathematics and then of Computer Science at the University of New Mexico in Albuquerque. I was teaching courses in Linear Algebra and Numerical Analysis. I wanted my students to have easy access to LINPACK and EISPACK without writing Fortran programs. By “easy access” I meant not going through the remote batch processing and the repeated edit-compile-link-load-execute process that was ordinarily required on the campus central mainframe computer.

In the documentation of the 1981 edition of MATLAB, it says


  Since MATLAB is not primarily concerned with either execution time
  efficiency or storage savings, it ignores most of the special matrix
  properties that LINPACK and EISPACK subroutines use to advantage.

Also the Octave authors, interested in teaching chemical reactor design, write about its origin:

Octave was originally conceived (in about 1988) to be companion software for an undergraduate-level textbook on chemical reactor design […]

There were still some people who said that we should just be using Fortran instead, because it is the computer language of engineering, but every time we had tried that, the students spent far too much time trying to figure out why their Fortran code failed and not enough time learning about chemical engineering. We believed that with an interactive environment like Octave, most students would be able to pick up the basics quickly, and begin using it confidently in just a few hours.

Having different interfaces is certainly an option for fulfilling conflicting requirements, and with optional arguments a large portion of this could even be realized with a single function (see the sketch below).
I still think that the LAPACK API is archaic: it was designed when allocation was not possible and computations were done in statically allocated memory, portions of which were used as 'work space'.
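
For example, a sketch of such a single function (hypothetical module and names, wrapping zheevd): the simple call hides the query and allocation, while an optional argument lets expert callers pass in a reusable workspace.

module eigh_mod
  implicit none
contains
  subroutine eigh(a, w, info, work)
    complex(8), intent(inout) :: a(:,:)            ! Hermitian in, eigenvectors out
    real(8), intent(out)      :: w(:)              ! eigenvalues
    integer, intent(out)      :: info
    complex(8), intent(inout), optional :: work(:) ! optional preallocated buffer
    complex(8), allocatable :: wloc(:)
    real(8), allocatable    :: rwork(:)
    integer, allocatable    :: iwork(:)
    complex(8) :: wq(1)
    real(8)    :: rwq(1)
    integer    :: iwq(1), n, lwork, lrwork, liwork

    n = size(a, 1)
    ! Workspace query
    call zheevd('V', 'U', n, a, n, w, wq, -1, rwq, -1, iwq, -1, info)
    if (info /= 0) return
    lwork = int(wq(1)); lrwork = int(rwq(1)); liwork = iwq(1)
    allocate(rwork(lrwork), iwork(liwork))
    if (present(work)) then
      if (size(work) >= lwork) then
        ! Reuse the caller's buffer: no allocation in this path
        call zheevd('V', 'U', n, a, n, w, work, size(work), rwork, lrwork, iwork, liwork, info)
        return
      end if
    end if
    allocate(wloc(lwork))
    call zheevd('V', 'U', n, a, n, w, wloc, lwork, rwork, lrwork, iwork, liwork, info)
  end subroutine eigh
end module eigh_mod

Casual callers would just write call eigh(a, w, info); a hot loop could pass work= to skip the repeated allocation.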

Yes, and it belongs in the past tense, along with the coding practices of the time.

Somebody - such as the likes of @jacobwilliams - needs to rewrite the whole of LAPACK from scratch in modern Fortran, with a close eye on concurrent execution and coarray callers, and add it to stdlib.

And compare the performance with the original, starting with the sequential case.

That will be a good test case of whether the modern Fortran endeavor that started with Fortran 90 has been worth it. Meaning: pack it up and call it a day on modern Fortran if the performance doesn't compare well with the original, which employed legacy practices ranging from FORTRAN I thru' 77 and several related dialects of the time.

Yes, it’s a monumental effort but something the Fortran community must try soon.


Suppose you have to perform many diagonalizations of small arrays. You will want to avoid the overhead of allocate and deallocate (effectively malloc() and free()) by reusing the work arrays. The LAPACK interface accommodates this (see the sketch below).
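
A sketch of that pattern (the matrices are hypothetical test data; the fixed array bounds are zheevd's documented minima for JOBZ='V'): the workspace is sized once, outside the loop, and reused for every matrix.

program many_small_eigs
  implicit none
  integer, parameter :: n = 4, nmat = 10000
  complex(8) :: a(n,n,nmat)
  complex(8) :: work(2*n + n*n)         ! documented minimum LWORK for JOBZ='V'
  real(8)    :: rwork(1 + 5*n + 2*n*n)  ! documented minimum LRWORK
  integer    :: iwork(3 + 5*n)          ! documented minimum LIWORK
  real(8)    :: w(n)
  integer    :: info, k

  a = (0.0d0, 0.0d0)                    ! trivially Hermitian test matrices
  do k = 1, nmat
    a(1,1,k) = k
  end do

  ! No allocate/deallocate inside the loop: one workspace serves all calls
  do k = 1, nmat
    call zheevd('V', 'U', n, a(:,:,k), n, w, work, size(work), &
                rwork, size(rwork), iwork, size(iwork), info)
  end do
  print *, "last info = ", info
end program many_small_eigs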

A few related stdlib GitHub issues I see are


It's the static nature that makes it more distributable and embeddable in other programming languages. A Fortran allocate failure could crash the Python or R interpreter, not to mention the pain of distributing the Fortran runtime library. Some expert users might prefer to use their own allocation methods, which happen at carefully chosen "safe" points of the program. I guess not all users of LAPACK and BLAS have the safety of a virtual address space, either.

I see the orthogonality of memory (containers) and algorithms as one of the fundamental principles in reusable software design.

With that said, the LAPACK authors are on a path to change this. In the new C++ API for BLAS and LAPACK they plan to do the following:

As memory allocation is typically quick, the C++ LAPACK interface allocates optimal workspace sizes internally, thereby removing workspaces from the interface.

If this becomes a performance bottleneck, workspaces could be added as optional arguments — with a default value of nullptr indicating that the wrapper should allocate workspace — without breaking code written with the C++ LAPACK API.

Rationale: As needed, there is a possibility of adding an overloaded function call that takes a
user-defined memory allocator as an argument. This may serve memory-constrained implementations that insist on controlled memory usage.


The extra copy is required because ultimately there is a dummy argument that is assumed-size or explicit-shape. That is because of the limitation to f77 argument conventions in the working routines. If those LAPACK routines had been updated to use, for example, assumed-shape arrays, then many (or all) of the copies that are currently required would be eliminated.
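
To make the contrast concrete, here is a sketch with hypothetical names (not actual LAPACK routines):

! The f77 convention used by LAPACK: an assumed-size dummy, so the caller
! must pass n and lda, and a noncontiguous actual argument forces a copy.
subroutine heevd_legacy(n, a, lda, w)
  implicit none
  integer :: n, lda
  complex(8) :: a(lda, *)
  real(8) :: w(*)
end subroutine heevd_legacy

! An assumed-shape version: the descriptor carries shape and strides, so
! array sections can be passed without copies (the caller then needs an
! explicit interface, e.g. via a module).
subroutine heevd_modern(a, w)
  implicit none
  complex(8) :: a(:,:)
  real(8) :: w(:)
end subroutine heevd_modern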

Regarding the extra storage for the copies, the space itself is only one of the problems. Sometimes an even bigger problem is the overhead associated with the allocation and deallocation of that space. If that is done on the heap (typical of a fortran allocate() statement), it can require more effort than the subsequent floating point operations on the array. If it is done on the stack, which requires less effort, it can result in runtime errors when the available stack is exceeded. For some reason, fortran compilers do not attempt stack allocation first at run time, falling back to heap allocation only if necessary, for temporary arrays; nor is the fortran programmer provided any facility to query or monitor the available stack space, so the situation cannot be handled manually.

Regarding the LAPACK95 interface, one problem is that the API was set during that long period between f90 and f2003 when allocatable arrays were not allowed as dummy arguments or as components of a user-defined type. If that feature had been available when the API was designed, the work arrays could have been declared as allocatable arrays or as allocatable components of a derived type, and the worker routine could have allocated them to the correct size, if necessary, all during a single call (see the sketch below). With the older f77 interface, the programmer must first make the size query, then allocate the arrays, and then make the actual subroutine call. So both the f77 and the LAPACK95 interfaces are inflexible compared to what current fortran would allow.
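
A sketch of that single-call design (hypothetical name, wrapping zheevd): the workspace is an allocatable dummy argument, grown inside the routine only when it is too small, so the caller neither queries nor preallocates, yet the buffer survives between calls.

subroutine heevd_auto(a, w, work, info)
  ! Needs an explicit interface at the call site, e.g. by living in a module.
  implicit none
  complex(8), intent(inout) :: a(:,:)               ! Hermitian in, eigenvectors out
  real(8), intent(out) :: w(:)
  complex(8), allocatable, intent(inout) :: work(:) ! grown here as needed, reusable
  integer, intent(out) :: info
  real(8), allocatable :: rwork(:)
  integer, allocatable :: iwork(:)
  complex(8) :: wq(1)
  real(8)    :: rwq(1)
  integer    :: iwq(1), n, lwork, lrwork, liwork

  n = size(a, 1)
  ! Size query and (re)allocation happen inside, in the same call
  call zheevd('V', 'U', n, a, n, w, wq, -1, rwq, -1, iwq, -1, info)
  if (info /= 0) return
  lwork = int(wq(1)); lrwork = int(rwq(1)); liwork = iwq(1)
  if (.not. allocated(work)) then
    allocate(work(lwork))
  else if (size(work) < lwork) then
    deallocate(work)
    allocate(work(lwork))
  end if
  allocate(rwork(lrwork), iwork(liwork))
  call zheevd('V', 'U', n, a, n, w, work, size(work), rwork, lrwork, iwork, liwork, info)
end subroutine heevd_auto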


How does code based on assumed-shape arguments compare to code based on INC* and LD* along with assumed-size arguments (current LAPACK), in terms of performance?

I would think they would be comparable, but I have not rewritten and timed enough codes to know for certain. Note that the LAPACK conventions (the f77 interface) always require the elements in the leading dimension of the matrix to be contiguous. An assumed-shape dummy argument would be more general: it allows arbitrary strides in both matrix dimensions. So to some extent it is an apples-to-oranges comparison. However, internally the LAPACK routines make copies (in the provided work array) of subblocks of the input matrices, and the actual floating point operations are done on those copies, not on the original arrays. That addresses the locality and cache issues that impact performance. So I think it is important that the provided work array (and any other temporary work arrays) be contiguous, but contiguity of the assumed-shape dummy matrices themselves is less critical.

There is a school of thought that they are not comparable, and this too keeps certain codebases, including LAPACK, in the legacy (usually FORTRAN IV thru' FORTRAN 77 plus some add-ons) arena.

A properly refactored LAPACK, or more broadly linear algebra, is where modern Fortran either shines fully or gets proven a failed experiment. The world deserves to know which it is!

That's one of the "issues": in the F77 version the compiler can probably better (or at least more easily) optimize computations along the first dimension of the matrices.

Are you talking about the reference implementation, or about optimized ones? Not all the LAPACK routines have work arrays.

Actually, quickly browsing some discussions on the LAPACK GitHub, it seems that their main argument against incorporating modern Fortran features into LAPACK is that they want the code to remain transcodable with f2c.

Good point… It also means that Fortran libraries are somewhat stuck with conventional/legacy interfaces as soon as one wants to make them easily callable from other languages.

It’s a problem when arguably the most important library for a language is maintained by people with no interest in exploiting the benefits of the current version of the language. Their plans are to move to C and C++. I don’t think there is a legal restriction on forking LAPACK, modernizing it, and still calling it LAPACK. At LAPACK — Linear Algebra PACKage I see

Like all software, [LAPACK] is copyrighted. It is not trademarked, but we do ask the following:

  • If you modify the source for these routines we ask that you change the name of the routine and comment the changes made to the original.

Since the current cryptic names follow the six-character restriction of FORTRAN 77, longer and more descriptive names are something a modernized LAPACK would use anyway.