LAPACK interfaces

I gave up on MKL LAPACK95 because someone at Intel thought it would be a cool idea to change the signature (i.e., the order of arguments) of some of the subroutines versus what was in the Netlib distribution. Does anyone know why they did that? I've found it just as easy to select the 10 or so things I will ever use from LAPACK and write my own interfaces and wrapper routines that call the standard LAPACK or MKL LAPACK.

Why do you need a LAPACK interface at all? It may help with spotting errors in the passing of arguments, but beyond that it just seems like redundant code to me.

Detecting errors at compile time is really a big help, so I don’t consider it redundant code.

Compare calling LAPACK to compute eigenvalues with numpy's interface. The LAPACK interface itself is horrible, and on top of that it forces the programmer to do work, like passing dimensions, that could be done by the language/compiler.

How is

call zheevd('V','U',n,a,ld,w,work,lwork,rwork,lrwork,iwork,liwork,info)

horrible or even difficult? It's flexible, efficient, and I personally like to see the ranges and leading dimensions of my arrays explicitly. For example, I can diagonalize a block of the matrix just by pointing to the starting element and changing n (see the sketch below), and this will not incur copy-in/copy-out inefficiencies. Dongarra et al. knew what they were doing.
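
A minimal sketch of that sub-block trick (with hypothetical k and m, and the workspace arguments set up as in the call above): the m-by-m block starting at a(k,k) is diagonalized in place, with no copy, because only the starting element and the order change while the leading dimension stays ld.

! Sketch: diagonalize the m-by-m sub-block whose first element is a(k,k).
! No copy is made; ld is still the leading dimension of the full array a.
call zheevd('V', 'U', m, a(k,k), ld, w, work, lwork, rwork, lrwork, iwork, liwork, info)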


If you ask ChatGPT-4 how to call zheevd from scipy, it gives the following code, which is "glance-clear". (I tested it, and it runs.) The Fortran call has many arguments, and it's not clear to the uninitiated which arguments are inputs and which are outputs; the fact that argument a is both an input and an output will trip people up. ChatGPT's Fortran driver (also below) is much longer than the Python one.

Python

import numpy as np
from scipy.linalg import lapack

# Create a complex Hermitian matrix
A = np.array([[1+0j, 2+3j, 4+0j],
              [2-3j, 5+0j, 6+1j],
              [4+0j, 6-1j, 8+0j]])

# Compute eigenvalues and eigenvectors using zheevd
eigenvalues, eigenvectors, info = lapack.zheevd(A)

if info:
    print("info:",info)
print("eigenvalues:", eigenvalues)
print("eigenvectors:", eigenvectors)

Fortran

program zheevd_example
  implicit none

  ! Declare necessary variables and constants
  integer, parameter :: n = 3
  integer :: info, lwork, lrwork, liwork
  integer, allocatable :: iwork(:)
  real(8), allocatable :: rwork(:)
  complex(8), allocatable :: work(:)
  complex(8), dimension(n,n) :: A
  real(8), dimension(n) :: w

  ! Define a complex Hermitian matrix
  data A / (1.0d0, 0.0d0), (2.0d0, 3.0d0), (4.0d0, 0.0d0), &
           (2.0d0, -3.0d0), (5.0d0, 0.0d0), (6.0d0, 1.0d0), &
           (4.0d0, 0.0d0), (6.0d0, -1.0d0), (8.0d0, 0.0d0) /

  ! Workspace query (passing -1 for the sizes requests the optimal values)
  allocate(work(1), rwork(1), iwork(1))
  call zheevd('V', 'U', n, A, n, w, work, -1, rwork, -1, iwork, -1, info)
  lwork = int(real(work(1)))
  lrwork = int(rwork(1))
  liwork = iwork(1)

  ! Allocate work arrays with the optimal sizes
  deallocate(work, rwork, iwork)
  allocate(work(lwork), rwork(lrwork), iwork(liwork))

  ! Compute eigenvalues and eigenvectors
  call zheevd('V', 'U', n, A, n, w, work, lwork, rwork, lrwork, iwork, liwork, info)

  ! Check for success
  if (info == 0) then
     print *, "Eigenvalues: ", w
     print *, "Eigenvectors: "
     print *, A
  else
     print *, "Error in zheevd, info = ", info
  end if

  ! Deallocate work arrays
  deallocate(work, rwork, iwork)

end program zheevd_example

ETA: Calling LAPACK95 looks like this:

program zheev_lapack95_example
  use f95_lapack, only: la_heev
  implicit none

  ! Declare necessary variables
  integer, parameter :: n = 3
  integer :: info
  complex(8), dimension(n,n) :: A
  real(8), dimension(n) :: w

  ! Define a complex Hermitian matrix
  data A / (1.0d0, 0.0d0), (2.0d0, 3.0d0), (4.0d0, 0.0d0), &
           (2.0d0, -3.0d0), (5.0d0, 0.0d0), (6.0d0, 1.0d0), &
           (4.0d0, 0.0d0), (6.0d0, -1.0d0), (8.0d0, 0.0d0) /

  ! Compute eigenvalues and eigenvectors (eigenvectors overwrite A)
  call la_heev(A, w, jobz='V', info=info)

  ! Check for success
  if (info == 0) then
     print *, "Eigenvalues: ", w
     print *, "Eigenvectors: "
     print *, A
  else
     print *, "Error in zheev, info = ", info
  end if

end program zheev_lapack95_example

Yes, I call it horrible:

  • too many arguments
  • cryptic, non-descriptive names

It competes with something as simple as linalg.eigh(a).

In many cases, the few percent of performance that can be gained by people who take the details of a specific computer setup into account and spend time on performance measurements doesn't compensate for the extra work. I would personally rather wait 20 minutes longer for a computation than spend that time on coding.

Dongarra et al. knew what they were doing.

Yes, they were masters of the tools of their time, and the success of LAPACK is impressive. The same holds for Konrad Zuse. Still, I do not want to program a Z3.

I do the same. It would be nice if this work could be done once for everyone in a portable manner.


The conversion from real to integer in the workspace query has always bothered me; there is even a bug related to it. Typically I will just use the minimum value prescribed by the documentation and ignore the workspace query. Does anyone know what the performance gain from the optimal size is?
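
For what it's worth, here is a one-line defensive sketch of that conversion (not anything LAPACK itself prescribes): round up instead of truncating, and never go below the documented minimum, since WORK(1) comes back as a floating-point value and can round down for large sizes in the single-precision routines.

! Sketch: guard the real-to-integer conversion of the workspace query.
! 2*n + n*n is zheevd's documented minimum LWORK for JOBZ='V' and n > 1.
lwork = max(ceiling(real(work(1), kind=8)), 2*n + n*n)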

If you need to perform a partial operation on a sub-block and don't want to make an extra copy, I see no other way than using the original (albeit verbose) interface. It depends on what type of work you are doing and whether storage savings matter to you.

I don’t see anything wrong with having two interfaces, an “expert” one, and a simplified one. So I don’t think your and @jkd2022’s views exclude each other. I attribute part of the success of LAPACK (and other Fortran packages used in SciPy for that matter) to its interfaces. (This doesn’t mean further improvements can’t be made; requirements change, and so must software.)


The workspace allocation is tedious, and it's easy to forget the purpose of the parameters unless you use them often. The fact that Fortran is a compiled language also makes it hard to use in certain settings, which is what led to MATLAB. Quoting Cleve Moler:

In the 1970s and early 1980s, while I was working on the LINPACK and EISPACK projects that I discussed in two previous posts, I was a Professor of Mathematics and then of Computer Science at the University of New Mexico in Albuquerque. I was teaching courses in Linear Algebra and Numerical Analysis. I wanted my students to have easy access to LINPACK and EISPACK without writing Fortran programs. By “easy access” I meant not going through the remote batch processing and the repeated edit-compile-link-load-execute process that was ordinarily required on the campus central mainframe computer.

In the documentation of the 1981 edition of MATLAB, it says


  Since MATLAB is not primarily concerned with either execution time
  efficiency or storage savings, it ignores most of the special matrix
  properties that LINPACK and EISPACK subroutines use to advantage.

Also the Octave authors, interested in teaching chemical reactor design, write about its origin:

Octave was originally conceived (in about 1988) to be companion software for an undergraduate-level textbook on chemical reactor design […]

There were still some people who said that we should just be using Fortran instead, because it is the computer language of engineering, but every time we had tried that, the students spent far too much time trying to figure out why their Fortran code failed and not enough time learning about chemical engineering. We believed that with an interactive environment like Octave, most students would be able to pick up the basics quickly, and begin using it confidently in just a few hours.

Having different interfaces is certainly an option for fulfilling conflicting requirements, and with optional arguments a large portion of this could even be realized with a single function (see the sketch below).
I still think that the LAPACK API is archaic: it was designed when allocation was not possible and computations were done in statically allocated memory, portions of which were used as 'work space'.
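
For example, a sketch of such a single function (hypothetical module and names, wrapping zheevd): the simple call hides the query and allocation, while an optional argument lets expert callers pass in a reusable workspace.

module eigh_mod
  implicit none
contains
  subroutine eigh(a, w, info, work)
    complex(8), intent(inout) :: a(:,:)            ! Hermitian in, eigenvectors out
    real(8), intent(out)      :: w(:)              ! eigenvalues
    integer, intent(out)      :: info
    complex(8), intent(inout), optional :: work(:) ! optional preallocated buffer
    complex(8), allocatable :: wloc(:)
    real(8), allocatable    :: rwork(:)
    integer, allocatable    :: iwork(:)
    complex(8) :: wq(1)
    real(8)    :: rwq(1)
    integer    :: iwq(1), n, lwork, lrwork, liwork

    n = size(a, 1)
    ! Workspace query
    call zheevd('V', 'U', n, a, n, w, wq, -1, rwq, -1, iwq, -1, info)
    if (info /= 0) return
    lwork = int(wq(1)); lrwork = int(rwq(1)); liwork = iwq(1)
    allocate(rwork(lrwork), iwork(liwork))
    if (present(work)) then
      if (size(work) >= lwork) then
        ! Reuse the caller's buffer: no allocation in this path
        call zheevd('V', 'U', n, a, n, w, work, size(work), rwork, lrwork, iwork, liwork, info)
        return
      end if
    end if
    allocate(wloc(lwork))
    call zheevd('V', 'U', n, a, n, w, wloc, lwork, rwork, lrwork, iwork, liwork, info)
  end subroutine eigh
end module eigh_mod

Casual callers would just write call eigh(a, w, info); a hot loop could pass work= to skip the repeated allocation.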

Yes, and it belongs in the past tense, along with the coding practices of the time.

Somebody - such as the likes of @jacobwilliams - needs to rewrite the whole of LAPACK from scratch in modern Fortran, with a close eye on concurrent execution and coarray callers, and add it to stdlib.

And compare the performance with the original, starting with the sequential case.

That will be a good test case of whether the modern Fortran endeavor that started with Fortran 90 has been worth it. Meaning: pack it up and call it a day on modern Fortran if the performance doesn't compare well with the original, which employed legacy practices ranging from FORTRAN I thru' 77 and several related dialects of the time.

Yes, it’s a monumental effort but something the Fortran community must try soon.


Suppose you have to perform many diagonalizations of small arrays. You will want to avoid the overhead of allocate and deallocate (effectively malloc() and free()) by reusing the work arrays. The LAPACK interface accommodates this (see the sketch below).
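
A sketch of that pattern (the matrices are hypothetical test data; the fixed array bounds are zheevd's documented minima for JOBZ='V'): the workspace is sized once, outside the loop, and reused for every matrix.

program many_small_eigs
  implicit none
  integer, parameter :: n = 4, nmat = 10000
  complex(8) :: a(n,n,nmat)
  complex(8) :: work(2*n + n*n)         ! documented minimum LWORK for JOBZ='V'
  real(8)    :: rwork(1 + 5*n + 2*n*n)  ! documented minimum LRWORK
  integer    :: iwork(3 + 5*n)          ! documented minimum LIWORK
  real(8)    :: w(n)
  integer    :: info, k

  a = (0.0d0, 0.0d0)                    ! trivially Hermitian test matrices
  do k = 1, nmat
    a(1,1,k) = k
  end do

  ! No allocate/deallocate inside the loop: one workspace serves all calls
  do k = 1, nmat
    call zheevd('V', 'U', n, a(:,:,k), n, w, work, size(work), &
                rwork, size(rwork), iwork, size(iwork), info)
  end do
  print *, "last info = ", info
end program many_small_eigs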

A few related stdlib GitHub issues I see are


It's the static nature that makes it more distributable and embeddable in other programming languages. A Fortran allocate failure could crash the Python or R interpreter, not to mention the pain of distributing the Fortran runtime library. Some expert users might prefer to use their own allocation methods, which happen at carefully chosen "safe" points of the program. I guess not all users of LAPACK and BLAS have the safety of a virtual address space, either.

I see the orthogonality of memory (containers) and algorithms as one of the fundamental principles in reusable software design.

With that said, the LAPACK authors are on a path to change this. In the new C++ API for BLAS and LAPACK they plan to do the following:

As memory allocation is typically quick, the C++ LAPACK interface allocates optimal workspace sizes internally, thereby removing workspaces from the interface.

If this becomes a performance bottleneck, workspaces could be added as optional arguments — with a default value of nullptr indicating that the wrapper should allocate workspace — without breaking code written with the C++ LAPACK API.

Rationale: As needed, there is a possibility of adding an overloaded function call that takes a
user-defined memory allocator as an argument. This may serve memory-constrained implementations that insist on controlled memory usage.


The extra copy is required because ultimately there is a dummy argument that is assumed-size or explicit-shape. That is because of the limitation to f77 argument conventions in the working routines. If those LAPACK routines had been updated to use, for example, assumed-shape arrays, then many (or all) of the copies that are currently required would be eliminated.
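
To make the contrast concrete, here is a sketch with hypothetical names (not actual LAPACK routines):

! The f77 convention used by LAPACK: an assumed-size dummy, so the caller
! must pass n and lda, and a noncontiguous actual argument forces a copy.
subroutine heevd_legacy(n, a, lda, w)
  implicit none
  integer :: n, lda
  complex(8) :: a(lda, *)
  real(8) :: w(*)
end subroutine heevd_legacy

! An assumed-shape version: the descriptor carries shape and strides, so
! array sections can be passed without copies (the caller then needs an
! explicit interface, e.g. via a module).
subroutine heevd_modern(a, w)
  implicit none
  complex(8) :: a(:,:)
  real(8) :: w(:)
end subroutine heevd_modern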

Regarding the extra storage for the copies, the space itself is only one of the problems. Sometimes an even bigger problem is the overhead associated with the allocation and deallocation of that space. If that is done on the heap (typical of a fortran allocate() statement), it can require more effort than the subsequent floating point operations on the array. If it is done on the stack, which requires less effort, it can result in runtime errors when the available stack is exceeded. For some reason, fortran compilers do not attempt stack allocation first at run time, falling back to heap allocation only if necessary, for temporary arrays; nor is the fortran programmer provided any facility to query or monitor the available stack space, so the situation cannot be handled manually.

Regarding the LAPACK95 interface, one problem is that the API was set during that long period between f90 and f2003 when allocatable arrays were not allowed as dummy arguments or as components of a user-defined type. If that feature had been available when the API was designed, the work arrays could have been declared as allocatable arrays or as allocatable components of a derived type, and the worker routine could have allocated them to the correct size, if necessary, all during a single call (see the sketch below). With the older f77 interface, the programmer must first make the size query, then allocate the arrays, and then make the actual subroutine call. So both the f77 and the LAPACK95 interfaces are inflexible compared to what current fortran would allow.
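
A sketch of that single-call design (hypothetical name, wrapping zheevd): the workspace is an allocatable dummy argument, grown inside the routine only when it is too small, so the caller neither queries nor preallocates, yet the buffer survives between calls.

subroutine heevd_auto(a, w, work, info)
  ! Needs an explicit interface at the call site, e.g. by living in a module.
  implicit none
  complex(8), intent(inout) :: a(:,:)               ! Hermitian in, eigenvectors out
  real(8), intent(out) :: w(:)
  complex(8), allocatable, intent(inout) :: work(:) ! grown here as needed, reusable
  integer, intent(out) :: info
  real(8), allocatable :: rwork(:)
  integer, allocatable :: iwork(:)
  complex(8) :: wq(1)
  real(8)    :: rwq(1)
  integer    :: iwq(1), n, lwork, lrwork, liwork

  n = size(a, 1)
  ! Size query and (re)allocation happen inside, in the same call
  call zheevd('V', 'U', n, a, n, w, wq, -1, rwq, -1, iwq, -1, info)
  if (info /= 0) return
  lwork = int(wq(1)); lrwork = int(rwq(1)); liwork = iwq(1)
  if (.not. allocated(work)) then
    allocate(work(lwork))
  else if (size(work) < lwork) then
    deallocate(work)
    allocate(work(lwork))
  end if
  allocate(rwork(lrwork), iwork(liwork))
  call zheevd('V', 'U', n, a, n, w, work, size(work), rwork, lrwork, iwork, liwork, info)
end subroutine heevd_auto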


How does code based on assumed-shape arguments compare to code based on INC* and LD* along with assumed-size arguments (current LAPACK), in terms of performance?

I would think they would be comparable, but I have not rewritten and timed enough codes to know for certain. Note that the LAPACK conventions (the f77 interface) always require the elements in the leading dimension of the matrix to be contiguous. An assumed-shape dummy argument would be more general: it allows arbitrary strides in both matrix dimensions. So to some extent it is an apples-to-oranges comparison. However, internally the LAPACK routines make copies (in the provided work array) of subblocks of the input matrices, and the actual floating point operations are done on those copies, not on the original arrays. That addresses the locality and cache issues that impact performance. So I think it is important that the provided work array (and any other temporary work arrays) be contiguous, but contiguity of the assumed-shape dummy matrices themselves is less critical.

There is a school of thought that they are not comparable, and this too keeps certain codebases, including LAPACK, in the legacy (usually FORTRAN IV thru' FORTRAN 77 plus some add-ons) arena.

A properly refactored LAPACK, or more broadly linear algebra, is where modern Fortran either shines fully or gets proven a failed experiment. The world deserves to know which it is!

That's one of the "issues": in the F77 version the compiler can probably better (or at least more easily) optimize computations along the first dimension of the matrices.

Are you talking about the reference implementation, or about optimized ones? Not all the LAPACK routines have work arrays.

Actually, quickly browsing some discussions on the LAPACK GitHub, it seems that their main argument against incorporating modern Fortran features into LAPACK is that they want the code to remain transcodable with f2c.

Good point… It also means that Fortran libraries are somewhat stuck with conventional/legacy interfaces as soon as one wants to make them easily callable from other languages.

It’s a problem when arguably the most important library for a language is maintained by people with no interest in exploiting the benefits of the current version of the language. Their plans are to move to C and C++. I don’t think there is a legal restriction on forking LAPACK, modernizing it, and still calling it LAPACK. At LAPACK — Linear Algebra PACKage I see

Like all software, [LAPACK] is copyrighted. It is not trademarked, but we do ask the following:

  • If you modify the source for these routines we ask that you change the name of the routine and comment the changes made to the original.

Since the current cryptic names follow the six-character restriction of FORTRAN 77, longer and more descriptive names are something a modernized LAPACK would use anyway.