Banded solvers table

In case anyone finds it helpful, I’ve put together a small example using the general banded solvers from various Fortran libraries.

Origin Factorization Solve Driver Pivoting Comment
LAPACK DGBTRF DGBTRS DGBSV Y diagonals stored as rows
LINPACK DGBFA, DGBCO DGBSL / Y diagonals stored as rows
SLATEC DNBFA, DNBCO DNBSL DNBFS Y diagonals stored as columns
PPPACK BANFAC BANSLV / N diagonals stored as rows; only suitable for diagonally-dominant matrices
BODE [1] LTRI TSOL / Y diagonals stored as columns; only for systems with ml == mu; based on the procedures bandet1 and bansol1 [3] (in Algol)
Thorson [2] band solve / Y diagonals stored as columns; only for systems with ml == mu
Engeln-Müllges & Uhlig [4] BANDP BANDS BAND Y diagonals stored as columns
Numerical Recipes [5] bandec banbks / Y based on the procedures bandet1 and bansol1 [3]

Are there any other banded solvers available? If yes, please share them here.

Literature:

Demo:

program example

use, intrinsic :: iso_fortran_env, only: error_unit
implicit none

integer, parameter :: wp = kind(1.0d0)

! Constants
integer, parameter :: n = 5
integer, parameter :: ml = 1, mu = 2
integer, parameter :: bw = ml + mu + 1

! Dense matrix and vectors
real(wp) :: ad(n,n), b(n), xla(n), xsl(n)
real(wp) :: wrk(n)
integer :: ipiv(n)

! SLATEC format (extra column needed for pivoting)
real(wp) :: as(n,bw+ml)
integer :: itask, ind

! LAPACK format (additional ml rows needed by factorization)
real(wp) :: ab(bw+ml,5)
integer :: info


! Procedures
external :: dgbsv, dnbfs, banfac, banslv

! Other variables
integer :: m1, m21, m3

! Dense matrix
ad(1,:) = [1, 2, 3, 0, 0]
ad(2,:) = [4, 5, 6, 7, 0]
ad(3,:) = [0, 8, 9, 1, 2]
ad(4,:) = [0, 0, 3, 4, 5]
ad(5,:) = [0, 0, 0, 6, 7]

! Right-hand side
! A * x = [6, 22, 20, 12, 13]^T, x = [1]_n
b = sum(ad,dim=2)

! Convert dense to banded (extra storage is provided for the factorization)
ab = dense2bandedLA(ad,ml,mu)
as = dense2bandedSLATEC(ad,ml,mu)

!>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
!
! Solve banded system A x = b with LAPACK
!

xla = b
call dgbsv(n,ml,mu,1,ab,n,ipiv,xla,n,info)
if (info /= 0) then
   write(error_unit,*) "[dgbsv] an error occured"
   error stop info
end if

write(*,'(A,*(/,G0))') "x (LAPACK) = ", xla

!>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
!
! Solve banded system A x = b with SLATEC
!

xsl = b
itask = 1
call dnbfs(as,n,n,ml,mu,xsl,itask,ind,wrk,ipiv)
if (ind < 0) then
   write(error_unit,*) "[dnbfs] an error occured"
   error stop ind
end if
write(*,'(A,*(/,G0))') "x (SLATEC) = ", xsl


!>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
!
! Solve banded system A x = b with PPPACK 
! (not intended for arbitrary matrices as it doesn't use pivoting,
!  also watch out for single/double precision!)
!
ab = dense2bandedLA(ad,ml,mu)
b = sum(ad,dim=2)
xla = b

! Shift into first row (PPPACK doesn't need extra storage)
ab(1:bw,:) = ab(ml+1:ml+bw,:)

call banfac(ab,size(ab,1),n,ml,mu,info)
if (info /= 1) then
   write(error_unit,*) '[banfac] Factorization failed.'
   error stop info
end if
call banslv(ab,size(ab,1),n,ml,mu,xla)
write(*,'(A,*(/,G0))') "x (BANSLV) = ", xla

contains

!
! Convert to banded format using LAPACK storage
!
! Description found here: 
! - https://www.netlib.org/lapack/lug/node124.html
function dense2bandedLA(A, ml, mu) result(AB)
   integer, intent(in) :: ml, mu
   real(wp), intent(in) :: A(:,:)
   real(wp) :: AB(2*ml+mu+1, size(A,1))
   integer :: j, i
   ab = 0
   do j = 1, size(A,1)
      do i = max(1, j - mu), min(size(A,1), j+ml)
         AB(ml+mu+1+i-j,j) = A(i,j)
      end do
   end do
end function

!
! Convert to banded using SLATEC storage
! (diagonals are stored as columns)
!
function dense2bandedSLATEC(A, ml, mu) result(ab)
   integer, intent(in) :: ml, mu
   real(wp), intent(in) :: A(:,:)

   real(wp) :: ab(size(A,1),2*ml+mu+1)

   integer :: i, j1, j2, j, k
   ab = 0
   do i = 1, n
      j1 = max(1,i-ml)
      j2 = min(n,i+mu)
      do j = j1, j2
         k = j - i + ml + 1
         ab(i,k) = a(i,j)
      end do
   end do
end function

end program
6 Likes

The source code that was/is distributed with the Engeln-Mullges and Uhlig book “Numerical Methods in Fortran” has two banded solvers, band.f90 for general banded matrices and chobnd.f90 for symmetric positive definite matrices using Cholesky method. band.f90 uses pivoting.

Engeln-Mullges, Gisela and Uhlig, Frank, “Numerical Methods with Fortran,” Springer, 1996. (the u in Mullges has the two dots over it that I don’t know how to type here).

Link

Source code is available here

https://extras.springer.com/?query=978-3-642-80045-0

You might want to compare the code to LINPACK/SLATEC etc. to see how much of it is original (I think most of it is) and if any was derived from other sources

Edit. They also have Cuthill-McGee routines for reordering banded matrices.

3 Likes

Thanks. I always forget about that book since I don’t have a print copy. I should probably start a new table for symmetric matrices and Cholesky factorization (LAPACK has multiple versions of those) anyways.

There is a pair of banded Cholesky routines CHOBNDN / VRBNDN in the book: Schwarz, H. R. (1991). FORTRAN-Programme zur Methode der finiten Elemente (Vol. 3). BG Teubner.

For the banded QR decomposition, I’m aware of

I’ve found a pentadiagonal LU factorization (without gaps or pivoting) in the work of C. J. Fletcher: Computational Galerkin Methods | SpringerLink (useful for 1-d quadratic FEM or for a quintic spline interpolation).

Not to forget the classic libraries: NSWC, HSL, IMSL, NAG, or even the numerical library of Moscow State University.

The SPARSPAK library described in George and Liu (1981) has routines for SPD banded matrices. The code was available under some licence. I just used the Fortran code in the book as a guide to roll my own, then moved on to more advanced methods.

George, Alan, and Joseph W. H. Liu. Computer Solution of Large Sparse Positive Definite Systems. Prentice-Hall Series in Computational Mathematics. Englewood Cliffs, N.J: Prentice-Hall, 1981.

This is an interesting book if you want to see graph-theoretical algorithms in Fortran 77. We got some impressive speedup (in the late 1980s) using these methods - runtimes down from 24 hours to a few minutes on a microVAX and better than an order of magnitude decrease in memory.

Edit: There is a version of SPARSEPAK on John Burkardt’s web site. That doesn’t seem to have a banded solver, but does have a related envelope storage method (with Reverse Cuthill-McKee reordering).

https://people.math.sc.edu/Burkardt/f77_src/sparsepak/sparsepak.html
https://people.math.sc.edu/Burkardt/f_src/sparsepak/sparsepak.html

1 Like

In structural engineering, “Banded Solvers” has referred to fixed band solvers. There is a special class of banded solvers for large symmetric banded matrices. These are clasified as skyline solvers, with a variable band and the data is stored in columns. Column storage localises the equation referencing during the solution.

This was early described in “Towards Optimal in-core Equation Solving” by Digambar P. Mondkar and Graham H. Powell in 1974, complete with the old FORTRAN code.

The use of pivoting can destroy the banded status of a sparse matrix.
Typically, the skyline approach does not use pivoting, as the finite element equations are typically well behaved with a non-zero diagonal.

They do however require preliminary equation re-ordering, to minimise either the maximum band (for earlier fixed-band solvers) or to minimise the storage skyline. I have used the methods of Sloan (1985) and Hoit (1983). For 3D finite element models, a directional sort of nodes using an appropriate l,m,n direction, where ordering l*x + m*y +n*z can easily provide a good node ordering outcome. Ordering is extremely important, as minimum storage implies minimum operations for linear equation solution.

Mondkar and Powell published a second paper in 1974 using a blocked storage approach for (then) “large sets of equations”. As a cache blocking approach, this can be applied to OpenMP and provide an efficient multi-threaded approach for equation solution.

This is interesting. Does this paper look at the system as just a banded matrix and not just as a general sparse matrix (sorta splitting hairs with these definitions). I always thought that frontal or envelope type solvers were introduced by Bruce Irons in 1970.

https://onlinelibrary.wiley.com/doi/abs/10.1002/nme.1620020104

Both banded and Skyline solvers do not address general sparsity, as structural FE equations typical have a symmetric matrix with an average bandwidth << number of equations. During reduction, most vaules within the profile become non-zero. (mostly > 99% implies same op count for both methods)
Any model with local mesh refinement produces a region in the matrix where bandwidth can become much larger than the average bandwidth, so the skyline was a significant improvement.
The frontal solver was a much more complex 2D description of the sparsity, so it would have been interesting to see the Fortran code extensions required to solve this problem.
The skyline solver’s 1D description suits the case where most/all matrix values below the skyline become non-zero during reduction.

Reordering is important for achieving a practical solution with either solution approach.
With a skyline solver, the equation order is important to minimise the average bandwidth/storage.
With a frontal solver, the element connectivity order is important to minimise the number of active equations at any time. With larger problems, it quickly became apparent that an automatic re-ordering routine was required for practical problems. RCM and then Sloan provided a useful solution for this approach. Hoit’s algorithm produced both a node and element order. Equation reordering before assemblying the matrix is much different to pivoting during the solution. Pivoting was not an approach that suited or was needed for FE problems, as it could change the storage requirement.

My experience was more with the improvement that skyline solvers offerred in comparison to fixed band solvers. Skyline performs reasonably in comparison to wavefront solvers, with a much simpler algorithm. The multi-block skyline approach can also easily adapt to where the active equations triangle of storage was too big for available memory. I have spent many years revising the efficiency of daxpy and ddotp, as hardware has changed.

The significance of these classes of solver is due to the typical sparsity patterns of the finite element stiffness matrix, which is not extremely sparse, but with a high degree of infill during reduction. I have not investigated other multi-fronted approaches.

One interesting aspect of the multi block storage skyline solver that is used for large systems of equations is that by applying OpenMP to this storage scheme, it produces an efficient multi-threaded solution, to what is a sequential solution process, again with a fairly simple solution algorithm. This is especially so with cache sized blocks when now that 64 GBytes or 128 GBytes of installed memory is readily available. In the 70’s we had the matrix stored in blocks on the disk, that were paged into (sometimes paged) memory, while we now have the matrix stored in cache sized blocks in memory, that are optimised into cache, to minimise cache to memory transfers.

Both skyline and frontal solvers are direct solvers. They were developed as they addressed the type of sparsity that characterised large finite element matrices. This solution could be reused many times, such as in time step solutions, where the matrix values do not vary significantly with time or load. There are other classes of sparse solvers where an itterative solution is preferred, but this all depends on the matrix characteristic.

It is difficult to remember what “large” meant in 1970 !

Thanks, I thought there was probably a difference at some level. I was curious about the paper you referenced because I couldn’t find it mentioned in any of the 13 or so Finite Element books I own.

Indeed. In the late 1970s early 80s, running a 200000 grid point transonic potential flow solution around a wing was considered a “large” problem. Now its turbulent Navier-Stokes solutions using hundreds of millions of points around a complete aircraft and the 2K grid potential flow solution can be done in a matter of seconds on a Raspberry PI.

I remember in the early 1980s, the demand was for an affordable 3-M machine: a million floating point operations per second (or sometimes instructions per second), a megabyte of memory, and a million pixel display (even black and white would have qualified, color even better).

This report (PDF, 1.95 MB) by D. A. Calahan of University of Michigan, sponsored by the US Air Force, describes the implementation of an unsymmetric banded solver for the CRAY-1, with the following interface:

Factorization

CALL BANFAC(N,M,A,NDIAG,NDROW)

Substitution

CALL BANSOL(N,M,A,NDIAG,NDROW,B)

where

  • N is the number of equations
  • M is the half-bandwidth (not including the diagonal)
  • A is the compresses band matrix array
  • NDIAG is the diagonal addressing increment
  • NDROW is the row addressing increment
  • B is the right hand side and the solution array

and a few additional restrictions are described in the report.


A second report (PDF, 671 KB) by the same author and sponsors describes the implementation of a symmetric banded solver for the CRAY-1, with the following interface:

Factorization

CALL SBANF(N,M,A(N11),NDIAG,NDROW)

where

  • N is the number of equations
  • M is the half-bandwidth (not including the diagonal)
  • A(N11) is the (1,1) element of the matrix
  • NDIAG is the storage increment between successive diagonal elements
  • NDROW is the storage increment between succesive column elements

Substitution

CALL SBANS(N,M,A(N11),NDIAG,NDROW,Y)

where

  • N...NDROW are defined above, except that A(N11) is the (1,1) element of the factorized matrix
  • Y is the right hand side on entry and the solution on exit

Additional restrictions which apply to Y are documented in the report.

The paper by Mondkar and Powell (1974) does cite the work of Irons (1970).


There is an ACM TOMS package that does this: Algorithm 664: A Gauss algorithm to solve systems with large, banded matrices using random-access disk storage: ACM Transactions on Mathematical Software: Vol 14, No 3

Judging from the small print, the author of the package, Géza Schrauf, was an employee of Messerschmitt-Bölkow-Blohm (MBB), a West German aerospace manufacturer, on a research visit at Caltech. Three years later at German Airbus (MBB was eventually merged into Airbus), he also published a banded complex eigenvalue solver.

An important aspect of skyline solvers is that they are based on Crout’s reduction algorithm, which rearranges the Gaussean reduction process in a way that better suits column storage and reduction. In this way the operations count can be efficiently reduced from a Gaussean approach.
The Crout inner loop is basically ddotp, rather than daxpy in Gaussean elimination.
This Skyline approach produces similar operations counts to Iron’s frontal solver, but uses only a 1-D description of sparsity (equations in each column), compared to the 2D frontal approach.

The main aspect of skyline solvers that was not addressed in early papers was the importance of equation re-ordering to improve the storage and so operations count required.

Fixed band solvers provided a very early solution, while more sophistocated solvers are typically commercial packages. The skyline solver addressed both efficiency and memory/disk storage, which were significant barriers for computation at that time.