# Help with Blas band matrix operations (SOLVED)

Solution at the end

Hi everyone,

I’m having a little bit of trouble with Blas’ band matrix operation subroutine DGBMV.

From the man page:

``````DGBMV  performs one of the matrix-vector operations

y := alpha*A*x + beta*y,   or   y := alpha*A**T*x + beta*y,

where alpha and beta are scalars, x and y are vectors and A is an
m by n band matrix, with kl sub-diagonals and ku super-diagonals.
``````

And so I’m using this to compute a matrix-vector multiplication, therefore I’m setting beta=0.0 and alpha=1.0. I wrote a little function to help with the usage:

``````function lmm(A, x, ninj, nsup, nsub) result(Cout)
implicit none
real(8), intent(in) :: A(:,:), x(: )
real(8) :: Cout(size(x))
real(8) :: AC(size(A,1),size(A,2)), xC(size(x))
integer, intent(in) :: nsub, nsup, ninj

AC(:,:) = A(:,:)
xC(:) = x(:)

Cout = 0.0

call dgbmv('n', ninj, ninj, nsup, nsub, 1.0,&
AC, size(AC,1), xC, 1, 0.0, Cout, 1)
``````

end function lmm

Everything works but the results are completely wrong, however when I use matmul the results are as expected. I’ve also noticed that if I set Cout to another value before calling the subroutine I get different results.

I’ve fount in the IBM docs that the vector y should not have any element in common with both the matrix A and the vector x, maybe the problem is related to that but to be honest I couldn’t figure it out.

To convert the matrix from the dense to sparse format I’m using the following function:

``````function denseToBand(A, nsup, nsub) result(AB)
implicit none
integer, intent(in) :: nsub, nsup
real(8), intent(in) :: A(:,: )
integer :: j, i
real(8) :: AB(2*nsub+nsup+1, size(A,1))

do j=1, size(A,2)
do i = max(1, j-nsup), min(size(A,1), j+nsub)
AB(nsup+1-j+i,j) = A(i,j)
enddo
enddo

end function denseToBand
``````

The Case I’m working on:(following the advice of Arjen)
I’m currently solving a 2D differential equation and and the coefficient matrix A (in Ax=b of the linear system) has the following visual representation:

In which the green are zeros. The diagonal is composed by columns of the domain grid, as it can be seen at the start and end of each of these sections the value is different corresponding to a boundary condition. This implies that the x vector (in Ax=b) is of the form:

``````|x_1,1|
|x_2,1|
...
|x_n-1,n|
|x_n,n|
``````

which essentially are the columns of the domain grid stacked on top of each other.

The explicit solution involves multiplying A by the previous solution so: matmul(A, x). When I use matmul I get the following result:

However when I use the Blas subroutine I get the following:

Which in this case means that the subroutine is outputting a 0 valued vector, i.e. the multiplication is outputting 0. And the only lines in the code I changed were from this:

``````   RHS = dertg*dt*matmul(A, phi) + dt*sig + phi
``````

to this:

``````phiCopy = phi
RHS = dertg*dt*lmm(Adense, phiCopy, ninj, nsup, nsub) + dt*sig + phi
``````

Thank you so much.

SOLUTION
Following Ivan advice I set the parameters alpha and beta to double precision and it worked perfectly. It should also be noted that my original formulation for the banded matrix was wrong, which is the appropriate for Blas’ DGBSV (linear system solver), or DGBTRF as pointed out by Ivan. For the DGBMV the sparse formulation is the one provided in the docs and pointed out by Ivan.
TL;DR:

• Use double precision in both alpha and beta parameters
• Use appropriate sparse matrix formulation provided by the docs.

I cannot thank enough Ivan for his thorough replies !

Edit 1-3: Formatting
Edit 4: changed the function used to convert the dense to band matrix since I’m now using the one presented in the DGBMV documentation and I’m still getting the same results. This is just to clarify following the suggestion of Ivan.
Edit 5: Supplying the example.
Edit 6: Solution provided at the end the problem was the precision of alpha and beta.
Edit 7: Typos

The BLAS documentation gives the following code snippet to convert a dense rectangular matrix of size `M × N` to banded form:

``````       The following program segment will transfer a band matrix
from conventional full matrix storage to band storage:

DO 20, J = 1, N
K = KU + 1 - J
DO 10, I = MAX( 1, J - KU ), MIN( M, J + KL )
A( K + I, J ) = matrix( I, J )
10    CONTINUE
20 CONTINUE
``````

I haven’t tested this, but it looks like the `nsub` in your command `ab(nsub+nsup+1+i-j) = ...` is not necessary according to the BLAS snippet.

Hi Ivan

Thank you so much for the reply!

I’ve tried that formulation as well but got the same results. I believe that both formulations are equivalent.

I assume the size of your banded matrix is `A(2*nsub+nsup+1)` because the extra space is needed to store a banded matrix factorization.

I threw together a module of your routine and the BLAS one, and it only changes the alignment of the matrices:

Click to see code
``````module helpers

implicit none
contains

function denseToBand(A, nsup, nsub) result(AB)

integer, intent(in) :: nsub, nsup
real(8), intent(in) :: A(:,: )
integer :: j, i
real(8) :: AB(2*nsub+nsup+1, size(A,1))

do j = 1, size(A,1)
do i = max(1, j - nsup), min(size(A,1), j+nsub)
AB(nsub+nsup+1+i-j, j) = A(i,j)
end do
end do

end function

function denseToBand_BLAS(A, ku, kl) result(AB)

integer, intent(in) :: ku, kl
real(8), intent(in) :: A(:,:)
integer :: j, i, k
real(8) :: AB(2*kl+ku+1, size(A,1))
integer :: m, n

m = size(A,1)
n = size(A,2)

DO J = 1, N
K = KU + 1 - J
DO I = MAX( 1, J - KU ), MIN( M, J + KL )
AB( K + I, J ) = A( I, J )
END DO
END DO

end function

subroutine print_mat(A)
real(8), intent(in) :: A(:,:)
integer :: i
do i = 1, size(A,1)
write(*,'(*(F8.3,2X))') A(i,:)
end do

end subroutine

end module

program main

use helpers
implicit none

integer, parameter :: n = 6
real(8), allocatable :: Ab1(:,:), Ab2(:,:)

print *, "Dense"

print *, "Banded BLAS"
call print_mat(Ab1)

print *, "Banded OP"
call print_mat(Ab2)

end program
``````
``````\$ gfortran test_dense_blas.f90
\$ ./a.out
Dense
0.340     0.276     0.325     0.892     0.907     0.665
0.512     0.773     0.420     0.045     0.310     0.777
0.164     0.399     0.325     0.271     0.869     0.528
0.447     0.670     0.216     0.580     0.272     0.903
0.805     0.390     0.425     0.592     0.975     0.617
0.995     0.902     0.771     0.979     0.900     0.610
Banded BLAS
0.000     0.276     0.420     0.271     0.272     0.617
0.340     0.773     0.325     0.580     0.975     0.610
0.512     0.399     0.216     0.592     0.900     0.000
0.000     0.000     0.000     0.000     0.000     0.000
Banded OP
0.000     0.000     0.000     0.000     0.000     0.000
0.000     0.276     0.420     0.271     0.272     0.617
0.340     0.773     0.325     0.580     0.975     0.610
0.512     0.399     0.216     0.592     0.900     0.000
``````

Edit: when I used the banded factorization routines previously, your layout (with the initial “empty” row) was the one that worked. I have to study now the call sequence of the multiplication routine.

In case your `nsub` is equal to the lower diagonals, and `nsup` is the number of upper diagonals, it seems liked you swapped them in the `dgbmv` call sequence:

``````subroutine dgbmv 	( 	character  	TRANS,
integer  	M,
integer  	N,
integer  	KL,                ! number of lower diags
integer  	KU,                ! number of upper diags
double precision  	ALPHA,
double precision, dimension(lda,*)  	A,
integer  	LDA,
double precision, dimension(*)  	X,
integer  	INCX,
double precision  	BETA,
double precision, dimension(*)  	Y,
integer  	INCY
)
``````

We also need to make sure you got this part right:

``````          A is DOUBLE PRECISION array, dimension ( LDA, N )
Before entry, the leading ( kl + ku + 1 ) by n part of the
array A must contain the matrix of coefficients, supplied
column by column, with the leading diagonal of the matrix in
row ( ku + 1 ) of the array, the first super-diagonal
starting at position 2 in row ku, the first sub-diagonal
starting at position 1 in row ( ku + 2 ), and so on.
``````

Thank you so much ! I’ll correct that!

But in the case I’m working on nsup=nub so I’m afraid that is still not the cause of the problem

1 Like

I am no expert on BLAS , but could you provide a complete program that exhibits the problem? This will help others to reproduce it and identify the cause.

1 Like

One small suggestion might be to use the same precision in all of your declarations and constants. This means using `0.0d0` in your initialization of `Cout` and `1.0d0` and `0.0d0` for your coefficients alpha and beta. Preferably you would source a kind constant like `wp` from a module:

``````module prec
implicit none
integer, parameter :: wp = kind(1.0d0)
end module
``````

and use it throughout your code:

``````module banded_matrices
use prec
implicit none
contains

function lmm(A,x,n,kl,ku)
real(wp), intent(in) :: A(:,:), x(:)
integer, intent(in) :: n, kl, ku

! ...
Cout = 0.0_wp
! ...
end function

end module
``````

By the looks of things your original code (before Edit 4) uses wrong alignment of the banded storage of matrix A with respect to what the `dgbmv` routine expects.

I’ve created a full example using the “top” alignment (following Edit 4), and the results from `matmul` and `dgbmv` are equal down to the last few decimal places:

Full example

Don’t forget to link with `-lblas`.

``````module helpers

implicit none
integer, parameter :: dp = kind(1.0d0)

interface
subroutine dgbmv(trans,m,n,kl,ku,alpha,a,lda,x,incx,beta,y,incy)
import dp
character(len=1), intent(in) :: trans
integer, intent(in) :: m, n, kl, ku, lda, incx, incy
real(dp), intent(in) :: alpha, a(lda,*), x(*), beta
real(dp), intent(inout) :: y(*)
end subroutine
end interface

contains

!> Banded matrix times vector multiplication
!>
!> Assumes matrix is stored in the first (kl + ku + 1) rows of array A
!>
subroutine banded_times_vec(A, x, n, kl, ku, y)
real(dp), intent(in) :: A(:,:), x(: )
integer, intent(in) :: n, kl, ku
real(dp) :: y(size(x))

y = 0.0d0

call dgbmv('n',n,n,kl,ku,1.0d0,A,size(A,1),x,1,0.0d0,y,1)

end subroutine

!> Replace values below the diagonal kl and above diagonal ku with zero
subroutine make_banded(A,kl,ku)
real(dp), intent(inout) :: A(:,:)
integer, intent(in) :: kl, ku

integer :: i, j

do j = 1, size(A,1)
do i = 1, size(A,2)
if (max(1, j - ku) <= i .and. i <= min(size(A,1), j+kl)) then
cycle
else
A(i,j) = 0.0d0
end if
end do
end do

end subroutine

function denseToBand(A, nsup, nsub) result(AB)

integer, intent(in) :: nsub, nsup
real(dp), intent(in) :: A(:,: )
integer :: j, i
real(dp) :: AB(2*nsub+nsup+1, size(A,1))

do j = 1, size(A,1)
do i = max(1, j - nsup), min(size(A,1), j+nsub)
AB(nsub+nsup+1+i-j, j) = A(i,j)
end do
end do

end function

function denseToBand_BLAS(A, ku, kl) result(AB)

integer, intent(in) :: ku, kl
real(dp), intent(in) :: A(:,:)
integer :: j, i, k
real(dp) :: AB(2*kl+ku+1, size(A,1))
integer :: m, n

m = size(A,1)
n = size(A,2)

DO J = 1, N
K = KU + 1 - J
DO I = MAX( 1, J - KU ), MIN( M, J + KL )
AB( K + I, J ) = A( I, J )
END DO
END DO

end function

subroutine print_mat(A)
real(dp), intent(in) :: A(:,:)
integer :: i
do i = 1, size(A,1)
write(*,'(*(F8.3,2X))') A(i,:)
end do
end subroutine

end module

program main

use helpers
implicit none

integer, parameter :: n = 6
integer, parameter :: kl = 1, ku = 1
real(dp), allocatable :: Ab1(:,:), Ab2(:,:)
integer :: i
real(dp) :: x(n), y(n), yb1(n), yb2(n)

print *, "Dense"

! returns leading diagonal at row ku + 1
print *, "Banded BLAS"
call print_mat(Ab1)

! returns leading diagonal at row kl + ku + 1
print *, "Banded OP"
call print_mat(Ab2)

call random_number(x)
print *, "x = "
do i = 1, n
print *, x(i)
end do

! Dense matmul

! Banded matmul
call banded_times_vec(Ab1,x,n,kl,ku,yb1) ! correct
call banded_times_vec(Ab2,x,n,kl,ku,yb2) ! wrong

print *, "y = "
do i = 1, n
print *, i, y(i), yb1(i), yb2(i)
end do

end program
``````

It is also possible to use your original alignment (which is the one expected by the factorization routine `DGBTRF`), but you need to pass the address of element `A(NSUB+1,1)` instead of `A` in the call to `dgbmv` (`lda` remains the same) in your original example.

I would be happy to learn from other posters on this forum, if there is standard-conformant way to do this when using an explicit interface for the `dgbmv` routine. I found what I think is a solution using pointers:

Solution
``````  subroutine banded_times_vec(A, x, n, kl, ku, y)
real(dp), intent(in), contiguous, target :: A(:,:)
real(dp), intent(in) :: x(:)
integer, intent(in) :: n, kl, ku
real(dp) :: y(size(x))
real(dp), pointer, contiguous :: Ap(:) => null()

y = 0.0d0

Ap(1:size(A)) => A

call dgbmv('n',n,n,kl,ku,1.0d0,Ap(kl+1:),size(A,1),x,1,0.0d0,y,1)

end subroutine

``````

I’ve edited the post text to supply the example I’m working with

One more suggestion would be to use the full flexibility of the `dgbmv` interface. The `dertg*dt` can be considered as the `alpha` parameter, and the `dt*sig + phi` could be your initial `y` value.

``````alpha = dertg*dt
beta = 1.0d0

! rhs := dertg*dt*A*phi + (dt*sig + phi)
rhs = dt*sig + phi
call dgbmv('n',ninj,ninj,nsub,nsup,&
alpha,Abanded,size(Abanded,1),phi,1,beta,rhs,1)
``````

This way your code won’t be performing any memory allocation within the time loop of your PDE solver. There is also no need to perform copies.

1 Like