Help with Blas band matrix operations (SOLVED)

Solution at the end

Hi everyone,

I’m having a little bit of trouble with Blas’ band matrix operation subroutine DGBMV.

From the man page:

DGBMV  performs one of the matrix-vector operations

y := alpha*A*x + beta*y,   or   y := alpha*A**T*x + beta*y,

 where alpha and beta are scalars, x and y are vectors and A is an
 m by n band matrix, with kl sub-diagonals and ku super-diagonals.

And so I’m using this to compute a matrix-vector multiplication, therefore I’m setting beta=0.0 and alpha=1.0. I wrote a little function to help with the usage:

function lmm(A, x, ninj, nsup, nsub) result(Cout)
    implicit none
    real(8), intent(in) :: A(:,:), x(: )
    real(8) :: Cout(size(x))
    real(8) :: AC(size(A,1),size(A,2)), xC(size(x))
    integer, intent(in) :: nsub, nsup, ninj

    AC(:,:) = A(:,:)
    xC(:) = x(:)

Cout = 0.0

call dgbmv('n', ninj, ninj, nsup, nsub, 1.0,&
    AC, size(AC,1), xC, 1, 0.0, Cout, 1)

end function lmm

Everything works but the results are completely wrong, however when I use matmul the results are as expected. I’ve also noticed that if I set Cout to another value before calling the subroutine I get different results.

I’ve fount in the IBM docs that the vector y should not have any element in common with both the matrix A and the vector x, maybe the problem is related to that but to be honest I couldn’t figure it out.

To convert the matrix from the dense to sparse format I’m using the following function:

function denseToBand(A, nsup, nsub) result(AB)
    implicit none
    integer, intent(in) :: nsub, nsup
    real(8), intent(in) :: A(:,: )
    integer :: j, i
    real(8) :: AB(2*nsub+nsup+1, size(A,1))

    do j=1, size(A,2)
         do i = max(1, j-nsup), min(size(A,1), j+nsub)
            AB(nsup+1-j+i,j) = A(i,j)
        enddo
    enddo

end function denseToBand

The Case I’m working on:(following the advice of Arjen)
I’m currently solving a 2D differential equation and and the coefficient matrix A (in Ax=b of the linear system) has the following visual representation:


In which the green are zeros. The diagonal is composed by columns of the domain grid, as it can be seen at the start and end of each of these sections the value is different corresponding to a boundary condition. This implies that the x vector (in Ax=b) is of the form:

|x_1,1|
|x_2,1|
...
|x_n-1,n|
|x_n,n|

which essentially are the columns of the domain grid stacked on top of each other.

The explicit solution involves multiplying A by the previous solution so: matmul(A, x). When I use matmul I get the following result:

good

However when I use the Blas subroutine I get the following:
bad

Which in this case means that the subroutine is outputting a 0 valued vector, i.e. the multiplication is outputting 0. And the only lines in the code I changed were from this:

   RHS = dertg*dt*matmul(A, phi) + dt*sig + phi

to this:

phiCopy = phi
RHS = dertg*dt*lmm(Adense, phiCopy, ninj, nsup, nsub) + dt*sig + phi

Thank you so much.

SOLUTION
Following Ivan advice I set the parameters alpha and beta to double precision and it worked perfectly. It should also be noted that my original formulation for the banded matrix was wrong, which is the appropriate for Blas’ DGBSV (linear system solver), or DGBTRF as pointed out by Ivan. For the DGBMV the sparse formulation is the one provided in the docs and pointed out by Ivan.
TL;DR:

  • Use double precision in both alpha and beta parameters
  • Use appropriate sparse matrix formulation provided by the docs.

I cannot thank enough Ivan for his thorough replies !

Edit 1-3: Formatting
Edit 4: changed the function used to convert the dense to band matrix since I’m now using the one presented in the DGBMV documentation and I’m still getting the same results. This is just to clarify following the suggestion of Ivan.
Edit 5: Supplying the example.
Edit 6: Solution provided at the end the problem was the precision of alpha and beta.
Edit 7: Typos

The BLAS documentation gives the following code snippet to convert a dense rectangular matrix of size M × N to banded form:

       The following program segment will transfer a band matrix
       from conventional full matrix storage to band storage:

             DO 20, J = 1, N
                K = KU + 1 - J
                DO 10, I = MAX( 1, J - KU ), MIN( M, J + KL )
                   A( K + I, J ) = matrix( I, J )
          10    CONTINUE
          20 CONTINUE

I haven’t tested this, but it looks like the nsub in your command ab(nsub+nsup+1+i-j) = ... is not necessary according to the BLAS snippet.

Hi Ivan

Thank you so much for the reply!

I’ve tried that formulation as well but got the same results. I believe that both formulations are equivalent.

I assume the size of your banded matrix is A(2*nsub+nsup+1) because the extra space is needed to store a banded matrix factorization.

I threw together a module of your routine and the BLAS one, and it only changes the alignment of the matrices:

Click to see code
module helpers

  implicit none
contains

function denseToBand(A, nsup, nsub) result(AB)

    integer, intent(in) :: nsub, nsup
    real(8), intent(in) :: A(:,: )
    integer :: j, i
    real(8) :: AB(2*nsub+nsup+1, size(A,1))

    do j = 1, size(A,1)
        do i = max(1, j - nsup), min(size(A,1), j+nsub)
            AB(nsub+nsup+1+i-j, j) = A(i,j)
        end do
    end do

end function

function denseToBand_BLAS(A, ku, kl) result(AB)

    integer, intent(in) :: ku, kl
    real(8), intent(in) :: A(:,:)
    integer :: j, i, k
    real(8) :: AB(2*kl+ku+1, size(A,1))
    integer :: m, n

    m = size(A,1)
    n = size(A,2)

     DO J = 1, N
        K = KU + 1 - J
        DO I = MAX( 1, J - KU ), MIN( M, J + KL )
           AB( K + I, J ) = A( I, J )
      END DO
    END DO

end function

subroutine print_mat(A)
  real(8), intent(in) :: A(:,:)
  integer :: i
  do i = 1, size(A,1)
    write(*,'(*(F8.3,2X))') A(i,:)
  end do

end subroutine

end module

program main

  use helpers
  implicit none

  integer, parameter :: n = 6
  real(8) :: Adense(n,n)
  real(8), allocatable :: Ab1(:,:), Ab2(:,:)

  call random_number(Adense)

  print *, "Dense"
  call print_mat(Adense)

  Ab1 = denseToBand_BLAS(Adense,1,1)
  print *, "Banded BLAS"
  call print_mat(Ab1)


  Ab2 = denseToBand(Adense,1,1)
  print *, "Banded OP"
  call print_mat(Ab2)

end program
$ gfortran test_dense_blas.f90
$ ./a.out
 Dense
   0.340     0.276     0.325     0.892     0.907     0.665
   0.512     0.773     0.420     0.045     0.310     0.777
   0.164     0.399     0.325     0.271     0.869     0.528
   0.447     0.670     0.216     0.580     0.272     0.903
   0.805     0.390     0.425     0.592     0.975     0.617
   0.995     0.902     0.771     0.979     0.900     0.610
 Banded BLAS
   0.000     0.276     0.420     0.271     0.272     0.617
   0.340     0.773     0.325     0.580     0.975     0.610
   0.512     0.399     0.216     0.592     0.900     0.000
   0.000     0.000     0.000     0.000     0.000     0.000
 Banded OP
   0.000     0.000     0.000     0.000     0.000     0.000
   0.000     0.276     0.420     0.271     0.272     0.617
   0.340     0.773     0.325     0.580     0.975     0.610
   0.512     0.399     0.216     0.592     0.900     0.000

Edit: when I used the banded factorization routines previously, your layout (with the initial “empty” row) was the one that worked. I have to study now the call sequence of the multiplication routine.

In case your nsub is equal to the lower diagonals, and nsup is the number of upper diagonals, it seems liked you swapped them in the dgbmv call sequence:

subroutine dgbmv 	( 	character  	TRANS,
		integer  	M,
		integer  	N,
		integer  	KL,                ! number of lower diags
		integer  	KU,                ! number of upper diags
		double precision  	ALPHA,
		double precision, dimension(lda,*)  	A,
		integer  	LDA,
		double precision, dimension(*)  	X,
		integer  	INCX,
		double precision  	BETA,
		double precision, dimension(*)  	Y,
		integer  	INCY 
	) 		

We also need to make sure you got this part right:

          A is DOUBLE PRECISION array, dimension ( LDA, N )
           Before entry, the leading ( kl + ku + 1 ) by n part of the
           array A must contain the matrix of coefficients, supplied
           column by column, with the leading diagonal of the matrix in
           row ( ku + 1 ) of the array, the first super-diagonal
           starting at position 2 in row ku, the first sub-diagonal
           starting at position 1 in row ( ku + 2 ), and so on.

Thank you so much ! I’ll correct that!

But in the case I’m working on nsup=nub so I’m afraid that is still not the cause of the problem

1 Like

I am no expert on BLAS :blush:, but could you provide a complete program that exhibits the problem? This will help others to reproduce it and identify the cause.

1 Like

One small suggestion might be to use the same precision in all of your declarations and constants. This means using 0.0d0 in your initialization of Cout and 1.0d0 and 0.0d0 for your coefficients alpha and beta. Preferably you would source a kind constant like wp from a module:

module prec
  implicit none
  integer, parameter :: wp = kind(1.0d0)
end module

and use it throughout your code:

module banded_matrices
   use prec
   implicit none
contains

function lmm(A,x,n,kl,ku)
  real(wp), intent(in) :: A(:,:), x(:)
  integer, intent(in) :: n, kl, ku

  ! ...
  Cout = 0.0_wp
  ! ...
end function

end module

By the looks of things your original code (before Edit 4) uses wrong alignment of the banded storage of matrix A with respect to what the dgbmv routine expects.

I’ve created a full example using the “top” alignment (following Edit 4), and the results from matmul and dgbmv are equal down to the last few decimal places:

Full example

Don’t forget to link with -lblas.

module helpers

  implicit none
  integer, parameter :: dp = kind(1.0d0)

  interface
    subroutine dgbmv(trans,m,n,kl,ku,alpha,a,lda,x,incx,beta,y,incy)
      import dp
      character(len=1), intent(in) :: trans
      integer, intent(in) :: m, n, kl, ku, lda, incx, incy
      real(dp), intent(in) :: alpha, a(lda,*), x(*), beta
      real(dp), intent(inout) :: y(*)
    end subroutine
  end interface

contains

  !> Banded matrix times vector multiplication
  !>
  !> Assumes matrix is stored in the first (kl + ku + 1) rows of array A
  !>
  subroutine banded_times_vec(A, x, n, kl, ku, y)
    real(dp), intent(in) :: A(:,:), x(: )
    integer, intent(in) :: n, kl, ku
    real(dp) :: y(size(x))

    y = 0.0d0

    call dgbmv('n',n,n,kl,ku,1.0d0,A,size(A,1),x,1,0.0d0,y,1)

  end subroutine

  !> Replace values below the diagonal kl and above diagonal ku with zero
  subroutine make_banded(A,kl,ku)
    real(dp), intent(inout) :: A(:,:)
    integer, intent(in) :: kl, ku

    integer :: i, j

    do j = 1, size(A,1)
      do i = 1, size(A,2)
        if (max(1, j - ku) <= i .and. i <= min(size(A,1), j+kl)) then
          cycle
        else
          A(i,j) = 0.0d0
        end if
      end do
    end do

  end subroutine

  function denseToBand(A, nsup, nsub) result(AB)

      integer, intent(in) :: nsub, nsup
      real(dp), intent(in) :: A(:,: )
      integer :: j, i
      real(dp) :: AB(2*nsub+nsup+1, size(A,1))

      do j = 1, size(A,1)
          do i = max(1, j - nsup), min(size(A,1), j+nsub)
              AB(nsub+nsup+1+i-j, j) = A(i,j)
          end do
      end do

  end function

  function denseToBand_BLAS(A, ku, kl) result(AB)

      integer, intent(in) :: ku, kl
      real(dp), intent(in) :: A(:,:)
      integer :: j, i, k
      real(dp) :: AB(2*kl+ku+1, size(A,1))
      integer :: m, n

      m = size(A,1)
      n = size(A,2)

       DO J = 1, N
          K = KU + 1 - J
          DO I = MAX( 1, J - KU ), MIN( M, J + KL )
             AB( K + I, J ) = A( I, J )
        END DO
      END DO

  end function

  subroutine print_mat(A)
    real(dp), intent(in) :: A(:,:)
    integer :: i
    do i = 1, size(A,1)
      write(*,'(*(F8.3,2X))') A(i,:)
    end do
  end subroutine

end module

program main

  use helpers
  implicit none

  integer, parameter :: n = 6
  integer, parameter :: kl = 1, ku = 1
  real(dp) :: Adense(n,n)
  real(dp), allocatable :: Ab1(:,:), Ab2(:,:)
  integer :: i
  real(dp) :: x(n), y(n), yb1(n), yb2(n)

  call random_number(Adense)
  call make_banded(Adense,kl,ku)

  print *, "Dense"
  call print_mat(Adense)

  ! returns leading diagonal at row ku + 1
  Ab1 = denseToBand_BLAS(Adense,kl,ku)
  print *, "Banded BLAS"
  call print_mat(Ab1)

  ! returns leading diagonal at row kl + ku + 1
  Ab2 = denseToBand(Adense,kl,ku)
  print *, "Banded OP"
  call print_mat(Ab2)


  call random_number(x)
  print *, "x = "
  do i = 1, n
    print *, x(i)
  end do

  ! Dense matmul
  y = matmul(Adense,x)

  ! Banded matmul
  call banded_times_vec(Ab1,x,n,kl,ku,yb1) ! correct
  call banded_times_vec(Ab2,x,n,kl,ku,yb2) ! wrong

  print *, "y = "
  do i = 1, n
    print *, i, y(i), yb1(i), yb2(i)
  end do

end program

It is also possible to use your original alignment (which is the one expected by the factorization routine DGBTRF), but you need to pass the address of element A(NSUB+1,1) instead of A in the call to dgbmv (lda remains the same) in your original example.

I would be happy to learn from other posters on this forum, if there is standard-conformant way to do this when using an explicit interface for the dgbmv routine. I found what I think is a solution using pointers:

Solution
  subroutine banded_times_vec(A, x, n, kl, ku, y)
    real(dp), intent(in), contiguous, target :: A(:,:)
    real(dp), intent(in) :: x(:)
    integer, intent(in) :: n, kl, ku
    real(dp) :: y(size(x))
    real(dp), pointer, contiguous :: Ap(:) => null()

    y = 0.0d0

    Ap(1:size(A)) => A

    call dgbmv('n',n,n,kl,ku,1.0d0,Ap(kl+1:),size(A,1),x,1,0.0d0,y,1)

  end subroutine

I’ve edited the post text to supply the example I’m working with

One more suggestion would be to use the full flexibility of the dgbmv interface. The dertg*dt can be considered as the alpha parameter, and the dt*sig + phi could be your initial y value.

alpha = dertg*dt
beta = 1.0d0

! rhs := dertg*dt*A*phi + (dt*sig + phi)
rhs = dt*sig + phi
call dgbmv('n',ninj,ninj,nsub,nsup,&
  alpha,Abanded,size(Abanded,1),phi,1,beta,rhs,1)

This way your code won’t be performing any memory allocation within the time loop of your PDE solver. There is also no need to perform copies.

1 Like