DGBMV blas issue with 2d array

I learn how to use blas and lapack libs in fortran for operationg with banded matricies. The code below i use to solve 2d heat equation by Peaceman‐Rachford schema, but i cut all not needed code. What i need? I need to multiply banded matrix coefficients on vector of function to get second deriv of function.
I create two arrays: test_array_col(2, N+1) and test_array_row(N+1, 2) and matrix band_matrix_y(3, N+1). I init array by 1.0. After calling dgbmv('N', N+1, N+1, 1, 1, 1.0_dp, band_matrix_y, 3, test_array_col(1, :), 1, 1.0_dp, test_array_col(1, :), 1) or dgbmv('N', N+1, N+1, 1, 1, 1.0_dp, band_matrix_y, 3, test_array_row(:, 1), 1, 1.0_dp, test_array_row(:, 1), 1) i expect get array where all elements are equal to 1.0, but i get this results only in first call.

The issue is that i need to store 2d array of my function and call dgbmv on each time step first for each row (work) and for each column (dont work). What can i do without transposing my 2d array of function on each time step. Of course i can implement this functionality myself, but i want to explain what happen below

The whole code here

program dgbmv_issue
    implicit none

    integer, parameter:: N=10, M=10, dp=(kind(0.d0))
    double precision, dimension(:, :):: band_matrix_y(3, N+1), test_array_col(2, N+1), test_array_row(N+1, 2)
    integer:: i

    double precision:: DX, DY, L_X, L_Y, DT

    external:: dgbmv

    test_array_col = 1.0_dp
    test_array_row = 1.0_dp

    L_X = 1.0_dp
    L_Y = 1.0_dp

    DX = L_X / M
    DY = L_Y / N
    DT = 0.01_dp

    band_matrix_y = 0.0_dp

    band_matrix_y(1, 3:) = 0.5_dp * DT / DY / DY
    band_matrix_y(2, 2:N) = -DT / DY / DY
    band_matrix_y(3, :N-1) = 0.5_dp * DT / DY / DY

    open(1, file='out.txt')

    do i = 1, 3
        write(1, *) band_matrix_y(i, :)
    end do

    call dgbmv('N', N+1, N+1, 1, 1, 1.0_dp, band_matrix_y, 3, test_array_col(1, :), 1, 1.0_dp, test_array_col(1, :), 1)

    write(1, *) 'X'
    write(1, *) test_array_col(1, :)

    call dgbmv('N', N+1, N+1, 1, 1, 1.0_dp, band_matrix_y, 3, test_array_row(:, 1), 1, 1.0_dp, test_array_row(:, 1), 1)

    write(1, *) 'X'
    write(1, *) test_array_row(:, 1)

    close(1)
end program dgbmv_issue

And here’s meson build file

project('dgbmv_issue', 'fortran')
lapack = dependency('lapack')
blas = dependency('blas')
executable('dgbmv_issue', 'dgbmv_issue.f90', dependencies: [lapack, blas])

Can anybody help me? Please🙏

One possible problem is that the input argument x(*) is aliased to the inout argument y(*). This applies to both calls, but apparently one call appears to work alright, while the other doesn’t. The reason it works alright is probably because of copy-in/copy-out argument association, so it is just by luck that it happens to work.

The compiler is not required to notify you of this error, but there might be some combination of compiler options where it would, at least if dgbmv() had an explicit interface. You are using implicit interfaces, so these kinds of errors are always easy to make and difficult to find.

You are using a combination of array slice notation actual argument with an implicit interface. I think you are doing that correctly, but it is confusing nonetheless. It might be simpler for you, and for anyone reading your code, if you were to avoid the array slice actual argument. I don’t know how to suggest that this should be changed because it also depends on how you choose to avoid the aliasing of the actual arguments.

1 Like

The first thing to fix is the aliasing, for both calls.

real, allocatable :: tmp(:)

tmp = test_array_col(1, :) )
call dgbmv('N', N+1, N+1, 1, 1, 1.0_dp, band_matrix_y, 3, tmp, 1, 1.0_dp, test_array_col(1, :), 1)

tmp = test_array_row(:,1)
call dgbmv('N', N+1, N+1, 1, 1, 1.0_dp, band_matrix_y, 3, tmp, 1, 1.0_dp, test_array_row(:, 1), 1)

Then replace the array sections by using the BLAS classical INC* arguments. This can work with array sections, but it should be more efficient without.

real, allocatable :: tmp(:)

tmp = test_array_col(1, :) )
call dgbmv('N', N+1, N+1, 1, 1, 1.0_dp, band_matrix_y, 3, tmp, 1, 1.0_dp, test_array_col, 2)

tmp = test_array_row(:,1)
call dgbmv('N', N+1, N+1, 1, 1, 1.0_dp, band_matrix_y, 3, tmp, 1, 1.0_dp, test_array_row, 1)

Thank you for your reply! The fact is that first dgbmv example call (for row) is stable, in full code for heat equation I call it in cycle and it’s work fine on all iterations, although one array passing as input and output argument. I don’t really know what to do

Thank you for your reply, but I need to compute expression y = alphaAx + beta*y where x=y, so I would rather not call vector+vector subroutine after dgbmv. In other words I can either copy column/row to tmp vector or call vector+vector subroutine after dgbmv? Is there no way to do what I try to do without extra memory allocation or subroutine calls?

As stated by @RonShepard this is by chance. Argument aliasing is illegal wrt the Fortran standard, as soon as one of the arguments is an output one. The behavior is undefined : it can work, or give wrong results, or crash, etc, depending on the compiler, on the compilation options, etc…

No, you can’t. You need a temporary array at some point, either for the input or for the output. I’ve corrected my previous answer according to what you want.

I also think this is correct. The arguments cannot be aliased, so you need to make a copy somehow. Another way do to that is to turn one of them into an expression in the call statement. The compiler then most likely creates the copy on the stack rather than the heap (where the allocatable array tmp(:) probably lives), so there are both advantages and disadvantages to this approach. Modifying the above code would give something like:

It is the extra parentheses around the argument that does the trick. Note that you are still making the copy, you are just letting the compiler do the heavy lifting. If that argument is needed several times, then it is probably best to allocate an assign it once in tmp(:) and then reuse it. If it is a one-time call, then maybe the expression is best. If the array slice is long, many megabytes, then the compiler might exhaust stack space, and then the tmp(:) array would have some advantages.

I think the increment of 1 is correct in the first call above, but that is one of the things that looks confusing when mixing modern fortran conventions with the f77-style blas/lapack conventions. I’m more confident that the increment of 1 is correct for the second call.

Indeed.

It’s important to note that this technique must not be used with an output argument !

Yes it is, as the actual argument is a non-contiguous array slice, which is copied to a contiguous array because of the use of an implicit interface that assumes contiguous dummy arguments)