This is likely a symptom of another program error somewhere, maybe not even related to this function reference. The array metadata for that argument is being overwritten and corrupted. It might be a dangling pointer, or an array access overrun, or something similar.
I’m guessing the error is in the sparse matrix indexes resulting in an out-of-bounds read of the array?
If values in col_ind are wrong, there could be an out-of-bounds read in g_current. If the row_ind are wrong, there could be an out-of-bounds read in col_ind or values.
Here is how a check of the indexes for the CSR storage scheme (with one-based indexing) might look like:
program check
implicit none
integer, parameter :: n = 4
integer, parameter :: nnz = 8
integer :: ia(n+1), ja(nnz)
! row index pointers
ia = [1,4,6,8,9]
! column indexes
ja = [ 1, 2, 4, &
2, 3, &
1, 4, &
2 ]
call csr_check(n,nnz,ia,ja)
contains
! Assumes square matrix of size n-by-n, and
! one based indexing of the sparsity pattern
subroutine csr_check(n,nnz,ia,ja)
integer, intent(in) :: n, nnz
integer, intent(in) :: ia(n+1), ja(nnz)
logical :: consistent
integer :: i, nz, iaa, iab
nz = 0
! Check consistency of row indexes
do i = 1, n
consistent = ia(i) < ia(i+1)
if (.not. consistent) then
write(*,*) "csr_check: error in row pointer ", i
error stop 1
end if
nz = nz + (ia(i+1) - ia(i))
end do
! Check number of non-zeros is consistent with the row pointers
if (nnz /= nz) error stop 2
if (nnz /= ia(n+1)-1) error stop 3
! Check column indexes are within the matrix
do i = 1, n
iaa = ia(i)
iab = ia(i+1)-1
if (any(ja(iaa:iab) > n)) then
write(*,*) "csr_check: error in column index ", i
error stop 4
end if
end do
end program
Edit: I made some changes to the csr_check procedure
Thanks for providing the check! However, the check returns with no errors, but I still got the access violation when running without the debugging flags.
In both cases, the command mkl_sparse_d_mv returns no (immediate) error (I checked istat), and the rest of the code completes correctly (I compared results with a version without using the sparse BLAS library and they are identical).
The error is with deallocating inflow_couple upon exiting the routine. I’m really puzzled by this error.
Did you try running with lower optimization levels like O1 or O2. The fact that it runs with O0 and not O3 indicates to me an optimization problem. Unfortunately, with ifort (I presume thats what you are using) I will occasionally run into a problem with using O3 (even with codes that compiled correctly with O3 in an older version of the compiler). The weirdest one I’ve encounterd is a code that didn’t produce the correct results with O0 but did with O3, exactly opposite of what you would expect.
These kinds of program errors are often “Heisenbugs”. You can make trivial changes to the compiler options, or add write statements for debugging, etc. and the error will magically disappear and reappear. They can be difficult to locate and fix.
Particularly when its the compiler causing the problem and not the programmer. I’ve spent a couple of days rewritting code trying to kill a bug assuming it was something I did when it was a compiler bug or an optimization problem (which I consider a compiler bug).
Thanks for your answers! I think in this case it was the programmer (i.e. me) creating the problem
The size of the transition matrix is very big, and the problem was the declaration of max_nz and counter. Before I declared them as integer, now I declare them as integer(8) and the problem disappeared.
When I was declaring max_nz as integer, I was losing some non zero elements