Hi,
I am working on code to solve a distributed linear system A c = b, where A is a tall and skinny matrix (e.g. 1e6 rows vs. 1e4 columns), and both A and b are distributed among multiple MPI processes. This is done via ScaLAPACK using the subroutines pdgeqrf, pdormqr, and pdtrtrs.
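For reference, here is a minimal serial sketch of the same three steps in NumPy, in case it helps to see what each ScaLAPACK call is doing. It is not my distributed code: the function name qr_least_squares, the test sizes, and the hand-written back substitution are assumptions made for illustration only.

```python
import numpy as np

def qr_least_squares(A, b):
    """Solve min ||A c - b||_2 for a tall, full-column-rank A via QR (serial sketch)."""
    # Step 1 (serial analogue of pdgeqrf): reduced QR, A = Q R, R is n x n upper triangular.
    Q, R = np.linalg.qr(A, mode="reduced")
    # Step 2 (serial analogue of pdormqr): apply Q^T to the right-hand side.
    y = Q.T @ b
    # Step 3 (serial analogue of pdtrtrs): back substitution on R c = y.
    n = R.shape[0]
    c = np.zeros(n)
    for i in range(n - 1, -1, -1):
        c[i] = (y[i] - R[i, i + 1:] @ c[i + 1:]) / R[i, i]
    return c

# Small made-up test problem (hypothetical sizes, far smaller than the real case).
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 5))
c_true = np.arange(1.0, 6.0)
b = A @ c_true
c = qr_least_squares(A, b)
```

Because b is constructed to lie exactly in the range of A, the recovered c should match c_true up to rounding.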
pdtrtrs seems to require that the block sizes mb and nb of A be equal. If they are not, the program stops with an error message claiming that the global number of columns is invalid. I have opened an issue in the reference repository on GitHub but haven't received a reply.
Does someone here know why this restriction exists? And why does it print an apparently unrelated error message?
Are there recommendations on how to choose a good block size (for a (p,1) processor grid)?
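To make the block-size question concrete, here is a small pure-Python sketch of how the row block size mb affects the number of rows each process owns on a (p,1) grid. The numroc function is my own port of the logic of ScaLAPACK's NUMROC tool routine; local_rows, the process count of 16, and the candidate block sizes are assumptions chosen only for illustration.

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Number of rows/columns of a block-cyclically distributed dimension owned by iproc.

    Python port of the logic in ScaLAPACK's NUMROC (illustrative, not the library routine).
    """
    # Distance of this process from the process that owns the first block.
    mydist = (nprocs + iproc - isrcproc) % nprocs
    # Number of complete blocks, and the share every process gets at minimum.
    nblocks = n // nb
    local = (nblocks // nprocs) * nb
    # Leftover complete blocks go to the first processes; the partial block to the next one.
    extrablks = nblocks % nprocs
    if mydist < extrablks:
        local += nb
    elif mydist == extrablks:
        local += n % nb
    return local

def local_rows(m, mb, p):
    """Rows owned by each of the p processes on a (p,1) grid (hypothetical helper)."""
    return [numroc(m, mb, rank, 0, p) for rank in range(p)]

if __name__ == "__main__":
    m, p = 1_000_000, 16  # assumed sizes for illustration
    for mb in (32, 64, 10_000):
        rows = local_rows(m, mb, p)
        print(f"mb={mb}: min rows={min(rows)}, max rows={max(rows)}")
```

This only shows load balance, i.e. how evenly the rows are spread. It says nothing about communication cost or local BLAS efficiency, which also depend on the block size.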