I am working on a code to solve a distributed linear system
A c = b, where
A is a tall and skinny matrix (e.g. 1e6 rows vs 1e4 columns), and
A and b are distributed among multiple MPI processes. This is done via ScaLAPACK. One of the subroutines I use,
pdtrtrs, seems to require that the block sizes of
A be equal. If they are not, the program stops with an error message claiming that the global number of columns is invalid. I have opened an issue in the GitHub reference repo but didn't get a reply.
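To make the restriction concrete, here is a minimal Python sketch of a pre-flight check on the array descriptor. It assumes the requirement is that the row block size (MB) and column block size (NB) in the descriptor of A match, which is only my reading of the behaviour. The helper `has_square_blocks` and the two example descriptors are hypothetical; only the 9-entry descriptor layout is taken from the ScaLAPACK documentation.

```python
# Positions inside a ScaLAPACK dense-matrix array descriptor
# (0-based here; the Fortran documentation numbers them 1..9).
DTYPE_, CTXT_, M_, N_, MB_, NB_, RSRC_, CSRC_, LLD_ = range(9)


def has_square_blocks(desc):
    """Return True if the row and column block sizes in `desc` are equal."""
    if len(desc) != 9:
        raise ValueError("a dense-matrix descriptor has 9 entries")
    return desc[MB_] == desc[NB_]


# Hypothetical descriptors for a 1e6 x 1e4 matrix. The context handle (0)
# is a placeholder, and LLD assumes a single process row.
desc_ok = [1, 0, 1_000_000, 10_000, 64, 64, 0, 0, 1_000_000]
desc_bad = [1, 0, 1_000_000, 10_000, 256, 64, 0, 0, 1_000_000]

for name, desc in (("desc_ok", desc_ok), ("desc_bad", desc_bad)):
    status = "square blocks" if has_square_blocks(desc) else "MB != NB"
    print(f"{name}: MB={desc[MB_]}, NB={desc[NB_]} -> {status}")
```

A check like this, run before the triangular solve, would at least turn the confusing error into a clear message about the block sizes.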
Does someone here know why this restriction exists? And why it prints an apparently unrelated error message?
Are there recommendations on how to choose a good block size (processor grid