Block size issues when solving `A c = b` with QR decomposition in ScaLAPACK

Hi,

I am working on a code to solve a distributed linear system A c = b, where A is a tall and skinny matrix (e.g. 1e6 rows by 1e4 columns) and both A and b are distributed among multiple MPI processes. The solve is done with ScaLAPACK using the subroutines pdgeqrf, pdormqr, and pdtrtrs.
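To make the three steps concrete, here is a minimal serial sketch in plain Python of what I am doing mathematically. It is not ScaLAPACK code: the function name `householder_qr_solve` and the small test matrix are made up for illustration, and the sketch applies the reflectors to b on the fly instead of storing them as pdgeqrf does.

```python
import math

def householder_qr_solve(A, b):
    """Least-squares solve of A c = b via Householder QR (serial sketch).

    Mirrors the roles of the three routines: factor A = Q R (pdgeqrf),
    form y = Q^T b (pdormqr), solve R c = y (pdtrtrs).
    """
    m, n = len(A), len(A[0])
    R = [row[:] for row in A]   # work on copies
    y = b[:]
    for k in range(n):
        # Build the Householder vector v that zeroes column k below the diagonal.
        norm = math.sqrt(sum(R[i][k] ** 2 for i in range(k, m)))
        if norm == 0.0:
            raise ValueError("matrix is rank deficient")
        alpha = -math.copysign(norm, R[k][k])
        v = [0.0] * m
        for i in range(k, m):
            v[i] = R[i][k]
        v[k] -= alpha
        vtv = sum(v[i] * v[i] for i in range(k, m))
        # Apply H = I - 2 v v^T / (v^T v) to the trailing columns of R ...
        for j in range(k, n):
            s = 2.0 * sum(v[i] * R[i][j] for i in range(k, m)) / vtv
            for i in range(k, m):
                R[i][j] -= s * v[i]
        # ... and to the right-hand side, which accumulates Q^T b.
        s = 2.0 * sum(v[i] * y[i] for i in range(k, m)) / vtv
        for i in range(k, m):
            y[i] -= s * v[i]
    # Back substitution on the upper-triangular n x n block of R.
    c = [0.0] * n
    for i in reversed(range(n)):
        acc = y[i] - sum(R[i][j] * c[j] for j in range(i + 1, n))
        c[i] = acc / R[i][i]
    return c

# Illustrative 4 x 2 system with exact solution c = [2, -1].
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
b = [2.0, 1.0, 0.0, -1.0]
c = householder_qr_solve(A, b)
```

Only the last step, the triangular solve on the n x n block R, corresponds to the pdtrtrs call that gives me trouble.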

pdtrtrs seems to require that the row and column block sizes mb and nb of A be equal. If they are not, the program stops with an error message claiming that the global number of columns is invalid. I have opened an issue in the reference GitHub repository but didn’t get a reply.

Does anyone here know why this restriction exists, and why the routine prints an apparently unrelated error message?

Are there recommendations for choosing a good block size on a (p,1) processor grid?
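For reference, this is how I understand the block-cyclic layout on a (p,1) grid, as a small Python sketch that follows the logic of ScaLAPACK's NUMROC helper as I read it. The values p = 4 and mb = 64 are only illustrative assumptions, not my actual settings.

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Number of rows or columns of a block-cyclically distributed
    dimension of global size n that land on process iproc."""
    mydist = (nprocs + iproc - isrcproc) % nprocs  # distance from the source process
    nblocks = n // nb                              # number of complete blocks
    count = (nblocks // nprocs) * nb               # blocks every process receives
    extra = nblocks % nprocs                       # leftover complete blocks
    if mydist < extra:
        count += nb                                # one more complete block
    elif mydist == extra:
        count += n % nb                            # the trailing partial block
    return count

m, n = 1_000_000, 10_000   # global size of A (from the example above)
p, mb = 4, 64              # assumed process rows and row block size

# Local row counts on each of the p process rows.
local_rows = [numroc(m, mb, r, 0, p) for r in range(p)]

# With a single process column, every process holds all n columns,
# whatever value nb takes.
local_cols = [numroc(n, nb, 0, 0, 1) for nb in (1, 32, 64, 10_000)]
```

So on a (p,1) grid the choice of nb does not change which process owns which columns, which is part of why the mb = nb requirement surprises me.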