According to the qr_mumps GitLab page, StarPU is used for parallelization. I'm pretty sure that there is no MPI parallelization, and setting OMP_NUM_THREADS=1 still gives 0.2 sec.
That seems plausible to me: choosing an appropriate algorithm normally yields a larger gain than parallelization, and for sub-second runtimes, overhead and serial sections often prevent any parallel speedup.
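
To put rough numbers on the overhead and serial-section argument, here is a minimal sketch based on Amdahl's law with an extra fixed-overhead term. The function name `amdahl_speedup` and all fractions used are my own illustrative assumptions, not measurements of qr_mumps or StarPU.

```python
def amdahl_speedup(serial_fraction, workers, overhead_fraction=0.0):
    """Predicted speedup from Amdahl's law plus a fixed overhead term.

    All arguments except `workers` are fractions of the single-threaded
    runtime. `overhead_fraction` is a hypothetical stand-in for thread
    startup, scheduling, and synchronization cost.
    """
    parallel_fraction = 1.0 - serial_fraction
    parallel_time = serial_fraction + parallel_fraction / workers + overhead_fraction
    return 1.0 / parallel_time


# Illustrative values only: half the run is serial, 8 workers.
print(amdahl_speedup(0.5, 8))                         # no overhead
print(amdahl_speedup(0.5, 8, overhead_fraction=0.4))  # large overhead
```

With half of the runtime serial, the speedup can never reach 2 no matter how many workers are used, and an overhead that is a large share of a sub-second run pushes it close to 1.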