Writing wrappers for LAPACK and BLAS routines

How about using PBLAS (Parallel BLAS), the distributed-memory BLAS that ships with ScaLAPACK?

! Standard BLAS
call  zgemv(trans, m, n, alpha, a, lda, &
                                x, incx, &
                          beta, y, incy)

! Parallel BLAS
call pzgemv(trans, m, n, alpha, a, ia, ja, desca, & 
                                x, ix, jx, descx, incx, &
                          beta, y, iy, jy, descy, incy)

You will need to set up the BLACS process grid, the distributed arrays, and their descriptors first, but otherwise your code can remain mostly the same.
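As a rough illustration of that setup step, here is a minimal sketch using the standard BLACS and ScaLAPACK tool routines (blacs_gridinit, numroc, descinit). The problem size, the block size nb, the near-square process grid, and the program name are arbitrary choices made for this example, not something taken from your code, and it has to be linked against ScaLAPACK and launched under MPI.

```fortran
program pzgemv_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: m = 8, n = 8   ! global matrix size (illustrative)
  integer, parameter :: nb = 2         ! block size (illustrative)
  complex(dp), parameter :: alpha = (1.0_dp, 0.0_dp)
  complex(dp), parameter :: beta  = (0.0_dp, 0.0_dp)
  integer :: ictxt, iam, nprocs, nprow, npcol, myrow, mycol
  integer :: mloc, nloc, xloc, info
  integer :: desca(9), descx(9), descy(9)
  complex(dp), allocatable :: a(:,:), x(:), y(:)
  integer, external :: numroc

  ! Create a BLACS process grid (assumed here: as square as possible).
  call blacs_pinfo(iam, nprocs)
  nprow = int(sqrt(real(nprocs)))
  npcol = nprocs / nprow
  call blacs_get(-1, 0, ictxt)
  call blacs_gridinit(ictxt, 'R', nprow, npcol)
  call blacs_gridinfo(ictxt, nprow, npcol, myrow, mycol)

  ! Processes left out of the grid get myrow = -1 and skip the work.
  if (myrow >= 0) then
     ! Sizes of the local pieces under the block-cyclic distribution.
     mloc = max(1, numroc(m, nb, myrow, 0, nprow))
     nloc = max(1, numroc(n, nb, mycol, 0, npcol))
     xloc = max(1, numroc(n, nb, myrow, 0, nprow))

     ! Descriptors: A is m x n; x and y are stored as n x 1 and m x 1.
     call descinit(desca, m, n, nb, nb, 0, 0, ictxt, mloc, info)
     call descinit(descx, n, 1, nb, nb, 0, 0, ictxt, xloc, info)
     call descinit(descy, m, 1, nb, nb, 0, 0, ictxt, mloc, info)

     ! Local storage, filled with placeholder values.
     allocate(a(mloc, nloc), x(xloc), y(mloc))
     a = (1.0_dp, 0.0_dp)
     x = (1.0_dp, 0.0_dp)
     y = (0.0_dp, 0.0_dp)

     ! Same operation as zgemv, now on the distributed data.
     call pzgemv('N', m, n, alpha, a, 1, 1, desca, &
                                   x, 1, 1, descx, 1, &
                             beta, y, 1, 1, descy, 1)

     call blacs_gridexit(ictxt)
  end if

  call blacs_exit(0)
end program pzgemv_sketch
```

The extra ia, ja (and ix, jx, iy, jy) arguments are global starting indices into the distributed arrays, so passing 1, 1 operates on the whole matrix and vectors. Real code would also check info after each descinit call and fill the local arrays with the correct pieces of the global data.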

More info: