In test1, I compared the results of the pinv function in MATLAB with my implementation. The results match when using the Intel compilers (ifort (IFORT) 2021.8.0 20221119 and ifx (IFORT) 2023.0.0 20221201), but there are differences when using gfortran (GNU Fortran (GCC) 13.1.0). Do you have any suggestions to address this issue?

In test2, I would like to test the performance of pinv for large matrices, such as A(100000,10000). Are there any suggestions you can provide to improve the performance of the pseudo-inverse for such large matrices?

That is going to be a very expensive O(n^3) SVD operation. As with regular inverse-matrix computations, you should look for the possibility that you don’t really need the inverse, but rather you can settle for something else, such as the solution of linear equations. If you are solving a least-squares problem, for example, then there may be other more efficient ways to compute rank-deficient solutions (e.g. iterative methods that scale as O(n^2) per iteration).

[edit] I misread the matrix dimensions when I posted this originally. I thought you had a square matrix. For a rectangular matrix of dimension A(m,n), the SVD will scale as m*n^2. An iterative method would scale as m*n per iteration.

I would suggest that you start with that, and then decide how it scales and whether there are other approaches that are better. DGELS uses QR factorization rather than SVD, so it will be cheaper for that reason, but DGELS will still scale as m*n^2, which will be costly for your large matrix dimensions. If you could find an iterative method (like a preconditioned-GMRES) that converges in, say 100 iterations, then the effort would scale as 100*m*n, but finding such a method, if it exists, is problem-specific. You might also benefit from parallelization, so you should explore other linear algebra packages such as: ScaLAPACK — Scalable Linear Algebra PACKage. If your matrix is sparse, meaning the number of nonzero elements is <<m*n, then that feature might also help, especially within an iterative solution method. In any case, when you start to scale up the matrix dimensions with DGELS, you will see then what kind of problems you face.

Use optimized LAPACK/BLAS routines (if you aren’t already). For example: OpenBLAS, AtlasBLAS, GotoBLAS or Intel MKL. Could give 2-10 X speedup over Fortran BLAS.

Use a parallelized BLAS/LAPACK library. Speedup will depend on problem, and may be between 0.5 and NUMBER_OF_CORES at a guess

Exploit any structure in the input matrix: sparsity; blocks - in particular large zero, identity, diagonal and structured blocks.

(1) and (2) are relatively easy - usually just install the appropriate libraries for your system, change the linker flags and perhaps set some environment variables. (3) requires some work.

I have also used pseudo-inverse solutions, I am not an expert in linear algebra, just a user, and out of curiosity I added my version of pseudo-inverse solutions under my fork and tested the performance difference at:

(Because I am using Windows-MSYS2 environment, M_stopwatch is not valid in that environment)

> fpm test --compiler gfortran --flag "-Wno-line-truncation -llapack -lblas" test2 --profile release
Elapsed CPU time: 1.32812500
Elapsed CPU time: 9.40625000
test2 completed.
> ifort with Visual-Studio (release mode)
Elapsed CPU time: 0.5937500
Elapsed CPU time: 3.531250
test2 completed.

It is faster to call LAPACK’s API directly when possible. (Using -Ofast can be unexpected, so it is not used here)

@RonShepard, thank you. The matrices are sparse.
I’m going to start with DGELS as you mentioned. Then maybe I will consider using ScaLAPACK, although I have not been successful in using it before.

If your matrices are sparse you can also consider ARPACK. You can try it from SciPy in python to see if it fits your needs svds(solver=’arpack’) — SciPy v1.11.4 Manual. If it does, there are some discussions and sources for ARPACK here Modernizing Arpack

@zoziha, thank you for your comparison.
The pseudo-inverse without SVD (full rank) is obviously faster, but I think using SVD provides more accurate solutions. Here are the relative errors:

using pinv: 6.273484339772714E-015
using linv: 4.771404614080608E-014

In some cases, SVD also allows for the use of low rank, which can lead to faster computations (I guess). I will test this solution.

I have noticed that using a larger lwork parameter in the dgesvd function yields slightly better performance.

dgesvd('A', 'A', m, n, A, m, S, U, m, VT, n, work, lwork, info)

Here are the results:
With lwork = huge(lwork): cpu= 25.03 user= 24.55 sys= 0.48 wall= 2.89

And with a smaller lwork: 'cpu= 28.70 user= 28.24 sys= 0.46 wall= 3.34`

do irank = 1, min(n, m)
do j = 1, m
if (U(j, irank) == 0.) cycle
do i = 1, n
Apinv(i, j) = Apinv(i, j) + VT(irank, i) * U(j, irank) / S(irank)
end do
end do
end do

The SVD of a sparse matrix does not necessarily result in zero left singular vectors.

This is kind of clunky, but it is flexible. It allows, for example, a single workspace allocation to be reused for several solutions, so it minimizes the allocation/deallocation overhead.

[edit] I just noticed that you were talking about DGESVD() rather than DGELS(), but the same memory allocation approach works in either case.

In any case, I would suggest using the LAPACK routines to get a feeling for how the problems scale and what kinds of accuracies to expect with various singular value thresholds, and then explore the various sparse matrix options with that reference. So even if you end up not using the LAPACK code, those initial experiments are not wasted effort.