LightKrylov (v0.1.0-beta) : Pre-release

loiseaujc · October 29, 2024, 1:58pm

Hej everyone!

I’m happy to announce that we’ve just officially released LightKrylov v0.1.0-beta! The code can be found here and the associated documentation here. It is a joint effort with two postdoctoral researchers in my lab (Simon Kern and Ricardo Frantz) and is funded partially thanks to an Agence Nationale pour la Recherche project (the French equivalent of the NSF basically).

I had already discussed quickly about LightKrylov on this topic but for those of you who have no idea what it is, here is the too long/didn’t read summary:

LightKrylov is a lightweight implementation of standard Krylov-based techniques relying on modern Fortran features, in particular its abstract type capabilities. Having no dependencies other than stdlib, it exposes abstract vector and abstract linop types which you can easily extend and adapt to the particular data structure and/or parallelization paradigm you use. Having extending these two types to accomodate your particular applications, LightKrylov lets you solve linear systems, compute leading eigenvalues or singular values of your linear operator fairly easily using standard Krylov techniques such gmres, arnoldi or lanczos.

The code is still a bit rough around the edges and we would be more than happy if any of you could give it a test drive and let us know any issue you would encounter or any improvement you’d like to see happen. We have a couple of examples at the moment, including computing the leading eigenpairs of the linearized Ginzburg-Landau operator, or using a Newton-Krylov algorithm to compute an unstable periodic orbit of the Roessler system in the chaotic regime and the associated Lyapunov vectors. We plan to add new examples shortly, in particular how to solve in parallel a Helmholtz equation (the Hello World of PDE) using the conjugate gradient method (with the preconditioned version being added quite soon).

Note that we have already started to incorporate LightKrylov into neklab, a bifurcation and linear stability analysis toolbox we’re writing for the massively parallel spectral element solver Nek5000. Preliminary results on a slightly older version indicated that LightKrylov is competitive in terms of computational performances with Arpack for computing the leading eigenvalues of the linearized Navier-Stokes operator for the incompressible two-dimensional cylinder flow (roughly a million degrees of freedom on a dozen cores) while requiring far fewer lines of code and a much easier integration into the existing code base. LightKrylov is also the base package upon which we are building LightROM, another package that will eventually makes its way into neklab to solve really high-dimensional Lyapunov and Riccati equations for linear optimal feedback control or estimation (i.e. LQR and LQG) using Dynamical Low-Rank Approximation. We’ll make a dedicated annoucement once LightROM has reached a sufficient level of maturity though.

Finally, the low-level routines of LightKrylov rely quite extensively on stdlib and in particular stdlib_linalg and stdlib_linalg_lapack. As such, I’d like to personally thank @FedericoPerini, @hkvzjal and @jeremie.vandenplas for their amazing work on this!

FedericoPerini · October 29, 2024, 2:35pm

This is amazing @loiseaujc, shout out to you and your lab team!

I have a personal interest in eigenvalues for large sparse problems (for chemical explosive mode analysis), in combustion it’s a very hard problem for packages like ARPACK because of the stiffness: the explosive modes typically have O(1) real components, while the most damping modes have magnitudes >O(10^15) in the negative real range, so ARPACK can fail capturing the positive real modes because they are only found when the Krylov subspace is now almost full. On the other hand, the dense approach is painstakingly slow. So as soon as I have some time, I’ll take a look to see if LightKrylov can have applications there.

On another note, shout out also to @jeremie.vandenplas and @hkvzjal for the linear algebra capability, it’s amazing to see that it’s already being effectively deployed.
Please do not hesitate to contribute to it by reporting bugs, requesting features, etc.

loiseaujc · October 29, 2024, 3:25pm

Sure! Sorry about letting you down with the norm / opnorm stuff. Summer has been a bit overwhelming with the kids and I had a handful of new classes at the begining of this semester I had to prepare which prevented me from doing any serious work outside of LightKrylov essentially. We do have a handful of functions which could eventually make their ways into stdlib_linalg though. These include:

Matrix exponential using Pade approximation similar to expm in SciPy.
Schur factorization based on gees.
Re-ordered Schur factorization (i.e. ordschur in Matlab).

I’m also working on a side project simultaneously: SpecialMatrices. It’ll essentially be very similar to SpecialMatrices.jl and ToeplitzMatrices.jl, i.e. providing special types and computational routines which can be used as drop-in replacement for stdlib_linalg when working with Diagonal, Bidiagonal, Tridiagonal, SymTridiagonal, Circulant, Toeplitz or Hankel matrices. I basically plan to overload matmul, solve, eig / eigh, and a couple other things for each of these types to make it very easy to use.

This is a fairly typical problem actually. Have you tried using a shift-invert strategy (albeit it does need a good preconditioned iterative solver)? Another alternative, which is typically what we use for Navier-Stokes is to use a time-stepping approach. Say the matrix you are interested in is A. You can associate to it a linear-time invariant dynamical system \dot{x} = Ax. Its solution reads x(\tau) = \exp(\tau A) x_0. The eigenvalues \mu_i of the exponential propagator are related to those of A via the exp-transform \mu_i = \exp(\tau \lambda_i) (and vice versa \lambda_i = \log(\mu_i) / \tau). Hence, eigenvalues corresponding to very stable modes get maps to zero, while those closer to the imaginary axis get maps close to the unit circle. If you then use arnoldi on the exponential propagator rather than A iteself, the first eigenvalues to converge are actually the ones you’re looking for. Note that you never need to form \exp(\tau A), you simply need its action onto a vector. This can easily be computed using your favorite time-integration scheme and this where iterative eigenvalue solvers shine and what you would typically encapsulate in the extended abstract linop in LightKrylov. How many Krylov vectors need to be generated to converge the eigenvalues obviously depends on \tau which itself impact how time-consuming computing \exp(\tau A)x is, but a rough estimate is to choose \tau compatible with the dominant time-scale (either the damping time or the period if it is oscillatory) of the leading eigenvalues you want to converge. If you look at the Ginzburg-Landau example in the LightKrylov, this is actually how we’ve done it. Likewise, this time-stepper appraoch is our go-to approach when dealing with the linearized Navier-Stokes operator. We have written a review paper on this here. We can setup a call sometime if you want to know more.

Abhi_dev · October 29, 2024, 3:42pm

Is there a way to track the progress of stdlib_linalg module. What are the features that are implemented, what are currently in progress and the roadmap so that we can contribute ?

rsci · October 29, 2024, 5:43pm

I took a brief look at the code, and it looks great!

Regarding massively parallel computing, are there any plans to implement communication-avoiding variants of the CG or gmres solvers?

FedericoPerini · October 30, 2024, 9:01am

Wow, thanks a lot @loiseaujc, this is great insight, thank you! It’s a very interesting idea, I had been using EXPOKIT in the past for this purpose, but only for dense matrices. It seems like there are also routines for sparse matrices using spMv, I’ll definitely dive deeper into LightKrylov when I have more time.

BTW I appreciate that by integrating stdlib into your work, you haven’t found huge issues yet Also: @hkvzjal and I have been working on the matrix norms and I should open a PR for that shortly. The Schur decomposition would definitely be another great one for the immediate/short term.

Best of luck with LightKrylov, it’s great to see math abstractions coming to life in Modern Fortran, and thanks again for sharing your insights!

FedericoPerini · October 30, 2024, 9:02am

Welcome to the forum @Abhi_dev, thanks for the interest! I would think that the best way to open a discussion for further features is to open an issue on the Github page, or if you have a clean implementation, you could open a PR directly.

loiseaujc · October 30, 2024, 11:30am

My pleasure! In general, if you want to solve Ax = \lambda x using Krylov-based techniques but the eigenvalues of interest are not on the outer rim of the spectrum, you can then replace it with f(A)x = f(\lambda) x where f is an invertible function which maps the eigenvalues of interest to the outer rim of the eigenspectrum. These will then be the first to converge. The critical point however is to find a function f which can easily be evaluated.

If you have a really good preconditioner for A, defining f(\lambda) = \dfrac{1}{\lambda - \sigma} where \sigma \in \mathbb{C} is your shift is probably the best option. It corresponds to the standard shift-invert strategy. It has some limitations though: not only do you need a good preconditioner, but your code also need to handle complex arithmetic. Additionally, if you change A you may have to completely rethink your preconditioning strategy. It definitely is worth the time if you can, but it’ll come at the price of extended code development time.
Using f(\lambda) = \exp(\tau \lambda) implies that you actually work with the exponential propagator \exp(\tau A). As I’ve said, you never want to actually compute this matrix (which will be dense even if A is sparse and requires \mathcal{O}(n^3) operations anyway). This is where time-integration schemes come into play to approximate it.
- Say you use a simple first order explicit Euler scheme with constant time step \Delta t = \tau/n. Then, you’re actually approximating the exponential with the following polynomial function f(\lambda) \simeq \hat{f}(\lambda) = (1 + \delta t \lambda)^n.
- If you use a first order implicit Euler scheme, then \hat{f}(\lambda) = \dfrac{1}{(1 - \delta t \lambda)^n}.
- If you use a semi-implicit scheme, then you approximate f with the rational function \hat{f}(\lambda) = \dfrac{(1 + \frac{\delta t}{2} \lambda)^n}{(1 - \frac{\delta t}{2} \lambda)^n}.
- If you use an adaptive Runge-Kutta scheme, you get yet another polynomial approximation of the exponential, etc.

Another alternative are indeed Krylov-based exponential integrators such as the ones in EXPOKIT. If my memory serves me right, it has been proven that such Krylov-based integrators are actually near-optimal in the sense that, for a polynomial approximation and a given accuracy, they actually minimize the number of matrix-vector products you need to do to evaluate \exp(\tau A) x. Note however that the implementation in LightKrylov is still pretty experimental and nowhere as feature-complete as EXPOKIT. We do have plans later on to actually incorporate time-step adaptive capabilities balancing how expensive the matrix-vector product is with how good the Krylov approximation is. I’m not that familiar with EXPOKIT but I wouldn’t be surprised if they already had such features implemented.

In any case, at the end of the day, the choice of the time-integration scheme is not the primal factor from my point of view. The major benefit is that, if you already have a (linearized) simulation code where you spent some time optimizing the time-integration, then using this time-stepping approach to do linear stability analysis requires actually very little modification to the code base. These ideas have been developed since the 1980’s by Laurette Tuckerman and others, and the time-stepper approach to linear stability has become the go-to in the hydrodynamic stability community. It was actually the main theme of my own PhD thesis.

loiseaujc · October 30, 2024, 12:09pm

At the moment we do not have such plans, even though it is somewhere in the back of our minds. Our main goal for now is to make sure that the standard algorithms in LightKrylov work as expected, can easily be incorporated into an existing code-base with minimal effort and have great baseline performances.

I would also have to look more closely at these algorithms and see exactly how they can be implemented using the abstract type route we’ve decided to take. Note however that for solving Ax = b with A symmetric positive definite, we are currently working with a colleague on a high-performing implementation of the Chebyshev-accelerated Jacobi (and block-Jacobi) method, in particular for the finite-difference approximation of the Laplace operator in 2 and 3 dimensions. While the convergence rate might be slightly less good than conjugate gradient, the main benefit of the Chebyshev-Jacobi method is that it does not require inner product whatsoever which actually is the main bottleneck for parallel CG as it requires a communication which synchronizes the processes. You do need halo exchanges though, but these are also present in CG when evaluating Ax but can be done with non-blocking communications. What we expect is that, despite the slightly less good convergence rate, the wall-clock time-to-solution of the Chebyshev-Jacobi be better than CG precisely because of the absence of synchronizing inner products. I’ll keep everyone updated once we have the code ready.

rsci · October 30, 2024, 2:38pm

In our atmospheric model code, we have already conducted some preliminary comparisons between CG and Chebyshev methods. The asymptotic convergence of both is similar, but Chebyshev iterations scale much better. However, Chebyshev methods require accurate estimates of the upper bound of the spectrum; even a small underestimation can lead to a noticeable drop in the convergence rate or even divergence of the method.

We also have a geometric multigrid method that demonstrates excellent convergence, but its parallel efficiency is limited by the scalability of the coarse levels, moreover constructing an effective smoother implicitly requires some knowelgde about the matrix spectrum (one needs to optimize convergence for the [\lambda_s, \lambda_{max}] part of the spectrum, where \lambda_s is the boundary between slowly and fast-varying eigenvectors). Therefore, I am considering trying communication-avoiding Krylov methods, which, in theory, allow for an arbitrary number k of iterations of the method within a single global communication. In practice, it seems that using large values of k leads to numerical stability issues (since one needs to operate on A^k v vectors without orthogonalization), so some additional care is needed.

It’s great that this forum allows us to discuss not only Fortran but also relevant scientific issues:) Good luck with your project, and I look forward to hearing about your progress.

hkvzjal · October 30, 2024, 3:20pm

Nice work @loiseaujc I’ll definitely take a close look!

Indeed, my understanding is also that trying to “asynchronize” methods with a strong dependency on an accurate computation of the inner-product (CG or gmres) is not a good idea as convergency can not be guaranteed.

If this a domain of interest, you might want to follow the works of Prof. Frédéric Magoulès https://www.researchgate.net/profile/Frederic-Magoules/publications who has extensively worked on robust asynchronous solvers for which convergence can be demonstrated and ensured at user-defined tolerance Google Scholar.

loiseaujc · October 30, 2024, 4:32pm

Cool! Glad to know my understanding is corroborated numerically!

That is indeed a major limitation compared to CG. It is also one of the reasons we’re focusing on the Laplace operator on standard domains (very academic admittedly): the spectral radius for finite-difference approximations on hyper-cubes can be computed analytically. For more complicated domains, we actually have simple pre-processing technique where we run the standard Jacobi method for a number of steps and infer its spectral radius from the convergence history and use that as the upper bound. This simple heuristic works suprisingly well, albeit you do need to have this pre-processing step. Doing so make sense if you need to solve Ax = b for multiple right-hand side vectors as the divergence-free projection step in an incompressible Navier-Stokes simulation. If A changes at every iteration, this is no longer practical.

Yep. As far as I know, the main problem of CA-GMRES is the loss of orthogonality of the Krylov vectors. An alternative we’re considering to implement in LightKrylov though is the use polynomial-Arnoldi preconditioners. You’re still using a high-degree polynomial but maintain at the same time the orthogonality of the Krylov basis. You can look at some papers by Mark Embree if you want to know more. This is definitely at the top of our list of things we’d like to include in LightKrylov.

(edit) PS: I will actually take some time tomorrow after I’ve discussed with Simon to come up with a list of the top features we’d like to include in LightKrylov, just to give ideas if any one wants to give it a go.

CRquantum · November 1, 2024, 11:15pm

If you use EXPOKIT, note that there are some small bugs in some of its code which can cause integer overflow and lead to wrong results.
For example, in dgpadm.f, around line 87

Originally it is

scale = t / DBLE(2**ns)

that 2**ns will cause integer overflow if ns>30 if the default integer type is integer 4, then the DBLE(2**ns) will be something like NAN. You need to change DBLE(2**ns) to something like 2.0d0**ns
Without this fix, every time if ns>30, dgpadm will just output wrong results like NAN or something.

Basically any DBLE() in those code in EXPOKT need to be carefully checked to prevent integer overflow issues.

FedericoPerini · November 2, 2024, 11:24am

Great catch @CRquantum, thanks! Do you know if the package is still being maintained somehow? It would be nice to use a modernized (and possibly bugfixed) version

CRquantum · November 3, 2024, 2:29am

Thanks @FedericoPerini
Perhaps it is best to directly contact the EXPOKIT author Roger B. Sidje?

PS.
Besides expokit, I found some latest matrix expontial solvers seems working good too, such as those by Jorge Sastre,

Usually in his paper he provide his solver’s matlab (can be easily tranlated into Fortran) or Fortran version.

loiseaujc · February 17, 2025, 1:09pm

Quick update : I’ve just implemented the polynomial arnoldi preconditioner for GMRES to try it out on the 2D Poisson equation. Here is the repo. Although some bits and pieces are still a bit rough around the edges, it works fantastically well. Not only is the number of matrix-vector products reduced compared to the unpreconditioned one, but the number of dot products that need to be computed is even further decreased! Haven’t had the chance to try it out in parallel, but I’m pretty sure it’ll have pretty good scaling. I’ll port it to LightKrylov shortly.

hkvzjal · February 18, 2025, 8:19am

Interesting, just out of curiosity, will you compare against a Jacobi and/or Incomplete Cholesky preconditioner?

rsci · February 18, 2025, 9:22am

Nice work! Just to clarify, are you using the GMRES polynomial as a preconditioner for restarted GMRES? If there’s any reference for the approach you implemented, could you share a link? I’d like to take a closer look!

loiseaujc · February 21, 2025, 10:22am

The nice thing about this GMRES polynomial preconditioner is that, if you already use another preconditioner (say block-Jacobi or incomplete Cholesky for instance), you can actually use B = A M^{-1} to further improve the already preconditioned system.

Unfortunately, this is a pet project for me so I can’t dedicate much time. I do plan however to compare the performances against block-Jacobi preconditioner, Chebyshev-Jacobi acceleration, and possibly a simple geometric multigrid solver if I have time to implement it. The Bartels-Stewart algorithm for solving AX + XB = C is also on the table as you can write the 2D Poisson problem on the unit square into this Sylvester equation form. All of these will only be serial computations though, so whenever I post the result, you’ll need to take it with a grain of salt.

loiseaujc · February 21, 2025, 10:33am

Indeed, the preconditioner is built from the GMRES polynomial. Provided you have a routine to compute the k-step Arnoldi factorization, you’re good to go. Here are a couple of references:

M. Embree, J. A. Loe, and R. Morgan. Polynomial preconditioned Arnoldi with stability control, SIAM Journal on Scientific Computing, vol. 43, 2021 [link].
J. Loe, and R. Morgan. Toward efficient polynomial preconditioning for GMRES, Numerical linear algebra with applications, vol. 28, 2022 [arxiv]
M. Embree, J. Henningsen, J. Jackson and R. Morgan. Polynomial Approximation to the inverse of a large matrix [PDF]
J. Henngingsen. A polynomial approximation to the inverse of a matrix. [PhD thesis]

The code in the repo is mostly based on Loe & Morgan (second reference). I’m in the process of reading the last two references.

Topic		Replies	Views
Generic or Kind-Agnostic linear system solvers Help	34	1639	March 4, 2025
stdlib_LinAlgOperators: Linear Algebra Operators	14	250	March 4, 2025
Specialized matrix decomposition Help	16	685	May 18, 2024
Optimal way to code getting spectra of matrices?	9	3048	September 1, 2021
Linear algebra support in stdlib stdlib	27	1664	March 27, 2025

LightKrylov (v0.1.0-beta) : Pre-release

Related topics