Dear all, I’m here to propose the zero-th iteration of the iterative methods suite. First, let me thank @hkvzjal, @ivanpribec, @loiseaujc and @FedericoPerini for their replies to my questions and their suggestions. Moreover, I’d like you to ask yourselves “how much fypp is too much fypp?” and give me your feedback, as you will see I might have gone a bit too far. Of course, everything in this proposal is open to change, from naming to code structure, according to your suggestions, so please do not refrain from criticism.
The zero-th iteration only contains a Preconditioned Conjugate Gradient, as I am mostly studying the structure of the library rather than the actual methods. Also, sorry for the wall of text.
API proposal
The interface aims to be simple but customizable. The simplest case, for the user, is just
call stdlib_linalg_cg( A, rhs, x)
within a program. This call solves the linear system A\mathbf{x}=\mathbf{b} with a tolerance tol = 10^{-6} and a maximum of 1000 iterations, and does not print anything. Optionally, the call
call stdlib_linalg_cg( A, rhs, x, info)
also provides information, through the integer `info`, about the successful run of the solver (`info = 0`) or the unsuccessful run (`info = -1`). Building up from this, one can change the default options with
call set_cg_opts(itmax=15)
call stdlib_linalg_cg( A, rhs, x, info) ! Runs maximum 15 iterations
and modify these parameters for all the successive runs, as the container of the default options (`cg_defopts`) is a global, public variable. A call with the option `restore=.true.` restores the default values for the successive runs
call set_cg_opts(restore=.true.)
call stdlib_linalg_cg( A, rhs, x, info) ! Runs with original options
Clearly, one might want to have multiple sets of options to run a CG under different conditions (e.g. to solve a linear system with good accuracy, and to use CG as a preconditioner for a Rayleigh quotient minimization with low accuracy or a small maximum number of iterations). This can be achieved by defining an option type
!! Option type
type(cg_options) :: cg_solver, cg_precond
[…]
call set_cg_opts(cg_solver, tol_dp=1.0e-10_dp)
call set_cg_opts(cg_precond, tol_dp=1.0e-4_dp)
[…]
call stdlib_linalg_cg( A, rhs, x, cg_solver)
call stdlib_linalg_cg( A, rhs, x, cg_precond)
Finally, one might need to specify an initial guess for the solver. This can be done with
call stdlib_linalg_cg( A, rhs, x, xzero = .true. )
which tells the solver to compute the initial residual as \mathbf{r}=\mathbf{b} - A\mathbf{x}, where \mathbf{x} is the actual argument passed in the call. Clearly,
call stdlib_linalg_cg( A, rhs, x, xzero = .false. )
computes the initial residual as \mathbf{r}=\mathbf{b}, and makes it easier to keep track of what one is doing if needed. Moving on to the next big one.
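For instance (a hypothetical snippet reusing the names of the examples above), a warm-started re-solve after the right-hand side changes could look like
! First solve, starting from x = 0
call stdlib_linalg_cg( A, rhs, x, xzero = .false. )
! The right-hand side changes slightly ...
rhs = 1.01_dp * rhs
! ... so reuse the previous solution as initial guess: r_0 = b - A*x
call stdlib_linalg_cg( A, rhs, x, xzero = .true. )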
Preconditioning
IMHO, there is no way of doing versatile preconditioning without forcing the user to use an abstract type. The preconditioner type is defined as
!!
!! real(wp) preconditioner
!!
type, abstract :: wp_preconditioner
!! Abstract type defining the empty shell.
contains
!! Procedure for the application of the preconditioner to a vector.
procedure(wp_apply), deferred :: apply
!!
end type wp_preconditioner
abstract interface
!! Abstract procedure defining the interface for a
!! general application action
subroutine wp_apply(this,vec_in,vec_out,ierr)
import wp
import wp_preconditioner
implicit none
!! The class with all quantities describing the operator
class(wp_preconditioner), intent(in ) :: this
!! The input vector
real(wp), intent(in ) :: vec_in(:)
!! The output vector y = "operator applied on" x
real(wp), intent(inout) :: vec_out(:)
!! An integer flag to inform about errors
!! It is zero if no error occurred
integer, intent(inout) :: ierr
end subroutine wp_apply
!!
!!
end interface
Please interpret `wp` as whatever precision is needed; this will be fypped later. The abstract interface is unfortunately needed because, if we want to defer `apply`, we have to specify its interface and thus the kind of the input/output arrays. As a trivial example, one could define a Jacobi preconditioner
module Jacobi_prec
use stdlib_kinds
use stdlib_linalg_preconditioners
implicit none
private
!! Extension of dp_preconditioner to define a Jacobi one
type, public, extends(dp_preconditioner) :: dp_Jacobi
!
real(dp), allocatable :: Diag(:)
integer :: nrows
!
contains
!
procedure, public, pass :: init => init_Prec
!! initialize procedure
procedure, public, pass :: apply => apply_Prec
!! apply procedure, overridden
!
end type dp_Jacobi
!!
contains
!! constructor
subroutine init_Prec(this, nrows, matIN)
implicit none
class(dp_Jacobi), intent(inout) :: this
!
integer, intent(in ) :: nrows
real(dp), intent(in ) :: matIN(:,:)
integer :: i
!
this%nrows = nrows
allocate(this%Diag(nrows))
do i = 1, this%nrows
this%Diag(i) = matIN(i,i)
end do
end subroutine init_Prec
!!
!! apply the Jacobi (diagonal) preconditioner
subroutine apply_Prec(this,vec_in,vec_out,ierr)
implicit none
class(dp_Jacobi), intent(in ) :: this
real(dp), intent(in ) :: vec_in(:)
real(dp), intent(inout) :: vec_out(:)
integer, intent(inout) :: ierr
integer :: i
!
do i = 1, this%nrows
vec_out(i) = vec_in(i) / this%Diag(i)
end do
ierr = 0
end subroutine apply_Prec
!!
end module Jacobi_prec
The interesting feature of the “empty shell” approach is that it only depends on the kind of vector we apply the preconditioner to, through the apply routine. This means that with the same shell we already cover intrinsic matrices and the sparse matrix class under development. Any extension of this template to other vectors (e.g. abstract vectors, as in Enrico Facca’s proposal) is a matter of fypp (and actual implementation, clearly). Notice that the `init` routine cannot be made abstract because it is impossible to specify its input a priori, so the user should take the time to build their own preconditioner (which is not unreasonable), while some basic preconditioners could already be provided, e.g. Jacobi, SSOR, ICHOL and ILU, as a further expansion of the library.
The user-developed or library-provided preconditioner will then be accessible in the code with
type(dp_Jacobi) :: Jacobi
[…]
call Jacobi%init(n, A)
[…]
call stdlib_linalg_cg( A = A, K = Jacobi, rhs = rhs, x = x, info = info )
where `K` is the preconditioner.
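Inside the solver, the preconditioner would only ever be touched through its apply binding. A preconditioned-CG step could look roughly like this (a sketch, not the implemented code; `z` and `r_dot_z` are the extra work vector and scalar that PCG needs):
! ( *) Apply the preconditioner: solve K z = r through the abstract binding
call K%apply( r, z, ierr )
if ( ierr /= 0 ) return   ! propagate the failure to the caller
! ( *) Preconditioned residual product, used in place of r' * r
r_dot_z = dot_product( r, z )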
What’s under the hood
Let’s start from the options. The `cg_options` type is defined as
!! Definition: cg_options
type, public :: cg_options
!!
!! Maximum number of iterations
integer :: itmax = 1000
!!
!! Single and double precision tolerance
real(sp) :: tol_sp = 1.0e-6_sp
real(dp) :: tol_dp = 1.0e-6_dp
!!
!! Verbosity and writing unit
logical :: verb = .false.
integer :: out_unit = 6
!!
end type
Within the module there is also the `set_cg_opts` subroutine, which I do not post here because it is just an awful set of conditionals. Its interface reads
subroutine set_cg_opts(opt, itmax, tol_sp, tol_dp, verb, out_unit, restore )
!!
type(cg_options), intent(inout), optional :: opt
integer, intent(in ), optional :: itmax
real(sp), intent(in ), optional :: tol_sp
real(dp), intent(in ), optional :: tol_dp
logical, intent(in ), optional :: verb
integer, intent(in ), optional :: out_unit
logical, intent(in ), optional :: restore
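For what it’s worth, my guess at the pattern inside that body (a sketch only, covering just a couple of the arguments; `cg_defopts` is the global default container named above) would be something like
if ( present(opt) ) then
   ! Modify a user-owned options container
   if ( present(itmax)  ) opt%itmax  = itmax
   if ( present(tol_dp) ) opt%tol_dp = tol_dp
   ! ... and so on for tol_sp, verb, out_unit ...
else
   ! Modify the global default container
   if ( present(restore) ) then
      if ( restore ) cg_defopts = cg_options()   ! reset to the default initialization
   end if
   if ( present(itmax)  ) cg_defopts%itmax  = itmax
   if ( present(tol_dp) ) cg_defopts%tol_dp = tol_dp
   ! ... and so on ...
end if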
Here I have a request for help: it is clear that this set of values is common to all solvers, but some solvers (e.g. restarted GMRES) have more options (say a restart number `nrest`). It would then be natural to define a base class `kr_options` and extend it when designing a method. However, I don’t understand if and how inheritance would work for the type-bound procedures of the class, i.e. whether I can extend a method or not. Ironically, the design of the options container is what keeps me up at night and I can’t figure it out.
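Just to make the question concrete, one possible shape is sketched below (the names `kr_options`, `gmres_options` and the `describe` binding are placeholders, not a proposal). Extensions inherit the parent components, and a type-bound procedure can be overridden as long as the interface stays compatible, which is perhaps all the “method extension” that is needed:
module kr_options_demo
use stdlib_kinds, only: dp
implicit none
type :: kr_options
   integer  :: itmax  = 1000
   real(dp) :: tol_dp = 1.0e-6_dp
contains
   procedure :: describe => describe_base
end type kr_options
type, extends(kr_options) :: gmres_options
   integer :: nrest = 30                     ! GMRES-specific extra option
contains
   procedure :: describe => describe_gmres   ! override of the inherited binding
end type gmres_options
contains
subroutine describe_base(this)
   class(kr_options), intent(in) :: this
   print *, "itmax =", this%itmax, " tol =", this%tol_dp
end subroutine describe_base
subroutine describe_gmres(this)
   class(gmres_options), intent(in) :: this
   call this%kr_options%describe()           ! reuse the parent implementation
   print *, "nrest =", this%nrest
end subroutine describe_gmres
end module kr_options_demo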
The actual solver
Here comes the fun part. This is the implemented algorithm, with its interface and all, for the case of intrinsic data types. What I’m looking for here are suggestions about code structuring and about any checks to perform in the subroutine (not code review or algorithmic-correctness checks).
subroutine stdlib_krylov_intrinsic_dp_CG( A, rhs, x, info, xzero, opts_in )
!!
!! In-Out variables
real(dp), intent(in ) :: A(:,:)
!! Linear operator to be inverted.
real(dp), intent(in ) :: rhs(:)
!! Right-hand side vector.
real(dp), intent(inout) :: x(:)
!! Solution vector.
integer, intent( out), optional :: info
!! Information flag.
logical, intent(in ), optional :: xzero
!! provided initial solution flag
type(cg_options), intent(in ), optional :: opts_in
!! Option structure.
!!
!! Internal variables
type(cg_options) :: options
!! Default options
real(dp), allocatable :: r(:)
real(dp), allocatable :: p(:)
real(dp), allocatable :: Ap(:)
!! Residual, search direction and other vectors.
real(dp) :: alpha, beta
real(dp) :: r_dot_r_old, r_dot_r_new
!! Scalars used in the CG algorithm.
real(dp) :: residual
!! Residual norm for the exit criterion
integer :: itnum
!! Iteration counter
integer :: n
!! Problem size
logical :: use_xzero
!! Local copy of the optional xzero flag
n = size( A, 1 )
if ( present(info) ) info = 0
if (present(opts_in)) then
options = opts_in
else
options = default_opts
end if
!! Options to be set
!!
! Allocate and initialize support vector
allocate (Ap, mold = rhs)
Ap = 0._dp
! Allocate residual and copy in rhs
! ( 2) Evaluate initial residual r_o = rhs ( - A*x_o later)
allocate (r, source = rhs)
! ( 1) Parse whether an initial solution is provided or not
use_xzero = .false.
if ( present(xzero) ) use_xzero = xzero
if ( use_xzero ) then
! It is provided
! ( 1) Keep x_o
! ( 2) Evaluate initial residual r_o = rhs - A*x_o
Ap = matmul( A, x )
r = r - Ap
else
! It is not provided
! ( 1) Set x to zero
x = 0._dp
! ( 2) r = b already performed in allocation
end if
! Allocate direction vector and copy-in p = r.
allocate (p, source = r )
! Initialize dot product of residual: r_dot_r_old = r' * r.
r_dot_r_old = dot_product( r, r )
! Safety computation to avoid a possible division by zero in r*r/p*Ap
if ( sqrt(r_dot_r_old) == 0._dp ) p = options%tol_dp
! info flag
if ( present(info) ) info = -1
! CG Iteration Loop.
cg_iterations: do itnum = 1, options%itmax
! ( 4) Compute A * p.
Ap = matmul( A, p )
! ( 4) Compute step size alpha = r_dot_r_old / (p' * Ap).
alpha = r_dot_r_old / dot_product( p, Ap )
! ( 5) Update solution x = x + alpha * p.
x = x + (alpha) * p
! ( 6) Update residual r = r - alpha * Ap.
r = r + (-alpha) * Ap
! ( 7) Compute new dot product of residual r_dot_r_new = r' * r.
r_dot_r_new = dot_product( r, r )
! ( 7) Compute norm of residual.
residual = sqrt(r_dot_r_new)
! ( 7) Check for convergence
if (options%verb) then
write (options%out_unit, *) "INFO : CG residual after ", (itnum), "iterations : ", residual
end if
if ( residual < options%tol_dp ) then
if (options%verb) then
write (options%out_unit, *) "INFO : CG Converged: residual ", residual, "< tolerance: ", options%tol_dp
end if
if ( present(info) ) info = 0
exit cg_iterations
end if
! ( 8) Compute new direction beta = r_dot_r_new / r_dot_r_old.
beta = r_dot_r_new/r_dot_r_old
! ( 9) Update direction p = r + beta * p.
p = beta * p + 1._dp * r
! (10) Update r_dot_r_old for next iteration.
r_dot_r_old = r_dot_r_new
end do cg_iterations
deallocate (r, p, Ap)
return
end subroutine stdlib_krylov_intrinsic_dp_CG
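To make the intended usage concrete, a small hypothetical driver for the double-precision routine could read as follows (the host module name stdlib_krylov is an assumption, borrowed from the closing remark below):
program test_cg
use stdlib_kinds, only: dp
use stdlib_krylov   ! assumed host module exposing stdlib_linalg_cg and cg_options
implicit none
integer, parameter :: n = 4
real(dp) :: A(n,n), rhs(n), x(n)
integer  :: i, info
! Small SPD test matrix: tridiagonal Laplacian-like operator
A = 0._dp
do i = 1, n
   A(i,i) = 2._dp
   if (i > 1) A(i,i-1) = -1._dp
   if (i < n) A(i,i+1) = -1._dp
end do
rhs = 1._dp
call stdlib_linalg_cg( A, rhs, x, info )
print *, "info =", info
print *, "x    =", x
end program test_cg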
You might wonder: what kinds of data types are supported? If we look at the complete declaration of variables of this routine, we have a matrix, a vector, a scalar, and implicitly a precision. These are the things that must be prescribed if we want to extend the algorithm to some other data type. So I propose the following approach: define the algorithm for intrinsic arrays, define it for the forthcoming sparse class, and define it for abstract Operators-Vectors-Scalars (I sense that abstract scalars will be handy for parallel reductions when computing parameters such as `alpha`, which is a ratio of scalar products), which I will start developing and testing soon; a sketch of what such an abstract operator could look like follows the tables below. With these three macro-kinds, we should be able to target a broad class of usages. In particular, the library currently handles the following types.
sp

| Matrix | Vector | Scalar |
|---|---|---|
| real(sp) | real(sp) | real(sp) |
| COO_sp | real(sp) | real(sp) |
| CSR_sp | real(sp) | real(sp) |
| CSC_sp | real(sp) | real(sp) |
| ELL_sp | real(sp) | real(sp) |
| sp_Operator | real(sp) | real(sp) |
| sp_Operator | sp_Vector | real(sp) |

dp

| Matrix | Vector | Scalar |
|---|---|---|
| real(dp) | real(dp) | real(dp) |
| COO_dp | real(dp) | real(dp) |
| CSR_dp | real(dp) | real(dp) |
| CSC_dp | real(dp) | real(dp) |
| ELL_dp | real(dp) | real(dp) |
| dp_Operator | real(dp) | real(dp) |
| dp_Operator | dp_Vector | real(dp) |
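The `dp_Operator` and `dp_Vector` types in the last two rows do not exist yet; my working assumption is an empty abstract shell analogous to the preconditioner one, with `dp` imported from stdlib_kinds in the host module, e.g.
type, abstract :: dp_Operator
contains
   procedure(dp_op_apply), deferred :: apply   ! y = A*x, matrix-free
end type dp_Operator
abstract interface
   subroutine dp_op_apply(this, vec_in, vec_out, ierr)
      import dp, dp_Operator
      class(dp_Operator), intent(in   ) :: this
      real(dp),           intent(in   ) :: vec_in(:)
      real(dp),           intent(inout) :: vec_out(:)
      integer,            intent(inout) :: ierr
   end subroutine dp_op_apply
end interface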
Code the algorithm once, scatter on all data types
Ok, but how do we code it for all these data types without losing our minds? Enter fypp. The routine shown above is the result of the early days of development, when I hand-coded all the Conjugate-“Something” algorithms, only to find errors and have to correct them multiple times, on multiple data structures, and so on. So I came up with the following:
!!
!!
!! Conjugate Gradient Algorithm implementation:
!!
#:for tp in type_dict.values()
#:for i in range(len(tp["subnam"]))
!! ${tp["matrix"][i]}$-${tp["vector"][i]}$ classical version
!!
subroutine stdlib_krylov_${tp["subnam"][i]}$_CG( A, rhs, x, info, xzero, opts_in )
!!
!! In-Out variables
${tp["matrix"][i]}$, intent(in ) :: A${tp["matrnk"][0]}$
!! Linear operator to be inverted.
${tp["vector"][i]}$, intent(in ) :: rhs${tp["vecrnk"][0]}$
!! Right-hand side vector.
${tp["vector"][i]}$, intent(inout) :: x${tp["vecrnk"][0]}$
!! Solution vector.
integer, intent( out), optional :: info
!! Information flag.
logical, intent(in ), optional :: xzero
!! provided initial solution flag
type(cg_options), intent(in ), optional :: opts_in
!! Option structure.
!!
!! Internal variables
type(cg_options) :: options
!! Default options
${tp["vector"][i]}$, allocatable :: r${tp["vecrnk"][0]}$
${tp["vector"][i]}$, allocatable :: p${tp["vecrnk"][0]}$
${tp["vector"][i]}$, allocatable :: Ap${tp["vecrnk"][0]}$
!! Residual, search direction and other vectors.
${tp["scalar"][i]}$ :: alpha, beta
${tp["scalar"][i]}$ :: r_dot_r_old, r_dot_r_new
!! Scalars used in the CG algorithm.
real(${tp["wp"][i]}$) :: residual
!! Residual norm for the exit criterion
integer :: itnum
!! Iteration counter
integer :: n
!! Problem size
logical :: use_xzero
!! Local copy of the optional xzero flag
n = ${ tp["rowsize"]("A") }$
if ( present(info) ) info = 0
if (present(opts_in)) then
options = opts_in
else
options = default_opts
end if
!! Options to be set
!!
! Allocate and initialize support vector
allocate (Ap, mold = rhs)
${ tp["coef"]("Ap") }$ = 0._${tp["wp"][i]}$
! Allocate residual and copy in rhs
! ( 2) Evaluate initial residual r_o = rhs ( - A*x_o later)
allocate (r, source = rhs)
! ( 1) Parse whether an initial solution is provided or not
use_xzero = .false.
if ( present(xzero) ) use_xzero = xzero
if ( use_xzero ) then
! It is provided
! ( 1) Keep x_o
! ( 2) Evaluate initial residual r_o = rhs - A*x_o
${ tp["matvec"]("A", "x", "Ap", tp["wp"][i] ) }$
${ tp["coef"]("r") }$ = ${ tp["coef"]("r") }$ - ${ tp["coef"]("Ap") }$
else
! It is not provided
! ( 1) Set x to zero
${ tp["coef"]("x") }$ = 0._${tp["wp"][i]}$
! ( 2) r = b already performed in allocation
end if
! Allocate direction vector and copy-in p = r.
allocate (p, source = r )
! Initialize dot product of residual: r_dot_r_old = r' * r.
r_dot_r_old = ${ tp["dotprod"]( "r", "r", "n") }$
! Safety computation to avoid a possible division by zero in r*r/p*Ap
if ( sqrt(r_dot_r_old) == 0._${tp["wp"][i]}$ ) ${ tp["coef"]("p") }$ = options%tol_${tp["wp"][i]}$
! info flag
if ( present(info) ) info = -1
! CG Iteration Loop.
cg_iterations: do itnum = 1, options%itmax
! ( 4) Compute A * p.
${ tp["matvec"]("A", "p", "Ap", tp["wp"][i] ) }$
! ( 4) Compute step size alpha = r_dot_r_old / (p' * Ap).
alpha = r_dot_r_old / ${ tp["dotprod"]( "p", "Ap", "n") }$
! ( 5) Update solution x = x + alpha * p.
${ tp["axpy"]("alpha", "p", "x", tp["wp"][i], "n" ) }$
! ( 6) Update residual r = r - alpha * Ap.
${ tp["axpy"]("-alpha", "Ap", "r", tp["wp"][i], "n" ) }$
! ( 7) Compute new dot product of residual r_dot_r_new = r' * r.
r_dot_r_new = ${ tp["dotprod"]( "r", "r", "n") }$
! ( 7) Compute norm of residual.
residual = sqrt(r_dot_r_new)
! ( 7) Check for convergence
if (options%verb) then
write (options%out_unit, *) "INFO : CG residual after ", (itnum), "iterations : ", residual
end if
if ( residual < options%tol_${tp["wp"][i]}$ ) then
if (options%verb) then
write (options%out_unit, *) "INFO : CG Converged: residual ", residual, "< tolerance: ", options%tol_${tp["wp"][i]}$
end if
if ( present(info) ) info = 0
exit cg_iterations
end if
! ( 8) Compute new direction beta = r_dot_r_new / r_dot_r_old.
beta = r_dot_r_new/r_dot_r_old
! ( 9) Update direction p = r + beta * p.
${ tp["axpby"]("1._"+tp["wp"][i], "r", "beta", "p", tp["wp"][i], "n" ) }$
! (10) Update r_dot_r_old for next iteration.
r_dot_r_old = r_dot_r_new
end do cg_iterations
deallocate (r, p, Ap)
return
end subroutine stdlib_krylov_${tp["subnam"][i]}$_CG
!!
#:endfor
#:endfor
Here’s the trick: first define an empty dictionary in fypp, then define a second dictionary specific to the implementation you want and add it to the first. As long as all the dictionaries are constructed with the same structure, everything works fine.
#:set wp_kinds = ["sp", "dp"]
#:set type_dict = {}
#:include "operations_intrinsics.fypp"
#:set type_dict = { "intrinsics": intrinsics_dict }
But what’s inside “operations_intrinsics.fypp”? The operations you need to perform in a generic iterative method, expressed for your specific data type.
#:mute
#!
#! Inquire operations
#!
#:def intrinsics_rowsize(A)
size( ${A}$, 1 )
#:enddef
#:def intrinsics_colsize(A)
size( ${A}$, 2 )
#:enddef
#!
#! Access operations
#!
#:def intrinsics_coef(vec)
${vec}$
#:enddef
#!
#! Vector-Vector operations
#!
#:def intrinsics_dotprod( vec1, vec2 , n)
dot_product( ${vec1}$, ${vec2}$ )
#:enddef
#:def intrinsics_axpy( alpha, x, y, wp, n )
${y}$ = ${y}$ + (${alpha}$) * ${x}$
#:enddef
#:def intrinsics_axpby( alpha, x, beta, y, wp, n )
${y}$ = ${beta}$ * ${y}$ + ${alpha}$ * ${x}$
#:enddef
#!
#! Matrix-Vector operations
#!
#:def intrinsics_matvec(A, vec_in, vec_out, wp )
${vec_out}$ = matmul( ${A}$, ${vec_in}$ )
#:enddef
#:def intrinsics_matAxpy(A, alpha, vec_in, vec_out, wp, n )
${vec_out}$ = ${alpha}$ * matmul( ${A}$, ${vec_in}$ ) + ${vec_out}$
#:enddef
#:def intrinsics_matAxpby(A, alpha, vec_in, beta, vec_out, wp, n )
${vec_out}$ = ${alpha}$ * matmul( ${A}$, ${vec_in}$ ) + ${beta}$ * ${vec_out}$
#:enddef
#!
#! Dictionary definition
#!
#! | All these must be coherent |
#:set intrinsics_dict = { &
& "subnam": [ "intrinsic_dp"], &
& "matrix": [ "real(dp)"], &
& "vector": [ "real(dp)"], &
& "scalar": [ "real(dp)"], &
& "wp": [ "dp"], &
& "matrnk": [ "(:,:)"], &
& "vecrnk": [ "(:)"], &
& "rowsize": intrinsics_rowsize, &
& "colsize": intrinsics_colsize, &
& "dotprod": intrinsics_dotprod, &
& "coef": intrinsics_coef, &
& "axpy": intrinsics_axpy, &
& "axpby": intrinsics_axpby, &
& "matvec": intrinsics_matvec, &
& "matAxpy": intrinsics_matAxpy, &
& "matAxpby": intrinsics_matAxpby &
& }
#:endmute
This particular example is not really useful, I agree, but it conveys the idea. In particular, this allows you to write the solver and test its algorithmic correctness, add checks, documentation and whatever you want, regardless of the data type you are using. Then you define another data type, say the sparse matrix set, define its dictionary, and include it into fypp. Voilà, you have your CG for sparse matrices as well. To give you an idea, the sparse dictionary looks like this
#:mute
#:set SPARSE_TYPES = ["COO_", "CSR_", "CSC_", "ELL_"]
#!
#! Inquire operations
#!
#:def sparse_rowsize(A)
${A}$%nrows
#:enddef
#:def sparse_colsize(A)
${A}$%ncols
#:enddef
#!
#! Access operations
#!
#:def sparse_coef(vec)
${vec}$
#:enddef
#!
#! Vector-Vector operations
#!
#:def sparse_dotprod( vec1, vec2 , n)
dot( ${n}$, ${vec1}$, 1, ${vec2}$, 1 )
#:enddef
#:def sparse_axpy( alpha, x, y, wp, n )
call axpy( ${n}$, ${alpha}$, ${x}$, 1, ${y}$, 1 )
#:enddef
#:def sparse_axpby( alpha, x, beta, y, wp, n )
! ${y}$ <- ${beta}$ * ${y}$
! ${y}$ <- ${y}$ + ${alpha}$ * ${x}$
call scal( ${n}$, ${beta}$, ${y}$, 1 )
call axpy( ${n}$, ${alpha}$, ${x}$, 1, ${y}$, 1 )
#:enddef
#!
#! Matrix-Vector operations
#!
#:def sparse_matvec(A, vec_in, vec_out, wp )
! Computes ${vec_out}$ = ${A}$ * ${vec_in}$.
${vec_out}$ = 0._${wp}$
call matvec( ${vec_out}$ , ${A}$ , ${vec_in}$ )
#:enddef
#:def sparse_matAxpy(A, alpha, vec_in, vec_out, wp, n )
call matvec( ${vec_out}$ , ${A}$ , ${vec_in}$ )
#:enddef
#:def sparse_matAxpby(A, alpha, vec_in, beta, vec_out, wp, n )
${vec_out}$ = ${beta}$ * ${vec_out}$
call matvec( ${vec_out}$ , ${A}$ , ${vec_in}$ )
#:enddef
#!
#! Dictionary definition
#!
#! | All these must be coherent |
#:set sparse_dict = { &
& "subnam": [x+y for x in SPARSE_TYPES for y in wp_kinds], &
& "matrix": ["type("+x+y+")" for x in SPARSE_TYPES for y in wp_kinds], &
& "vector": ["real("+y+")" for x in SPARSE_TYPES for y in wp_kinds], &
& "scalar": ["real("+y+")" for x in SPARSE_TYPES for y in wp_kinds], &
& "wp": [y for x in SPARSE_TYPES for y in wp_kinds], &
& "matrnk": [" "], &
& "vecrnk": ["(:)"], &
& "rowsize": sparse_rowsize, &
& "colsize": sparse_colsize, &
& "dotprod": sparse_dotprod, &
& "coef": sparse_coef, &
& "axpy": sparse_axpy, &
& "axpby": sparse_axpby, &
& "matvec": sparse_matvec, &
& "matAxpy": sparse_matAxpy, &
& "matAxpby": sparse_matAxpby &
& }
#:endmute
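With that file in place, hooking the sparse variants into the generation is, in this scheme, just a matter of including it and registering the dictionary (the file name “operations_sparse.fypp” is my assumption for where the sparse dictionary lives):
#:set wp_kinds = ["sp", "dp"]
#:include "operations_intrinsics.fypp"
#:include "operations_sparse.fypp"
#:set type_dict = { "intrinsics": intrinsics_dict, "sparse": sparse_dict }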
If you have been following the development of the sparse suite, you might have noticed that the matrix-vector routine is no longer called matvec but spmv. Changing the name of the routine in the dictionary changes the corresponding call everywhere in the library that depends on that dictionary (which is quite nice compared to going full “search and replace”). The only thing I dislike about this approach is the tedious syntax of Python dictionaries. The take-home message is: code only the standard method and its preconditioned version once, then let fypp do the magic. Right now, the whole code for PCG is about 300 lines long; after fypping it is about 3000 lines long, but as long as you test the correctness of your dictionaries, all the replicas are algorithmically doing the same operations.
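As a concrete illustration of that rename: assuming the current stdlib sparse kernel takes its arguments in the order spmv(matrix, vec_x, vec_y) (my reading of the sparse suite, to be double-checked), only one def would need to change:
#:def sparse_matvec(A, vec_in, vec_out, wp )
! Computes ${vec_out}$ = ${A}$ * ${vec_in}$ via the renamed stdlib kernel
${vec_out}$ = 0._${wp}$
call spmv( ${A}$ , ${vec_in}$ , ${vec_out}$ )
#:enddef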
Conclusions
Let me know what you think, what should be added, and what could be done in a cleverer/better way. This is the ground structure of what could become the future `stdlib_krylov`.