Generic or Kind-Agnostic linear system solvers

First, thank you for your answer.

The way I am tackling the task is to first set up a solver with intrinsics and build up more complicated things from there. The first step is of course to change the intrinsics to BLAS/LAPACK, but at that point there is no need to throw away the previous implementation, which could be helpful as a gateway to explain the methods (a CG is basically 20 lines of Fortran, declarations apart). Moreover, providing both the intrinsic and the BLAS/LAPACK implementations could be extremely useful for understanding how to use those tools.

So, how can they be made not mutually exclusive at compile time, meaning not having to choose either intrinsics or LAPACK, but rather having both of them compiled?

This is super interesting. These are called procedure pointers, right? I will try looking into them, but are they “as dangerous” as regular pointers (which I am not familiar with either; I still have to check them out in detail)?
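For reference, here is a minimal procedure-pointer sketch (all names are hypothetical, not part of any proposal). Unlike data pointers, no memory management is involved; initializing the pointer to null() and guarding calls with associated() covers the main failure mode:

  program proc_ptr_demo
    use stdlib_kinds, only: dp
    implicit none
    !! Explicit interface that any solver implementation must match
    abstract interface
       subroutine solver_ifc(A, b, x)
          import :: dp
          real(dp), intent(in)  :: A(:,:), b(:)
          real(dp), intent(out) :: x(:)
       end subroutine
    end interface
    procedure(solver_ifc), pointer :: solve => null()
    real(dp) :: A(2,2), b(2), x(2)

    A = reshape([2._dp, 0._dp, 0._dp, 4._dp], [2,2])
    b = 1._dp
    solve => diag_solve                   ! pick an implementation at run time
    if (associated(solve)) call solve(A, b, x)
    print *, x                            ! 0.5  0.25
  contains
    subroutine diag_solve(A, b, x)        ! stand-in for, e.g., a CG version
       real(dp), intent(in)  :: A(:,:), b(:)
       real(dp), intent(out) :: x(:)
       integer :: i
       x = [( b(i)/A(i,i), i = 1, size(b) )]
    end subroutine
  end program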

Dear all, I’m here to propose the zero-th iteration of the iterative method suite. First, let me thank @hkvzjal, @ivanpribec, @loiseaujc and @FedericoPerini for their replies to my questions and their suggestions. Moreover, I want to ask you to pose yourselves the question “how much fypp is too much fypp?” and give me your feedback; as you will see, I might have gone a bit too far. Of course, everything in this proposal is open to changes, from naming to code structure, according to your suggestions, so please do not refrain from criticism.

The zero-th iteration only contains a Preconditioned Conjugate Gradient, as I am mostly studying the structure of the library rather than the actual methods. Also, sorry for the wall of text.

API proposal

The interface aims to be simple but customizable. The simplest case, for the user, is just
  call stdlib_linalg_cg( A, rhs, x)

within a program. This call solves the linear system A\mathbf{x}=\mathbf{b} with a tolerance tol = 10^{-6} and a maximum of 1000 iterations, and does not print any statement out. Optionally, the call

  call stdlib_linalg_cg( A, rhs, x, info)

also provides information, through the integer info, about the successful run of the solver (info=0) or an unsuccessful run (info=-1). Building on this, one can change the default options with

  call set_cg_opts(itmax=15)
  call stdlib_linalg_cg( A, rhs, x, info) ! Runs at most 15 iterations 

and modify these parameters for all successive runs, since the container of the default options (default_opts) is a global, public variable. A call with the option restore=.true. restores the default values for subsequent runs:

  call set_cg_opts(restore=.true.)
  call stdlib_linalg_cg( A, rhs, x, info) ! Runs with original options

Clearly, one might want to have multiple sets of options, to run a CG under different conditions (e.g. to solve a linear system to good accuracy, and to use it as a preconditioner for a Rayleigh quotient minimization with loose accuracy or a small maximum iteration count). This can be achieved by defining an option type:

  !! Option type
  type(cg_options)    :: cg_solver, cg_precond
  […]
  call set_cg_opts(cg_solver, tol_dp=1.0e-10_dp)
  call set_cg_opts(cg_precond, tol_dp=1.0e-4_dp)
  […]
  call stdlib_linalg_cg( A, rhs, x, cg_solver)
  call stdlib_linalg_cg( A, rhs, x, cg_precond)

Finally, one might need to specify an initial guess for the solver. This can be done with

  call stdlib_linalg_cg( A, rhs, x, xzero = .true. ) 

which tells the solver to compute the initial residual as \mathbf{r}=\mathbf{b} - A\mathbf{x}, where \mathbf{x} is the actual argument passed in the call. Clearly,

  call stdlib_linalg_cg( A, rhs, x, xzero = .false. ) 

computes the initial residual as \mathbf{r}=\mathbf{b} and allows bookkeeping of what one is doing if needed. Moving on to the next big one.

Preconditioning
IMHO, there is no way of doing versatile preconditioning without forcing the user to use an abstract type. The preconditioner type is defined as

  !!
  !! real(wp) preconditioner
  !!
  type, abstract :: wp_preconditioner
      !! Abstract type defining the empty shell.
    contains
      !! Procedure for the application of the preconditioner to a vector.
      procedure(wp_apply), deferred :: apply
      !!
  end type wp_preconditioner
  
  abstract interface
     !! Abstract procedure defining the interface for a
     !! general application action
     subroutine wp_apply(this,vec_in,vec_out,ierr)
       import wp
       import wp_preconditioner
       implicit none
       !! The class with all quantities describing the operator
       class(wp_preconditioner), intent(in   ) :: this
       !! The input" vector 
       real(wp),   intent(in   ) :: vec_in(:)
       !! The output vector y= "operator apllied on" x
       real(wp),   intent(inout) :: vec_out(:)
       !! An integer flag to inform about error
       !! It is zero if no error occured
       integer,            intent(inout) :: ierr
     end subroutine wp_apply
     !!
     !!
  end interface

Please interpret wp as whatever precision is needed; this will be fypped later. This is unfortunately needed because, if we want to defer apply, we need to specify its interface and thus the kind of the input/output arrays. As a trivial example, one could define a Jacobi preconditioner:

module Jacobi_prec
  use stdlib_kinds
  use stdlib_linalg_preconditioners
  implicit none
  private
  !! Extension of dp_preconditioner to define a Jacobi one
  type, public, extends(dp_preconditioner) :: dp_Jacobi
      !
      real(dp), allocatable :: Diag(:)
      integer               :: nrows
      !
    contains
      !
      procedure, public, pass :: init => init_Prec
      !! initialize procedure
      procedure, public, pass :: apply =>  apply_Prec
      !! apply procedure, overridden 
      !
  end type dp_Jacobi
  !!
contains
  !! constructor
  subroutine init_Prec(this, nrows, matIN)
    implicit none
    class(dp_Jacobi), intent(inout) :: this
    !
    integer,                intent(in   ) :: nrows
    real(dp),               intent(in   ) :: matIN(:,:)
    integer                               :: i
    !
    this%nrows = nrows
    allocate(this%Diag(nrows))
    do i = 1, this%nrows
      this%Diag(i) = matIN(i,i)
    end do
  end subroutine init_Prec
  !!
  !! apply the Jacobi (diagonal) preconditioner
  subroutine apply_Prec(this,vec_in,vec_out,ierr)
    implicit none
    class(dp_Jacobi), intent(in   ) :: this
    real(dp),               intent(in   ) :: vec_in(:)
    real(dp),               intent(inout) :: vec_out(:)
    integer,                intent(inout) :: ierr
    integer                               :: i
    !
    do i = 1, this%nrows
      vec_out(i) = vec_in(i) / this%Diag(i)
    end do
    ierr = 0
  end subroutine apply_Prec
  !! 
end module Jacobi_prec

The interesting feature of the “empty shell” approach is that it only depends on the kind of vector we apply the preconditioner to, through the apply routine. This means that with the same shell we already cover intrinsic matrices and the sparse matrix class under development. Any extension of this template to other vectors (e.g. abstract vectors, as in Enrico Facca’s proposal) is a matter of fypp (and of actual implementation, clearly). Notice that the init routine cannot be made abstract, because it is impossible to specify its inputs a priori, so the user should take the time to build their own preconditioner (which is not a bad exercise), while some basic preconditioners could already be provided, e.g. Jacobi, SSOR, ICHOL and ILU, as a further expansion of the library.
The user-developed or library-provided preconditioner is then accessible in the code with

  type(dp_Jacobi)     :: Jacobi
  […]
  call Jacobi%init(n, A)
  […]
  call stdlib_linalg_cg( A = A, K = Jacobi, rhs = rhs, x = x, info = info ) 

where K is the preconditioner.

What’s under the hood

Let’s start from the options. The `cg_options` type is defined as
  !! Definition: cg_options
  type, public :: cg_options
      !!
      !! Maximum number of iterations 
      integer :: itmax = 1000
      !!
      !! Single and double precision tolerance
      real(sp) :: tol_sp = 1.0e-7_sp
      real(dp) :: tol_dp = 1.0e-7_dp
      !!
      !! Verbosity and writing unit 
      logical  :: verb = .false.
      integer  :: out_unit = 6
      !!
  end type

And within the module there is the set_cg_opts subroutine, which I do not post here because it is just an awful set of conditionals. Its interface reads

  subroutine set_cg_opts(opt, itmax, tol_sp, tol_dp,  verb, out_unit, restore )
    !!
    type(cg_options), intent(inout), optional :: opt
    integer,  intent(in   ), optional :: itmax
    real(sp), intent(in   ), optional :: tol_sp
    real(dp), intent(in   ), optional :: tol_dp
    logical,  intent(in   ), optional :: verb
    integer,  intent(in   ), optional :: out_unit
    logical,  intent(in   ), optional :: restore

Here I have a request for help: it is clear that this set of values is common to all solvers, but some (e.g. restarted GMRES) have more options (say, a restart number nrest). It would be natural then to define a base class kr_options and extend it when designing a method. However, I don’t understand if and how inheritance would work for the methods of the class, i.e. whether I can extend a method or not. Ironically, the design of the options container is what keeps me up at night; I can’t figure it out.
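For what it’s worth, here is a minimal sketch of the kind of extension I have in mind (gmres_options, nrest and describe are hypothetical names). A type-bound procedure is overridden simply by re-declaring a binding with the same name in the extension, and the parent version remains reachable through the parent component:

  module kr_opts_demo
    use stdlib_kinds, only: dp
    implicit none

    !! Base options, shared by all Krylov solvers
    type, public :: kr_options
        integer  :: itmax  = 1000
        real(dp) :: tol_dp = 1.0e-10_dp
      contains
        procedure :: describe => describe_base   ! overridable binding
    end type

    !! GMRES adds its own parameters on top of the shared ones
    type, public, extends(kr_options) :: gmres_options
        integer :: nrest = 30                    !! restart length
      contains
        procedure :: describe => describe_gmres  ! overrides the parent binding
    end type

  contains

    subroutine describe_base(this)
      class(kr_options), intent(in) :: this
      print *, "itmax =", this%itmax, " tol =", this%tol_dp
    end subroutine

    subroutine describe_gmres(this)
      class(gmres_options), intent(in) :: this
      call this%kr_options%describe()            ! reuse the parent version
      print *, "nrest =", this%nrest
    end subroutine

  end module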

The actual solver

Here comes the fun part. This is the implemented algorithm, with its interface and all, for the case of intrinsic data types. What I’m looking for here are suggestions about code structuring and about which checks to perform in the subroutine (not code checks or algorithmic-correctness checks).
  subroutine stdlib_krylov_intrinsic_dp_CG( A, rhs, x, info, xzero, opts_in )
    !!
    !! In-Out variables
    real(dp),  intent(in   ) :: A(:,:) 
    !! Linear operator to be inverted.
    real(dp),  intent(in   ) :: rhs(:)
    !! Right-hand side vector.
    real(dp),  intent(inout) :: x(:)
    !! Solution vector.
    integer,   intent(  out) :: info
    !! Information flag.
    logical,   intent(in   ), optional :: xzero
    !! provided initial solution flag  
    type(cg_options),  intent(in   ), optional :: opts_in
    !! Option structure.
    !!
    !! Internal variables
    type(cg_options) :: options
    !! Default options
    real(dp), allocatable :: r(:)
    real(dp), allocatable :: p(:)
    real(dp), allocatable :: Ap(:)
    !! Residual, search direction and other vectors.
    real(dp) :: alpha, beta
    real(dp) :: r_dot_r_old, r_dot_r_new
    !! Scalars used in the CG algorithm.
    real(dp) :: residual
    !! Residual norm for exit criterions
    integer :: itnum
    !! Iteration counter   
    integer :: n
    !! Problem size
    logical :: x_provided
    !! Local copy of the optional xzero flag
    n = size( A, 1 )
    info = 0

    if (present(opts_in)) then
       options = opts_in
    else
       options = default_opts
    end if    
    !! Options to be set
    !!
    ! Allocate and initialize support vector
    allocate (Ap, mold = rhs)
    Ap = 0._dp

    ! Allocate residual and copy in rhs
    ! ( 2) Evaluate initial residual r_o = rhs ( - A*x_o later)
    allocate (r, source = rhs)

    ! ( 1) Parse whether an initial solution is provided or not
    ! (.and. does not short-circuit in Fortran, so an absent xzero must
    !  not be referenced in the same expression as present(xzero))
    x_provided = .false.
    if ( present(xzero) ) x_provided = xzero
    if ( x_provided ) then 
      ! It is provided
      ! ( 1) Keep x_o
      ! ( 2) Evaluate initial residual r_o = rhs - A*x_o
      Ap = matmul( A, x )
      r = r - Ap
    else
      ! It is not provided
      ! ( 1) Set x to zero
      x = 0._dp
      ! ( 2) r = b already performed in allocation
    end if

    ! Allocate direction vector and copy-in p = r.
    allocate (p, source = r  )

    ! Initialize dot product of residual: r_dot_r_old = r' * r.
    r_dot_r_old = dot_product( r, r )
    
    ! Safety computation to avoid a possible division by zero in r*r/p*Ap
    if ( sqrt(r_dot_r_old) == 0._dp ) p = options%tol_dp

    ! info flag (info is not optional, so it is set unconditionally)
    info = -1

    ! CG Iteration Loop.
    cg_iterations: do itnum = 1, options%itmax

      ! ( 4) Compute A * p.
      Ap = matmul( A, p )
      ! ( 4) Compute step size alpha = r_dot_r_old / (p' * Ap).
      alpha = r_dot_r_old / dot_product( p, Ap )

      ! ( 5) Update solution x = x + alpha * p.
      x = x + (alpha) * p 

      ! ( 6) Update residual r = r - alpha * Ap.
      r = r + (-alpha) * Ap 

      ! ( 7) Compute new dot product of residual r_dot_r_new = r' * r.
      r_dot_r_new = dot_product( r, r ) 
      ! ( 7) Compute norm of residual.
      residual = sqrt(r_dot_r_new)

      ! ( 7) Check for convergence
      if (options%verb) then
         write (options%out_unit, *) "INFO : CG residual after ", (itnum), "iterations : ", residual
      end if

      if ( residual < options%tol_dp ) then
        if (options%verb) then
          write (options%out_unit, *) "INFO : CG Converged: residual ", residual, "< tolerance: ", options%tol_dp
        end if
        info = 0
        exit cg_iterations
      end if

      ! ( 8) Compute new direction beta = r_dot_r_new / r_dot_r_old.
      beta = r_dot_r_new/r_dot_r_old

      ! ( 9) Update direction p = r + beta * p.
      p = beta * p + 1._dp * r 

      ! (10) Update r_dot_r_old for next iteration.
      r_dot_r_old = r_dot_r_new

    end do cg_iterations

    deallocate (r, p, Ap)

    return

  end subroutine stdlib_krylov_intrinsic_dp_CG

You might wonder: what kinds of data types are supported? If we look at the complete declaration of variables of this routine, we have a matrix, a vector, a scalar, and implicitly a precision. These are the things that must be prescribed if we want to extend this algorithm to some other data type. So I propose the following approach: define the algorithm for dense, intrinsic arrays, define it for the forthcoming sparse class, and define it for abstract Operators-Vectors-Scalars (I sense that abstract scalars will be handy for defining parallel reductions for parameters such as alpha, which is a ratio of scalar products), which I will start developing and testing soon. With these three macro-kinds, we should be able to target a broad class of usages. In particular, the library currently handles the following types.

For sp:

Matrix        Vector      Scalar
real(sp)      real(sp)    real(sp)
COO_sp        real(sp)    real(sp)
CSR_sp        real(sp)    real(sp)
CSC_sp        real(sp)    real(sp)
ELL_sp        real(sp)    real(sp)
sp_Operator   real(sp)    real(sp)
sp_Operator   sp_Vector   real(sp)

For dp:

Matrix        Vector      Scalar
real(dp)      real(dp)    real(dp)
COO_dp        real(dp)    real(dp)
CSR_dp        real(dp)    real(dp)
CSC_dp        real(dp)    real(dp)
ELL_dp        real(dp)    real(dp)
dp_Operator   real(dp)    real(dp)
dp_Operator   dp_Vector   real(dp)

Code the algorithm once, scatter on all data types

OK, but how do we code it for all these data types without losing our minds? Enter fypp. The actual implemented routine is the result of the early days of development, when I coded all the Conjugate-“Something” algorithms, just to find errors and correct them multiple times, on multiple data structures and so on. So I came up with the following:
  !!
  !!
  !! Conjugate Gradient Algorithm implementation:
  !!
  #:for tp in type_dict.values()
  #:for i in range(len(tp["subnam"]))
  !! ${tp["matrix"][i]}$-${tp["vector"][i]}$ classical version
  !!
  subroutine stdlib_krylov_${tp["subnam"][i]}$_CG( A, rhs, x, info, xzero, opts_in )
    !!
    !! In-Out variables
    ${tp["matrix"][i]}$,  intent(in   ) :: A${tp["matrnk"][0]}$ 
    !! Linear operator to be inverted.
    ${tp["vector"][i]}$,  intent(in   ) :: rhs${tp["vecrnk"][0]}$
    !! Right-hand side vector.
    ${tp["vector"][i]}$,  intent(inout) :: x${tp["vecrnk"][0]}$
    !! Solution vector.
    integer,   intent(  out) :: info
    !! Information flag.
    logical,   intent(in   ), optional :: xzero
    !! provided initial solution flag  
    type(cg_options),  intent(in   ), optional :: opts_in
    !! Option structure.
    !!
    !! Internal variables
    type(cg_options) :: options
    !! Default options
    ${tp["vector"][i]}$, allocatable :: r${tp["vecrnk"][0]}$
    ${tp["vector"][i]}$, allocatable :: p${tp["vecrnk"][0]}$
    ${tp["vector"][i]}$, allocatable :: Ap${tp["vecrnk"][0]}$
    !! Residual, search direction and other vectors.
    ${tp["scalar"][i]}$ :: alpha, beta
    ${tp["scalar"][i]}$ :: r_dot_r_old, r_dot_r_new
    !! Scalars used in the CG algorithm.
    real(${tp["wp"][i]}$ ) :: residual
    !! Residual norm for exit criterions
    integer :: itnum
    !! Iteration counter   
    integer :: n
    !! Problem size
    logical :: x_provided
    !! Local copy of the optional xzero flag
    n = ${ tp["rowsize"]("A") }$
    info = 0

    if (present(opts_in)) then
       options = opts_in
    else
       options = default_opts
    end if    
    !! Options to be set
    !!
    ! Allocate and initialize support vector
    allocate (Ap, mold = rhs)
    ${ tp["coef"]("Ap") }$ = 0._${tp["wp"][i]}$

    ! Allocate residual and copy in rhs
    ! ( 2) Evaluate initial residual r_o = rhs ( - A*x_o later)
    allocate (r, source = rhs)

    ! ( 1) Parse whether an initial solution is provided or not
    ! (.and. does not short-circuit in Fortran, so an absent xzero must
    !  not be referenced in the same expression as present(xzero))
    x_provided = .false.
    if ( present(xzero) ) x_provided = xzero
    if ( x_provided ) then 
      ! It is provided
      ! ( 1) Keep x_o
      ! ( 2) Evaluate initial residual r_o = rhs - A*x_o
      ${ tp["matvec"]("A", "x", "Ap", tp["wp"][i] ) }$
      ${ tp["coef"]("r") }$ = ${ tp["coef"]("r") }$ - ${ tp["coef"]("Ap") }$
    else
      ! It is not provided
      ! ( 1) Set x to zero
      ${ tp["coef"]("x") }$ = 0._${tp["wp"][i]}$
      ! ( 2) r = b already performed in allocation
    end if

    ! Allocate direction vector and copy-in p = r.
    allocate (p, source = r  )

    ! Initialize dot product of residual: r_dot_r_old = r' * r.
    r_dot_r_old = ${ tp["dotprod"]( "r", "r", "n") }$
    
    ! Safety computation to avoid a possible division by zero in r*r/p*Ap
    if ( sqrt(r_dot_r_old) == 0._${tp["wp"][i]}$ ) ${ tp["coef"]("p") }$ = options%tol_${tp["wp"][i]}$

    ! info flag: assume failure until convergence
    info = -1

    ! CG Iteration Loop.
    cg_iterations: do itnum = 1, options%itmax

      ! ( 4) Compute A * p.
      ${ tp["matvec"]("A", "p", "Ap", tp["wp"][i] ) }$
      ! ( 4) Compute step size alpha = r_dot_r_old / (p' * Ap).
      alpha = r_dot_r_old / ${ tp["dotprod"]( "p", "Ap", "n") }$

      ! ( 5) Update solution x = x + alpha * p.
      ${ tp["axpy"]("alpha", "p", "x", tp["wp"][i], "n" ) }$

      ! ( 6) Update residual r = r - alpha * Ap.
      ${ tp["axpy"]("-alpha", "Ap", "r", tp["wp"][i], "n" ) }$

      ! ( 7) Compute new dot product of residual r_dot_r_new = r' * r.
      r_dot_r_new = ${ tp["dotprod"]( "r", "r", "n") }$ 
      ! ( 7) Compute norm of residual.
      residual = sqrt(r_dot_r_new)

      ! ( 7) Check for convergence
      if (options%verb) then
         write (options%out_unit, *) "INFO : CG residual after ", (itnum), "iterations : ", residual
      end if

      if ( residual < options%tol_${tp["wp"][i]}$ ) then
        if (options%verb) then
          write (options%out_unit, *) "INFO : CG Converged: residual ", residual, "< tolerance: ", options%tol_${tp["wp"][i]}$
        end if
        info = 0
        exit cg_iterations
      end if

      ! ( 8) Compute new direction beta = r_dot_r_new / r_dot_r_old.
      beta = r_dot_r_new/r_dot_r_old

      ! ( 9) Update direction p = r + beta * p.
      ${ tp["axpby"]("1._"+tp["wp"][i], "r", "beta", "p", tp["wp"][i], "n" ) }$

      ! (10) Update r_dot_r_old for next iteration.
      r_dot_r_old = r_dot_r_new

    end do cg_iterations

    deallocate (r, p, Ap)

    return

  end subroutine stdlib_krylov_${tp["subnam"][i]}$_CG
  !!
  #:endfor
  #:endfor

Here’s the trick: first define an empty dictionary in fypp, then define a second dictionary specific to the implementation you want, and add it to the first. If the dictionaries are all constructed with the same structure, everything works fine.

#:set wp_kinds = ["sp", "dp"]
#:set type_dict = {}
#:include "operations_intrinsics.fypp"
#:set type_dict = { "intrinsics": intrinsics_dict }

But what’s inside “operations_intrinsics.fypp”? The operations you need to perform in a generic iterative method, written for your specific data type.

#:mute
#!
#! Inquire operations
#! 
#:def intrinsics_rowsize(A)
size( ${A}$, 1 )
#:enddef

#:def intrinsics_colsize(A)
size( ${A}$, 2 )
#:enddef

#!
#! Access operations
#! 
#:def intrinsics_coef(vec)
${vec}$
#:enddef

#!
#! Vector-Vector operations
#! 
#:def intrinsics_dotprod( vec1, vec2 , n)
dot_product( ${vec1}$, ${vec2}$ )
#:enddef

#:def intrinsics_axpy( alpha, x, y, wp, n )
${y}$ = ${y}$ + (${alpha}$) * ${x}$ 
#:enddef

#:def intrinsics_axpby( alpha, x, beta, y, wp, n )
${y}$ = ${beta}$ * ${y}$ + ${alpha}$ * ${x}$ 
#:enddef

#!
#! Matrix-Vector operations
#! 
#:def intrinsics_matvec(A, vec_in, vec_out, wp )
${vec_out}$ = matmul( ${A}$, ${vec_in}$ )
#:enddef

#:def intrinsics_matAxpy( A, alpha, vec_in, vec_out, wp, n )
${vec_out}$ = ${alpha}$ * matmul( ${A}$, ${vec_in}$ ) + ${vec_out}$ 
#:enddef

#:def intrinsics_matAxpby( A, alpha, vec_in, beta, vec_out, wp, n )
${vec_out}$ = ${alpha}$ * matmul( ${A}$, ${vec_in}$ ) + ${beta}$ * ${vec_out}$ 
#:enddef

#!
#! Dictionary definition
#! 
#!                        | All these must be coherent    |
#:set intrinsics_dict = { &
            &   "subnam": [ "intrinsic_dp"],    &
            &   "matrix": [ "real(dp)"],        &
            &   "vector": [ "real(dp)"],        &
            &   "scalar": [ "real(dp)"],        &
            &   "wp":     [ "dp"],              &
            &   "matrnk": [ "(:,:)"],           &
            &   "vecrnk": [ "(:)"],             &
            &   "rowsize": intrinsics_rowsize,  &
            &   "colsize": intrinsics_colsize,  &
            &   "dotprod": intrinsics_dotprod,  &
            &   "coef": intrinsics_coef,        &
            &   "axpy": intrinsics_axpy,        &
            &   "axpby": intrinsics_axpby,      &
            &   "matvec": intrinsics_matvec,    &
            &   "matAxpy": intrinsics_matAxpy,  &
            &   "matAxpby": intrinsics_matAxpby &
            & }

#:endmute

This particular example is not really useful, I agree, but it conveys the idea. In particular, this allows you to write the solver and test its algorithmic correctness, add checks, documentation and whatever you want, regardless of the data type you are using. Then you define another data type, say the sparse matrix set, define its dictionary, and include it in fypp. VoilĂ , you have your CG for sparse matrices as well. To give you an idea, the sparse dictionary looks like this:

#:mute

#:set SPARSE_TYPES = ["COO_", "CSR_", "CSC_", "ELL_"]

#!
#! Inquire operations
#! 
#:def sparse_rowsize(A)
${A}$%nrows
#:enddef

#:def sparse_colsize(A)
${A}$%ncols
#:enddef

#!
#! Access operations
#! 
#:def sparse_coef(vec)
${vec}$
#:enddef

#!
#! Vector-Vector operations
#! 
#:def sparse_dotprod( vec1, vec2 , n)
dot( ${n}$, ${vec1}$, 1, ${vec2}$, 1 )
#:enddef

#:def sparse_axpy( alpha, x, y, wp, n )
call axpy( ${n}$, ${alpha}$, ${x}$, 1, ${y}$, 1  ) 
#:enddef

#:def sparse_axpby( alpha, x, beta, y, wp, n )
      !  ${y}$ <- ${beta}$ * ${y}$
      !  ${y}$ <- ${y}$ + ${alpha}$ * ${x}$
      call scal( ${n}$, ${beta}$, ${y}$, 1 )
      call axpy( ${n}$, ${alpha}$, ${x}$, 1, ${y}$, 1 ) 
#:enddef

#!
#! Matrix-Vector operations
#! 
#:def sparse_matvec(A, vec_in, vec_out, wp )
! Computes ${vec_out}$ = ${A}$ * ${vec_in}$.
      ${vec_out}$ = 0._${wp}$
      call matvec( ${vec_out}$ , ${A}$ , ${vec_in}$ ) 
#:enddef

#:def sparse_matAxpy( A, alpha, vec_in, vec_out, wp, n )
      call matvec( ${vec_out}$, ${A}$, ${vec_in}$ ) 
#:enddef

#:def sparse_matAxpby( A, alpha, vec_in, beta, vec_out, wp, n )
      ${vec_out}$ = ${beta}$ * ${vec_out}$
      call matvec( ${vec_out}$, ${A}$, ${vec_in}$ ) 
#:enddef

#!
#! Dictionary definition
#! 
#!                                         |        All these must be coherent    |
#:set sparse_dict = { &
            &   "subnam": [x+y             for x in SPARSE_TYPES for y in wp_kinds], &
            &   "matrix": ["type("+x+y+")" for x in SPARSE_TYPES for y in wp_kinds], &
            &   "vector": ["real("+y+")"   for x in SPARSE_TYPES for y in wp_kinds], &
            &   "scalar": ["real("+y+")"   for x in SPARSE_TYPES for y in wp_kinds], &
            &   "wp":     [y               for x in SPARSE_TYPES for y in wp_kinds], &
            &   "matrnk": [" "],            &
            &   "vecrnk": ["(:)"],          &
            &   "rowsize": sparse_rowsize,  &
            &   "colsize": sparse_colsize,  &
            &   "dotprod": sparse_dotprod,  &
            &   "coef": sparse_coef,    &
            &   "axpy": sparse_axpy,        &
            &   "axpby": sparse_axpby,      &
            &   "matvec": sparse_matvec,    &
            &   "matAxpy": sparse_matAxpy,  &
            &   "matAxpby": sparse_matAxpby &
            & }

#:endmute

If you have been following the development of the sparse suite, you might have noticed that the matvec is no longer called matvec but spmv. Changing the name of the routine in the dictionary changes the call throughout the library, depending on the dictionary (which is quite nice compared to going full “search and replace”). The only thing I dislike about this approach is the tedious syntax of Python dictionaries. The take-home message is: code only the standard method and its preconditioned version once, then let fypp do the magic. Right now, the whole code for PCG is 300 lines long. After fypping it is 3000 lines long, but as long as you test the correctness of your dictionaries, all the replicas are algorithmically doing the same operations.
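For instance, adapting to that rename would only touch the dictionary macro; a sketch, assuming the spmv(matrix, vec_in, vec_out) argument order of the sparse suite:

#:def sparse_matvec(A, vec_in, vec_out, wp )
      ! Computes ${vec_out}$ = ${A}$ * ${vec_in}$ with the renamed routine
      call spmv( ${A}$, ${vec_in}$, ${vec_out}$ ) 
#:enddef

Every instantiated solver then calls spmv without any change to the solver template itself.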

Conclusions

Let me know what you think, what should be added, and what could be done in a cleverer/better way. This is the ground structure of what could become a future `stdlib_krylov`.

@kimala ,

In the context of this thread with Generic solvers that may need to work with any KIND of data (especially floating-point, or the so-called REAL values in Fortran):

  • it would be great if you and all Fortranners developed a vision for a Fortran language in which any “fypp is too much fypp” situation becomes unnecessary, and pushed the standards committee to develop facilities that do not require you to rely on preprocessing for generics, and
  • keep the use of the object-oriented (OO) paradigm to a certain bare minimum when it comes to achieving generic/templated code for solvers

@FortranFan ,

  • This is not only outside my expertise, but also outside my knowledge. I cannot provide any kind of valuable proposal to the standards committee, so I refrain from just stating wishlists and waiting for them to happen. Moreover, the committee could come out tomorrow with a solution to all our problems, but only God knows how long it would take to have it in the compilers. So my approach is to take the tools I have and build something first: something easy to use, useful, and easy to maintain. I am not proposing an iterative solver suite because I get bored during my weekends and need a hobby; I’m proposing stuff because I wish I could have this kind of stuff in the codes I actually work with, because I am tired of coding like it was the nineties, and that is my 9-to-5 job. So my philosophy is to build now, and build well. If generics come out tomorrow, I’ll be the first one looking into them, and I will be happy to adapt the library to them.
  • I am no expert in OOP, so all my codes have just a tad of OOP. This is the reason the options container is keeping me up.

(Hey all, I recently restarted with this project after a long pause, so sorry for necro-posting.)

Logging

As the pCG is almost functional with all the necessary data types, before starting to replicate it to create more solvers I would like to ask what kind of logging strategy you would prefer. At this stage, the template for the solver is the following:

  !!
  !! Preconditioned Conjugate Gradient Algorithm implementation:
  !!    real(dp)-real(dp)
  !!
  subroutine stdlib_krylov_pCG_intrinsic_dp( A, K, rhs, x, info, xzero, opts_in )
    !!
    !! In-Out variables
    real(dp),  intent(in   ) :: A(:,:) 
    !! Linear operator to be inverted.
    class(Preconditioner_dp),  intent(in   ) :: K
    !! Preconditioner to use.
    real(dp),  intent(in   ) :: rhs(:)
    !! Right-hand side vector.
    real(dp),  intent(inout) :: x(:)
    !! Solution vector.
    integer,   intent(  out) :: info
    !! Information flag.
    logical,   intent(in   ), optional :: xzero
    !! provided initial solution flag  
    type(cg_options),  intent(in   ), optional :: opts_in
    !! Option structure.
    !!
    !! Internal variables
    type(cg_options) :: options
    !! Default options
    real(dp), allocatable :: r(:)
    real(dp), allocatable :: Kr(:)
    real(dp), allocatable :: p(:)
    real(dp), allocatable :: Ap(:)
    !! Residual, search direction and other vectors.
    real(dp) :: alpha, beta
    real(dp) :: Kr_dot_r_old, Kr_dot_Ap, rhsnorm
    real(dp) :: r_dot_p
    real(dp) :: p_dot_Ap

    !! Scalars used in the CG algorithm.
    real(dp) :: residual
    !! Residual norm for exit criterions
    integer :: itnum
    !! Iteration counter
    integer, parameter :: RCitmax = 5
    !! Residual Correction Iteration number 
    integer :: n
    !! Problem size
    logical :: x_provided
    !! Local copy of the optional xzero flag
    n = size( A, 1 )
    info = 0

    if (present(opts_in)) then
       options = opts_in
    else
       options = default_opts
    end if    
    !! Options to be set
    !!
    ! Allocate and initialize support vector
    allocate (Ap, source = rhs)
    Ap = 0._dp

    ! Allocate residual and copy in rhs
    ! ( 2) Evaluate initial residual r_o = rhs ( - A*x_o later)
    allocate (r, source = rhs)

    ! ( 1) Parse whether an initial solution is provided or not
    ! (.and. does not short-circuit in Fortran, so an absent xzero must
    !  not be referenced in the same expression as present(xzero))
    x_provided = .false.
    if ( present(xzero) ) x_provided = xzero
    if ( x_provided ) then 
      ! It is provided
      ! ( 1) Keep x_o
      ! ( 2) Evaluate initial residual r_o = rhs - A*x_o
      Ap = matmul( A, x )
      r = 1._dp * r + (-1._dp) * Ap 
    else
      ! It is not provided
      ! ( 1) Set x to zero
      x = 0._dp
      ! ( 2) r = b already performed in allocation
    end if
 
    ! ( 3) Apply preconditioner to r
    allocate (Kr, source = rhs)
    call K%apply(r, Kr, info)

    ! Allocate direction vector and copy-in p = Kr.
    allocate (p, source = Kr  )    
    
    ! Initialize dot product of residual: Kr_dot_r_old = Kr' * r.
    Kr_dot_r_old = dot_product( Kr, r )
    
    ! Safety computation to avoid a possible division by zero in r*r/p*Ap
    if ( sqrt(Kr_dot_r_old) == 0._dp ) p = options%tol_dp 

    ! Compute norm of right-hand side
    rhsnorm = sqrt( dot_product( rhs, rhs ) )

    ! "Residual correction" method, see Gambolati-Ferronato. 
    ! Keep RCitnum low (default is 5, which is already high)
    do itnum = 1, RCitmax

      ! (cr1) Apply preconditioner to r
      call K%apply(r, Kr, info)

      ! (cr2) Update solution x = x + 1._dp * Kr.
      x = x + (1._dp) * Kr 

      ! (cr3) Compute A * x (save it on r to help next axpby).
      r = matmul( A, x )

      ! (cr4) Update residual r = rhs - A*x = 1._dp * rhs + (-1._dp) * r.
      r = -1._dp * r + 1._dp * rhs 

      ! (cr5) Compute norm of residual.
      residual = sqrt(dot_product( r, r ))
      if (options%verb) then
         write (options%out_unit, *) "INFO : RC residual after ", (itnum), "iterations : ", residual/rhsnorm
      end if
 
    end do

    ! (4) CG Iteration Loop.
    cg_iterations: do itnum = RCitmax + 1, options%itmax

      ! ( 5) Compute A * p.
      Ap = matmul( A, p )
      ! ( 5) Compute r_dot_p = (r' * p)
      r_dot_p = dot_product( r, p )
      ! ( 5) Compute p_dot_Ap = (p' * Ap)
      p_dot_Ap = dot_product( p, Ap )
      ! ( 5) Compute step size alpha = (r' * p) / (p' * Ap).
      alpha = r_dot_p / p_dot_Ap

      ! ( 6) Update solution x = x + alpha * p.
      x = x + (alpha) * p 

      ! ( 7) Update residual r = r - alpha * Ap.
      r = r + (-alpha) * Ap 

      ! ( 8) Apply preconditioner to r
      call K%apply(r, Kr, info)

      ! ( 9) Compute new dot product of residual Kr_dot_Ap = Kr * r.
      Kr_dot_Ap = dot_product( Kr, Ap ) 

      ! ( 9) Compute norm of residual.
      residual = sqrt(dot_product( r, r ))

      ! ( 9) Check for convergence
      if (options%verb) then
         write (options%out_unit, *) "INFO : CG residual after ", (itnum), "iterations : ", residual/rhsnorm
      end if

      if ( residual/rhsnorm < options%tol_dp ) then
        if (options%verb) then
          write (options%out_unit, *) "INFO : CG Converged: residual ", residual/rhsnorm, "< tolerance: ", &
                                       options%tol_dp, itnum 
        end if
        exit cg_iterations
      end if

      ! (10) Compute new direction beta = - (Kr'* Ap) / (p' * Ap).
      beta = -1._dp * Kr_dot_Ap/p_dot_Ap

      ! (11) Update direction p = Kr + beta * p.
      p = beta * p + 1._dp * Kr 

      ! (12) Update Kr_dot_r_old for next iteration.
      Kr_dot_r_old = Kr_dot_Ap

    end do cg_iterations

    deallocate (r, Kr, p, Ap)

    return

  end subroutine stdlib_krylov_pCG_intrinsic_dp

Please note that this formulation is not really optimal; I did not check whether I can combine some operations together. Also, if someone here knows the “residual correction” scheme, please point me to a source: my source is trustworthy and the scheme works well, but I’ve only found it in an Italian book and I would like to point a prospective user to an English reference.

Function Traceback string

In my NetCDF suite I have a strategy employing an optional intent(in) character variable caller, which gets concatenated with an internal callee string that states the name of the called function and its generic name.

   module subroutine netCDF_read2D_dp([...], caller)
      !
      [...] 
      character(*),                  optional, intent(in   ) :: caller
      ! Traceback strings
      character(:), allocatable :: fntraceback
      character(:), allocatable :: callee 

      ! Define the name of the called function
      callee = "read => netCDF_read2D_dp"

      ! Check presence of caller function 
      if (present(caller)) then
        fntraceback = caller//"::"//callee
      else
        fntraceback = callee
      end if

      ! Check existence of the variable
      ier = nf90_inq_varID(self%file_id, varName, varID)
      if (ier /= 0) call handleException(ier, "nf90_inq_varID", nc=self%filename, var=varName, caller=fntraceback)
      [...]
   end subroutine

The PRO of this method is that you can keep concatenating function names recursively and thus see the complete call path leading to the error. So if you have a task with multiple solvers, you can easily backtrack which one fails.
EDIT: as suggested by @RonShepard, this can be done in a more elegant way with linked lists instead of string concatenation. I will try to implement it in that fashion; for now, take the idea and forgive my poor man’s implementation.
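A minimal sketch of what the linked-list variant could look like (all names hypothetical): each call frame holds only its own name plus a pointer to the caller’s node, and the full path is assembled only when an error is actually reported.

  module fn_traceback
    implicit none
    !! One node per call frame; only a pointer travels down the chain
    type :: trace_node
       character(:), allocatable :: name
       type(trace_node), pointer :: caller => null()
    end type
  contains
    !! Walk the chain back to the root and build the full path on demand
    function trace_path(node) result(path)
      type(trace_node), intent(in), target :: node
      character(:), allocatable :: path
      type(trace_node), pointer :: cur
      path = node%name
      cur => node%caller
      do while (associated(cur))
         path = cur%name//"::"//path
         cur => cur%caller
      end do
    end function
  end module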

(Optional) Function Traceback string
  • Yes (with linked lists or something more elegant than this)
  • No (please explain in comments)
  • Maybe (please explain in comments)

Operational logging vs convergence logging

In my humble opinion, I should separate the standard logging from the convergence logging. By standard logging I mean all the logs from entering a solver: taking note of what operator/preconditioner we are using, what specific implementation is being called by the generic, what the function traceback is, and so on. All of this should be separated from the convergence logging, which could be done in a separate logger with (one per line) only the iteration counter, a “numerical” timestamp (i.e. not a date) and the residual, separated by a special character sequence if the solve is meant to be repeated. In this way, analysing the convergence profiles is going to be much easier and more straightforward than having to skim all the “useless” info from the log file.
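As a sketch of the record format I have in mind (conv_unit and elapsed are hypothetical names):

    ! One record per iteration: counter, numerical timestamp, residual
    write (conv_unit, '(i8, 2(1x, es15.8))') itnum, elapsed, residual
    ! ...
    ! Special separator between repeated solves
    write (conv_unit, '(a)') "#----"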

(Optional) Separate standard logging and convergence logging
  • Yes
  • No (please explain in comments)
  • Maybe (please explain in comments)

Timestamp ALL the operations

If you check the implementation of LightKrylov, which I am taking as a guide (with permission) to build this suite, they do intensive time and operation counting. Would this be useful in this suite? I would say yes: I like seeing how badly/well my solver performs, and it could be very useful for spotting which operations are the most time consuming. We could perform this in several ways: we could log the timestamps in an ordered fashion for every iteration in a separate logger file; we could keep them in memory during the execution, writing the timestamps into an array and printing it to the log file at the end of the iterations, possibly with an average of the operations in the operational logger; we could do both, or neither.
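For the in-memory variant, a minimal sketch of per-operation timing around the matrix-vector product (t_matvec being a hypothetical work array sized to itmax):

      integer(int64) :: t0, t1, rate   ! int64 from iso_fortran_env

      call system_clock(count_rate = rate)

      ! ... inside the iteration loop:
      call system_clock(t0)
      Ap = matmul( A, p )
      call system_clock(t1)
      t_matvec(itnum) = real(t1 - t0, dp) / real(rate, dp)   ! seconds

      ! ... after the loop: report the average in the operational log
      write (options%out_unit, *) "matvec avg [s]:", sum(t_matvec(1:itnum))/itnum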

Timestamp all
  • Log all (separate file+average)
  • Only the average (i.e. an extra array in the solver)
  • Only the separate file (so no extra array in solver)
  • No (please explain in comments)
  • Maybe (please explain in comments)

Conclusion

All of this should be chosen through the options container, which for now is just

  type, public :: kr_options
      !!
      !! Maximum number of iterations 
      integer :: itmax = 100000
      !!
      !! Single and double precision tolerance
      real(sp) :: tol_sp = 1.0e-06_sp
      real(dp) :: tol_dp = 1.0e-10_dp
      !!
      !! Verbosity and writing unit 
      logical  :: verb = .true.
      integer  :: out_unit = 6
      !!
    contains 
      !!
      !!
  end type

Moreover, all the really verbose logging should be optional, and when no logging has to be performed it should not add anything to the complexity of the solver. How could I achieve that? Are if statements really safe in terms of not adding too much cost?

Let me know what you think!
Also, I will soon create another thread for the Abstract Linear Algebra formalism, building on top of Enrico Facca’s PR, so we can gather some ideas on the operations, nomenclature and so on.
Cheers!

Supported types

I don’t want to spoil too much, but the current implementation of the pCG supports all the following

!! A, intrinsic, single -> rhs, x, intrinsic, single (Preconditioner_sp)
!! A, intrinsic, double -> rhs, x, intrinsic, double (Preconditioner_dp)
!!
!! A, COO_type, single -> rhs, x, intrinsic, single (Preconditioner_sp)
!! A, COO_type, double -> rhs, x, intrinsic, double (Preconditioner_dp)
!! A, CSR_type, single -> rhs, x, intrinsic, single (Preconditioner_sp)
!! A, CSR_type, double -> rhs, x, intrinsic, double (Preconditioner_dp)
!! A, CSC_type, single -> rhs, x, intrinsic, single (Preconditioner_sp)
!! A, CSC_type, double -> rhs, x, intrinsic, double (Preconditioner_dp)
!! A, ELL_type, single -> rhs, x, intrinsic, single (Preconditioner_sp)
!! A, ELL_type, double -> rhs, x, intrinsic, double (Preconditioner_dp)
!! A, SELLC_type, single -> rhs, x, intrinsic, single (Preconditioner_sp)
!! A, SELLC_type, double -> rhs, x, intrinsic, double (Preconditioner_dp)
!!
!! A, Operator, single -> rhs, x, intrinsic, single (Preconditioner_sp)
!! A, Operator, double -> rhs, x, intrinsic, double (Preconditioner_dp)
!!
!! A, absOperator, single -> rhs, x, absVector, single (absPreconditioner_sp)
!! A, absOperator, double -> rhs, x, absVector, double (absPreconditioner_dp)

where Preconditioner_*p is a preconditioning operator built upon Operator, which works with intrinsic arrays. absPreconditioner, built upon absOperator, works only with absVector.

Presumably, every subprogram down the calling chain will have this local variable which gets appended, so the new copies keep getting longer and longer. That seems inelegant and wasteful of both memory and effort. Other approaches might be considered, such as a single linked list of the subroutine names, or maybe something done with a linked list of pointers to the local fixed strings?

This is absolutely a good point, but you will have to forgive the “version 0” implementation. I’ve edited the post to make clear that I will try to implement it with linked lists (as soon as I understand how to build one). Thanks a lot!

(and please, vote :wink: )

I find logging at the level of a single CG call over-complicated. It can be implemented by users:

use stdlib_logger
use stdlib_krylov, only: pcg

! ...
type(logger_type) :: my_log

! ...
call my_log%log_information("Calling CG...")
call pcg(A,K,rhs,x,info)
if (info /= 0) then
    ! Handle convergence or other failure
    ! ...
end if
call my_log%log_information("CG complete.")

Instead of the options%verb member, I would implement a user-callback function just like the Scipy version does: cg — SciPy v1.15.2 Manual. The option could look something like:

! ...
call pcg(A,K,rhs,x,info, &
     opts=cg_opts(callback=print_cg_iter))
! ...
contains

   ! Called after every CG iteration
   subroutine print_cg_iter(x, r)
       real(dp), intent(in) :: x(:), r

       ! Call plotting library, print residual, etc.
       ! ...
   end subroutine

In the kr_options I would replace verb and out_unit with the callback interface, like this:

  type, public :: kr_options
      !!
      !! Maximum number of iterations 
      integer :: itmax = 100000
      !!
      !! Tolerance (used also by single-precision procedures)
      real(dp) :: tol_dp = 1.0e-10
      !!
      !! Callback procedure (called after every iteration)
      procedure(krylov_callback), pointer, nopass :: callback => null()
  end type

In the CG procedure you invoke it like this:

if (associated(opts%callback)) call opts%callback(x, residual)

In case you need to count iterations, compute other norms, log iteration output, etc., it can all be done using (nested) internal functions:

module poisson_solver
    use stdlib_krylov, only: pcg => krylov_pcg
contains
    subroutine solve( ..., verbose, out_unit)
        ! ...
        logical, intent(in) :: verbose
        integer, intent(in) :: out_unit

        integer :: niter
        ! ...
        niter = 0
        call pcg( ..., &
           opts = cg_options(itmax=1000,callback=my_callback))
    contains
        subroutine my_callback(x,r)
            real(dp), intent(in) :: x(:), r
            niter = niter + 1
            if (verbose) write(out_unit,*) niter, r
        end subroutine
    end subroutine
end module

program electrostatics
use poisson_solver, only: solve
call solve( ... )
end program

Btw, you don’t seem to set info on output. Maybe you could use a convention:

  • info = 0, successful output (converged)
  • info > 0, did not converge (or other issues)
  • info < 0, error in one of the dummy arguments.

@ivanpribec I’m moving this part of the conversation here because it’s more relevant here. If I understood correctly from 30 minutes of research and this page, reverse communication wouldn’t be a big deal to implement on the library side. A first test is the following:

module stdlib_krylov_PCG
  use stdlib_kinds
  
  implicit none
  private
  !! Public methods
  public :: stdlib_linalg_cg
  !!
  interface stdlib_linalg_cg
    !!
    module procedure stdlib_krylov_CG
    module procedure stdlib_krylov_CG_reverse
    !!
  end interface
  !!
contains  !!
  subroutine stdlib_krylov_CG_reverse( n, wrk, x, info, itnum )
    !!
    integer,  intent(in   ) :: n
    real(dp), intent(inout) :: wrk(n,3) ! or 4 for preconditioning
    real(dp), intent(inout) :: x(n)
    integer,  intent(  out) :: info
    integer,  intent(inout) :: itnum
    !!
    real(dp)       :: alpha, beta
    real(dp)       :: p_dot_Ap
    real(dp), save :: r_dot_r
    real(dp), save :: r_dot_r_old
    !!
    !   Mapping (maybe associate?)
    !  p = wrk(:,1)
    ! Ap = wrk(:,2)
    !  r = wrk(:,3)
    ! Kr = wrk(:,4)

    !cg_iterations: do itnum = 1, options%itmax
      !
      if (info == 22) then
        info = 22 !request Matrix-Vector multiplication
        return
      end if 
      ! Ap = matmul( A, p ) ! Performed outside
      !
      r_dot_r = dot_product( wrk(:,3), wrk(:,3) )  
      
      if (info == 11) then ! the matvec product has done
        p_dot_Ap = dot_product( wrk(:,1), wrk(:,2) )
        alpha = r_dot_r / p_dot_Ap
        x = x + (alpha) * wrk(:,1) 
        wrk(:,3) = wrk(:,3) + (-alpha) * wrk(:,2) 
        r_dot_r_old = r_dot_r
        ! Time for test
        info = 21
        return
      end if 

      !if ( sqrt(r_dot_r) < options%tol_dp ) then ! Performed outside
      !  info = 0
      !  exit cg_iterations
      !end if

      if (info == 12) then
        r_dot_r = dot_product( wrk(:,3), wrk(:,3) )
        beta = r_dot_r/r_dot_r_old
        wrk(:,1) = beta * wrk(:,1) + 1._dp * wrk(:,3)
        itnum = itnum + 1
        ! Time to go back to Ap
        info = 22
      end if

    !end do cg_iterations

    return

  end subroutine stdlib_krylov_CG_reverse
  !!
  subroutine stdlib_krylov_CG( A, rhs, x, xzero, info )
    !!
    real(dp),         intent(in   )           :: A(:,:) 
    real(dp),         intent(in   )           :: rhs(:)
    real(dp),         intent(inout)           :: x(:)
    logical,          intent(in   ), optional :: xzero
    integer,          intent(  out)           :: info
    !!
    !! Default options
    real(dp), allocatable :: r(:)
    real(dp), allocatable :: p(:)
    real(dp), allocatable :: Ap(:)
    !! Residual, search direction and other vectors.
    real(dp) :: alpha, beta
    real(dp) :: p_dot_Ap
    real(dp) :: r_dot_r
    real(dp) :: r_dot_r_old
    integer :: itnum
    logical :: x_provided
    !!
    allocate (Ap, source = rhs)
    Ap = 0._dp
    allocate (r, source = rhs)

    ! (.and. does not short-circuit in Fortran, so guard the optional flag)
    x_provided = .false.
    if ( present(xzero) ) x_provided = xzero
    if ( x_provided ) then 
      Ap = matmul( A, x )
      r = 1._dp * r + (-1._dp) * Ap 
    else
      x = 0._dp
    end if
 
    allocate (p, source = r)
    
    r_dot_r = dot_product( r, r )  
    info = 1
    
    cg_iterations: do itnum = 1, 10
      
      Ap = matmul( A, p )
      p_dot_Ap = dot_product( p, Ap )
      alpha = r_dot_r / p_dot_Ap
      x = x + (alpha) * p 
      r = r + (-alpha) * Ap 
      r_dot_r_old = r_dot_r
      r_dot_r = dot_product( r, r )

      if ( sqrt(r_dot_r) < 1.e-10 ) then
        info = 0
        exit cg_iterations
      end if

      beta = r_dot_r/r_dot_r_old
      p = beta * p + 1._dp * r 

    end do cg_iterations

    deallocate (r, p, Ap)

    return

  end subroutine stdlib_krylov_CG
  !!
end module stdlib_krylov_PCG

program main
  use stdlib_kinds
  use stdlib_krylov_PCG
  implicit none

  integer :: i,j, n
  real(dp), allocatable :: A(:,:)
  real(dp), allocatable :: L(:,:)
  real(dp), allocatable :: U(:,:)
  real(dp), allocatable :: b(:), x(:)
  real(dp), allocatable :: work(:,:), ones(:)
  integer :: info
  integer :: itnum

  n = 10
  allocate( A(n,n), L(n,n), U(n,n), b(n), ones(n), x(n), work(n,3) )
  L = 0._dp
  do j = 1, n
    do i = j, n
      call random_number(L(i,j))
    end do
    L(j,j) = 3._dp + 3._dp*L(j,j)! sqrt(real(n))
  end do
  U = transpose(L)
  A = matmul( L, transpose(L) )
  ones = 1._dp
  b = matmul( A, ones)

  call stdlib_linalg_cg(A, b, x, .false., info )
  print *, x
  
  x = 0._dp
  ! This in a subroutine init
  work(:,1) = b
  work(:,2) = 0._dp
  work(:,3) = b
  info = 22

  cg_iteration: do i = 1, 3*10
  
    call stdlib_linalg_cg( n, work, x, info, itnum )
    if (info == 22) then 
       work(:,2) = matmul( A, work(:,1) )
       info = 11
    end if

    if (info == 21) then 
       if ( sqrt(dot_product(work(:,3),work(:,3))) < 1.e-10 ) then
         info = 0
         exit cg_iteration
       else 
         info = 12
       end if
    end if

  end do cg_iteration
 
  print *, x

end program

I see the value in this kind of implementation (the standard CG is there for reference), but to be honest I don’t see any benefit in terms of code simplicity. I will test performance, though.

I think your reverse communication version is almost there. Usually you don’t need to modify the state variable info (often named ido), unless there is really a need for a bi-directional communication protocol. I guess with today’s flow-control constructs it would be something like this:

info = -1  ! set by init routine

revcom: do
    call stdlib_linalg_cg( n, work, x, info, itnum )
    select case(info)
    case(0)
        exit revcom
    case(22)
       work(:,2) = matmul( A, work(:,1) )
    case(21)
       if ( sqrt(dot_product(work(:,3),work(:,3))) < 1.0e-10 ) exit revcom
    case default
        write(error_unit,'(A,I0)') "error in cg revcom, info = ", info
        error stop 1
    end select
end do revcom

The simplicity is that there is no inheritance (polymorphism) and no procedure pointers, just very simple flow constructs and direct calls the CPU easily churns through. All the workspace needed must be allocated beforehand by the caller, so there is no overhead at runtime. The CG routine remains agnostic of the form (and storage) of the linear operator A, so callers can build any interface they like on top.

In principle you can build the direct interface on top of the reverse version (but not the other way around):

! Direct version, on top of REVCOM version
subroutine stdlib_krylov_CG( A, rhs, x, xzero, info )
    real(dp),         intent(in   )           :: A(:,:) 
    real(dp),         intent(in   )           :: rhs(:)
    real(dp),         intent(inout)           :: x(:)
    logical,          intent(in   ), optional :: xzero
    integer,          intent(  out)           :: info

    integer :: n, itnum
    real(dp), allocatable :: wrk(:,:)

    n = size(x)
    allocate(wrk(n,4))

    ! ... init
 
    revcom: do 
       call stdlib_krylov_CG_reverse( n, wrk, x, info, itnum )
       select case(info)
          ! ...
       end select
    end do revcom
end subroutine

I think today it’s better to build on top of type extension or procedure arguments, because the flow of the algorithm remains clean that way.
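To illustrate the procedure-argument alternative (a sketch with hypothetical names, not an existing stdlib interface): the operator enters as a dummy procedure, so the CG body stays a single linear flow while remaining agnostic of the matrix storage.

  module cg_fc_demo
    use stdlib_kinds, only: dp
    implicit none
  contains
    !! "Forward communication": the operator is a dummy procedure
    subroutine cg_generic(matvec, rhs, x, itmax, tol, info)
      interface
         subroutine matvec(v, Av)          ! computes Av = A*v
            import :: dp
            real(dp), intent(in)  :: v(:)
            real(dp), intent(out) :: Av(:)
         end subroutine
      end interface
      real(dp), intent(in)    :: rhs(:)
      real(dp), intent(inout) :: x(:)
      integer,  intent(in)    :: itmax
      real(dp), intent(in)    :: tol
      integer,  intent(out)   :: info
      real(dp), allocatable :: r(:), p(:), Ap(:)
      real(dp) :: alpha, beta, rr_old, rr_new
      integer :: it
      x = 0._dp
      r = rhs; p = r
      allocate (Ap, mold = rhs)
      rr_old = dot_product(r, r)
      info = -1
      do it = 1, itmax
         call matvec(p, Ap)                ! the only operator-dependent line
         alpha = rr_old / dot_product(p, Ap)
         x = x + alpha*p
         r = r - alpha*Ap
         rr_new = dot_product(r, r)
         if (sqrt(rr_new) < tol) then
            info = 0
            exit
         end if
         beta = rr_new / rr_old
         p = r + beta*p
         rr_old = rr_new
      end do
    end subroutine
  end module

A caller can pass an internal procedure that captures A (dense, sparse, or distributed) from its host scope, which covers the use case the revcom caller’s context used to serve.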


Reverse communication does incur some overhead, because the main procedure is repeatedly re-entered versus being called just once. I’d guess it is still negligible under most circumstances.

For the MKL RCI ISS conjugate gradient solver, I have a main loop like this:

        revcom: do
            call dcg(n,x,b,rci_request,ipar,dpar,tmp)
            if (rci_request == 0) then
                info = 0
                return
            end if
            select case(rci_request)
            case(1)
                ! matrix-vector product
                call cache%matvec(n,tmp(:,1),tmp(:,2))
            case(3)
                ! apply preconditioner
                call cache%psolve(n,tmp(:,3),tmp(:,4))
            case default
                error stop "in iterate loop, unknown request"
            end select
        end do revcom

The cache is an instance of the class:

type, abstract :: rci_iss_cache
    integer :: n
    integer :: ipar(128)
    real(dp) :: dpar(128)
    real(dp), allocatable :: tmp(:,:)
    integer :: nrestart = 0
    procedure(matvec_fun), pointer :: psolve => null()
contains
    procedure(matvec_fun), deferred :: matvec
end type

abstract interface
    subroutine matvec_fun(cache,x,y)
        import rci_iss_cache, dp
        class(rci_iss_cache) :: cache
        real(dp), intent(in) :: x(cache%n)
        real(dp), intent(out) :: y(cache%n)
    end subroutine
end interface

If I look at the control flow graph in Compiler Explorer, the revcom loop becomes a series of dispatch blocks: the compiler replaced the select case with an if-elseif chain. Notice how the procedures matvec and psolve are indirect procedure calls (we have to dereference the pointer / function table before calling the procedure).

With a direct (class-based) approach you’d be pushing the apply and psolve actions into the body of the solver procedure (they still have to happen in some place).


I think it basically doubles the overhead associated with the subroutine calls, but as you say, that would be negligible in most practical situations. I personally have always disliked the idea of reverse communication, but I’m not sure why. I think it basically places more burden on the programmer than the normal approach, which I guess would be called “forward communication.” It makes writing the main code more difficult to begin with, and then when it is used, it places some additional burden on the programmer to handle all of the possible return codes and to return the data correctly to the main code. There are lots of ways all of that can go wrong. In contrast, with the usual approach (subroutines passed through the argument list, or subprogram pointers, etc.), the language helps with the process. Also, with recursively nested optimizations, the reverse communication approach is even more difficult to keep straight, whereas the forward approach seems to be just a little more difficult for the programmer.


I have pretty much the same feeling about reverse communication as @RonShepard. Two other issues I have with it are:

  • Suppose the algorithm you implement needs some notion of internal state between consecutive iterations (think for instance of the Krylov basis and the Hessenberg matrix for the Arnoldi factorization, which is one of the algorithms I’m most familiar with). As far as I know, using a reverse communication approach would entail that you declare some variables with the save attribute to keep their value in between calls, or use an external cache.
  • The second issue I have is if your computation is distributed. Using reverse communication puts additional burden on the developer to make sure the MPI stuff is handled correctly, as well as possibly forcing some data structures on the user. The “simpler” approach then is to have two versions of the same code, one serial and one MPI-aware. This is basically the only difference between Arpack and PArpack.

Admittedly, I’m not an expert with reverse communication and these are only limitations that I’ve experienced with it. I have no doubt that it may be possible to overcome them, but I think that would require additional spaghetti code diverting from the main goal (i.e. number crunching).
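For the first point, the usual workaround is indeed an explicit cache type which the caller threads through the revcom loop instead of save variables; a minimal sketch (hypothetical names) for the Arnoldi example:

  !! State carried across reverse-communication calls,
  !! instead of local variables with the save attribute
  type :: arnoldi_cache
     real(dp), allocatable :: V(:,:)   ! Krylov basis
     real(dp), allocatable :: H(:,:)   ! Hessenberg matrix
     integer :: k = 0                  ! current Krylov dimension
  end type

Passing one such cache per solve also keeps the routine thread-safe, at the price of yet another argument in the revcom signature.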


Well, I see that there are very strong opinions about reverse communication, and my example was more to understand how it works (with a known example to me) than to suggest its adoption.

However, it could be offered in the suite, alongside the “forward communication” option, for those who prefer it. I will put it in the “maybe later” documents pile, but thank you all for sharing your doubts and feelings about it.


These are all very valid critiques, namely

  • complicated calling
  • complicated implementation
  • internal state variables (complicates multi-threaded use)
  • distributed computation

I think one of the reasons reverse communication was used in the past is that it is independent of the language mechanism for passing procedure dummy arguments, and it also allows the use of variables in the caller’s context (nowadays this is possible using internal procedures). As long as you could figure out the calling convention of a Fortran procedure (as in the old days before bind(c)), you could use the interface from any programming language or even an Excel spreadsheet. This is explained in an article from NAG: https://support.nag.com/doc/techrep/pdf/tr1_11.pdf

W.r.t. the Intel oneMKL RCI ISS, one more reason I can imagine is that there is a subset of Fortran in the System V ABI (https://refspecs.linuxbase.org/elf/x86_64-abi-0.99.pdf, section 9.2, pg 108). If you implement a library of external procedures, sticking with the basic argument types included in the ABI spec and no procedure dummy arguments, you can link MKL from different compilers. At least I think that’s why Intel usually distributes either plain interfaces in the form of include files or modules which only contain interface blocks. If they built their solver libraries on type extension, deferred bindings or procedure arguments, this wouldn’t be possible (unless they agreed to implement a compatible Fortran ABI).

Today I agree that “forward communication” is the best way to go. Cross-language calls are better achieved by C binding and ABI stability doesn’t matter if you are willing to distribute the source code.