Speed of autodiff vs. finite differencing

I’m solving a system of about 10,000 stiff ODEs. Jacobian is quasi-banded with bandwidth of about 200.

Right now I compute part of the Jacobian with finite differences and part of it analytically.

But I want to couple additional ODEs. The new jacobian terms will take a really long time to compute with finite differences.

I’m wondering if autodiff would be a lot faster for computing these new Jacobian terms.

Thanks

That question is impossible to answer a priori. My intuition is, however, that at least the code will be cleaner with automatic differentiation, but that is partly wishful thinking and partly a matter of taste.

You might find some extra savings by applying a graph coloring approach to your sparse/banded Jacobian calculation using finite differences. See “Software for Estimating Sparse Jacobian Matrices” and citing articles. A newer reference is “What Color Is Your Jacobian?”.

Is your problem a discretization of a PDE, e.g. on a 100 by 100 grid? Typically your stencil size will be much smaller than the bandwidth, so storing the Jacobian as a sparse matrix instead of a banded (block-diagonal-like) version might yield further savings in the Jacobian computation. Also, for regular grids/stencils, the Jacobian will have a predictable structure.
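As a rough illustration of that point (a sketch only; the program and its names are made up here, not from any library): for a 5-point stencil on a 100 by 100 grid with natural row-by-row ordering, each Jacobian row has at most 5 nonzeros, while the band spans 2*100 + 1 = 201 columns. Building the sparsity pattern in CSR form makes the predictable structure explicit:

program stencil_pattern
  implicit none
  integer, parameter :: nx = 100, ny = 100, n = nx*ny
  integer :: ia(n+1)       ! CSR row pointers
  integer :: ja(5*n)       ! CSR column indices (at most 5 per row)
  integer :: i, j, k, nnz

  nnz = 0
  do j = 1, ny
     do i = 1, nx
        k = (j-1)*nx + i                ! unknown index of grid point (i,j)
        ia(k) = nnz + 1
        if (j > 1)  call push(k - nx)   ! south neighbour
        if (i > 1)  call push(k - 1)    ! west neighbour
        call push(k)                    ! centre
        if (i < nx) call push(k + 1)    ! east neighbour
        if (j < ny) call push(k + nx)   ! north neighbour
     end do
  end do
  ia(n+1) = nnz + 1

  print '(a,i0,a,i0)', 'stored nonzeros: ', nnz, ' vs. banded storage entries: ', (2*nx+1)*n

contains
  subroutine push(col)
    integer, intent(in) :: col
    nnz = nnz + 1
    ja(nnz) = col
  end subroutine push
end program stencil_pattern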

3 Likes

It depends on how the Jacobian structure is exploited. When both approaches use brute force (or, in general, exploit the Jacobian’s structure to the same degree), theory says AD should be faster. But when the matrix structure is taken into account, brute-force AD may not be. Demo below.

Finite differences with a banded Jacobian:

module func
  use fazang
  implicit none

contains
  function f(x, xl, xr) result(fx)
    real(rk), intent(in) :: x, xl, xr
    real(rk) :: fx
    fx = x*x + xl*xl + xr*xr
  end function f

  function jac(x, n) result(a)
    integer, intent(in) :: n
    real(rk), intent(in) :: x(n)
    real(rk) :: a(n, n)
    integer :: i, j
    real(rk), parameter :: h = 0.01d0
    a = 0.d0
    ! neglect the boundaries.
    do i = 2, n-1
       a(i, i-1) = (f(x(i), x(i-1)+h, x(i+1)) - f(x(i), x(i-1)-h, x(i+1)))/(2*h)
       a(i, i) = (f(x(i)+h, x(i-1), x(i+1)) - f(x(i)-h, x(i-1), x(i+1)))/(2*h)
       a(i, i+1) = (f(x(i), x(i-1), x(i+1)+h) - f(x(i), x(i-1), x(i+1)-h))/(2*h)
    end do
  end function jac

end module func

program jac_test
  use fazang
  use func

  implicit none
  
  integer, parameter :: n = 1000
  integer i
  real(rk) :: fx(n, n), x(n)
  do i = 1, n
     x(i) = 1.5d0 * i
  end do

  fx = jac(x, n)

end program jac_test

which takes

1/1 jacobian_fd        OK              0.01s

Now brute force AD using the library I’m working on.

module func
  use fazang
  implicit none

contains
  function f(x, n) result(fx)
    integer, intent(in) :: n
    type(var), intent(in) :: x(:)
    type(var) :: fx(n)
    integer :: i
    fx(1) = x(1)*x(1) + x(2)*x(2)
    fx(n) = x(n)*x(n) - x(n-1)*x(n-1)
    do i = 2, n-1
       fx(i) = x(i)*x(i) + x(i-1)*x(i-1) + x(i+1)*x(i+1)
    end do
  end function f

end module func

program jac_test
  use fazang
  use func

  implicit none
  
  integer, parameter :: n = 1000
  integer i
  real(rk) :: fx(n, n+1), x(n)
  do i = 1, n
     x(i) = 1.5d0 * i
  end do

  fx = jacobian(f, n, x)

end program jac_test

which takes

1/1 jacobian_ad        OK              3.31s

The cause of the performance gap is that in AD the Jacobian is taken w.r.t. every entry of x, without considering the sparsity.

2 Likes

@yizhang Excited about your library. I’ve played with it some.

Is it possible to use fazang to compute only the banded Jacobian terms, and exploit the matrix structure? In other words, is it possible to rewrite the finite-difference Jacobian function you gave in your example, but using AD instead?

For \frac{\partial f_i}{\partial x_j}, one can manually loop over i and call the gradient function, something like the following (assuming each f_i depends on three non-zero x entries, at i-1, i, and i+1).

real(kind=8) :: x(10000)
real(kind=8) :: fx(4)
!...
do i = 2, n-1
  fx = gradient(f, x((i-1):(i+1)))
  ! fx(1) is f_i(x), i.e., the function evaluation
  ! fx(2:4) is jacobian(i, (i-1):(i+1))
end do

See fazang/grad_test.F90 at main · yizhang-yiz/fazang · GitHub for the use of gradient.
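A more complete sketch of that loop (assuming, as in the demos above, that gradient accepts a module function with the type(var) interface and returns the function value followed by the gradient components; fi, stencil_func, and banded_jac_ad are made-up names):

module stencil_func
  use fazang
  implicit none
contains
  ! f_i written in terms of its three-point stencil y = [x(i-1), x(i), x(i+1)]
  type(var) function fi(y)
    type(var), intent(in) :: y(:)
    fi = y(2)*y(2) + y(1)*y(1) + y(3)*y(3)
  end function fi
end module stencil_func

program banded_jac_ad
  use fazang
  use stencil_func
  implicit none
  integer, parameter :: n = 1000
  real(rk) :: x(n), g(4), a(n, n)
  integer :: i
  do i = 1, n
     x(i) = 1.5d0 * i
  end do
  a = 0.d0
  ! interior rows only; each reverse pass sees just the three stencil entries,
  ! so the cost per row is independent of n
  do i = 2, n-1
     g = gradient(fi, x(i-1:i+1))   ! g(1) = f_i, g(2:4) = df_i/dx(i-1:i+1)
     a(i, i-1:i+1) = g(2:4)
  end do
  print '(a,3f10.2)', 'row 2 of the Jacobian: ', a(2, 1:3)
end program banded_jac_ad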

1 Like

@yizhang, will the manual AD approach using fazang be faster or about the same speed as finite differences?

That’s equivalent to comparing gradient against a finite-difference gradient evaluation of a function f: \mathbb{R}^n \rightarrow \mathbb{R}. It depends on things like n, the FLOPs per finite-difference evaluation, and the complexity of f. I can only say that when n is large enough, AD will be faster.

Demo.

module func
  use fazang
  implicit none

contains
  type(var) function f(x)
    type(var), intent(in) :: x(:)
    f = sum(x)
  end function f

  real(rk) function fd(x)
    real(rk), intent(in) :: x(:)
    fd = sum(x)
  end function fd

  function jac(x) result(res)
    real(rk), intent(in) :: x(:)
    real(rk) :: res(size(x)), xl(size(x)), xr(size(x))
    integer i
    real(rk), parameter :: h = 0.01d0
    xl = x
    xr = x
    ! note: res(i) holds df/dx(i-1), matching the layout returned by gradient
    ! (function value first, then gradient entries); res(1) and the last
    ! entry are left unset in this rough timing demo.
    do i = 2, size(x)-1
       xl(i-1) = xl(i-1) - h
       xr(i-1) = xr(i-1) + h
       res(i) = 0.5d0 * (fd(xr) - fd(xl))/h
       xl(i-1) = x(i-1)
       xr(i-1) = x(i-1)
    end do
  end function jac
end module func

program grad_test
  use fazang
  use func

  implicit none
  
  integer, parameter :: n = 10000
  integer i
  real(rk) :: fx(n+1), x(n)

  do i = 1, n
     x(i) = 1.5d0 * i
  end do

  fx = gradient(f, x)

end program grad_test

This gives the AD elapsed time

1/1 jacobian_ad        OK              0.01s

Replacing

  fx = gradient(f, x)

with

  fx(1:n) = jac(x)

gives us the finite difference time

1/1 jacobian_ad        OK              0.24s

Decreasing n will eventually bring the two to the same performance.

1 Like

Thanks. This is very helpful. I’m going to try to use fazang instead of finite differences for some of the Jacobian in my photochemical model. I think it will be faster, because my n is often between 10,000 and 20,000. Some of my Jacobian terms will be computed analytically.

My NumDiff project may be of interest: jacobwilliams/NumDiff: Modern Fortran Numerical Differentiation Library (github.com)

It supports sparsity for finite-difference Jacobian computation. I’m wondering if maybe it could be extended to also have an autodiff option…

3 Likes

@yizhang, to compute a banded Jacobian with finite differences, people often use the Curtis-Powell-Reid (CPR) algorithm. If you have a function \mathbb{R}^n \rightarrow \mathbb{R}^n whose Jacobian has bandwidth m, then you only need m function evaluations to compute the banded Jacobian with forward finite differences.
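For concreteness, here is a minimal sketch of that idea with forward differences (a hand-written illustration, not code from NumDiff or fazang; cpr_fd_jac and fvec are made-up names, and a symmetric band with odd bandwidth m is assumed). Columns j, j+m, j+2m, ... share no rows, so each such group can be perturbed together in a single extra evaluation of f:

module cpr_demo
  implicit none
  integer, parameter :: dp = kind(1.d0)
contains
  subroutine fvec(x, fx)            ! example tridiagonal system, f: R^n -> R^n
    real(dp), intent(in)  :: x(:)
    real(dp), intent(out) :: fx(:)
    integer :: n, i
    n = size(x)
    fx(1) = x(1)*x(1) + x(2)*x(2)
    fx(n) = x(n)*x(n) - x(n-1)*x(n-1)
    do i = 2, n-1
       fx(i) = x(i)*x(i) + x(i-1)*x(i-1) + x(i+1)*x(i+1)
    end do
  end subroutine fvec

  subroutine cpr_fd_jac(x, m, a)    ! banded Jacobian in m extra evaluations
    real(dp), intent(in)  :: x(:)
    integer, intent(in)   :: m      ! odd bandwidth; half-bandwidth is (m-1)/2
    real(dp), intent(out) :: a(size(x), size(x))
    real(dp) :: f0(size(x)), f1(size(x)), xp(size(x))
    real(dp), parameter :: h = 1.d-6
    integer :: n, g, j, i
    n = size(x)
    call fvec(x, f0)                ! base evaluation
    a = 0.d0
    do g = 1, m                     ! one color (structurally orthogonal set) per pass
       xp = x
       do j = g, n, m               ! perturb every m-th column together
          xp(j) = xp(j) + h
       end do
       call fvec(xp, f1)
       do j = g, n, m               ! scatter the differences back into the band
          do i = max(1, j-(m-1)/2), min(n, j+(m-1)/2)
             a(i, j) = (f1(i) - f0(i))/h
          end do
       end do
    end do
  end subroutine cpr_fd_jac
end module cpr_demo

program cpr_test
  use cpr_demo
  implicit none
  integer, parameter :: n = 1000
  real(dp) :: x(n), a(n, n)
  integer :: i
  do i = 1, n
     x(i) = 1.5d0 * i
  end do
  call cpr_fd_jac(x, 3, a)          ! bandwidth 3: the tridiagonal example above
  print '(a,f10.4)', 'a(2,1), should be close to 2*x(1) = 3.0: ', a(2,1)
end program cpr_test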

Is it possible to do an analogous thing with autodiff?

Good point. That’s directly related to the paper linked in @ivanpribec’s post and, in general, to sparsity coloring techniques. I was planning this for the future, but I guess I’ll need to bump it to the top. I’ll see what I can do over the weekend.

1 Like

@yizhang I found this discussion, which makes it sound like only m autodiff passes (m = bandwidth) are required to compute a banded Jacobian.

Note: the NumDiff lib includes a modernized version of the graph coloring algorithm.

1 Like

Where can I find it? Thanks

This is true. In this case, the i-th structurally orthogonal set can be obtained by grouping the i-th non-zero entry of each row.

@yizhang how would this grouping be done in practice?

My understanding is that forward-mode autodiff should be more efficient for computing “tall” Jacobians, because you get one column for each forward pass. A banded Jacobian is kind of like a tall Jacobian. So is forward-mode autodiff required for computing banded Jacobians with just m forward passes (m = bandwidth)?

You are right that forward mode is more efficient for the “tall” case. But that doesn’t mean reverse mode cannot take advantage of algorithms like CPR. Imagine a Jacobian \{\partial f_i/\partial x_j\} with the structure

J = \begin{bmatrix} f_{1,1} & f_{1,2} & 0 & 0 & 0\\ 0 & f_{2,2} & f_{2,3} & 0 & 0\\ 0 & 0 & f_{3,3} & f_{3,4} & 0\\ 0 & 0 & 0 & f_{4,4} & f_{4,5}\\ 0 & 0 & 0 & 0 & f_{5,5} \end{bmatrix}.

It’s equivalent to calculating

J_r = \begin{bmatrix} f_{1,1} & f_{1,2} & f_{3,3} & f_{3,4} & f_{5,5}\\ 0 & f_{2,2} & f_{2,3} & f_{4,4} & f_{4,5} \end{bmatrix}.

So instead of running a reverse AD pass for each f_i (corresponding to the i-th row of J), we can run just two reverse passes, one for each row of J_r: the first row of J_r is the gradient of f_1 + f_3 + f_5, and the second row is the gradient of f_2 + f_4. Thus we have reduced the number of AD passes from 5 to 2. Modern coloring algorithms basically seek groupings like the one above (structurally orthogonal sets).
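As a minimal sketch of those two passes with fazang’s gradient (the concrete choice f_i(x) = x_i^2 + x_{i+1} for i = 1..4 and f_5(x) = x_5^2 is an assumption, just to give the 5 by 5 structure above concrete values, and the coloring is picked by hand rather than computed):

module colored_rev
  use fazang
  implicit none
contains
  type(var) function f_odd(x)      ! f_1 + f_3 + f_5: one structurally orthogonal set
    type(var), intent(in) :: x(:)
    f_odd = (x(1)*x(1) + x(2)) + (x(3)*x(3) + x(4)) + x(5)*x(5)
  end function f_odd

  type(var) function f_even(x)     ! f_2 + f_4: the other set
    type(var), intent(in) :: x(:)
    f_even = (x(2)*x(2) + x(3)) + (x(4)*x(4) + x(5))
  end function f_even
end module colored_rev

program colored_rev_test
  use fazang
  use colored_rev
  implicit none
  real(rk) :: x(5), g_odd(6), g_even(6)
  x = [1.d0, 2.d0, 3.d0, 4.d0, 5.d0]
  ! two reverse passes instead of five; because the summed rows share no columns,
  ! each nonzero of J is read off from exactly one gradient entry
  g_odd  = gradient(f_odd, x)      ! g_odd(2:6)  = f_11, f_12, f_33, f_34, f_55
  g_even = gradient(f_even, x)     ! g_even(2:6) = 0, f_22, f_23, f_44, f_45
  print '(a,5f8.2)', 'row 1 of J_r: ', g_odd(2:6)
  print '(a,5f8.2)', 'row 2 of J_r: ', g_even(2:6)
end program colored_rev_test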

I’m having a hectic week so allow me to point to https://dl.acm.org/doi/abs/10.1145/1271.319415 for implementations.

2 Likes