# Speed of autodif vs. finite differencing

I’m solving a system of about 10,000 stiff ODEs. Jacobian is quasi-banded with bandwidth of about 200.

Right now I calculate part of the jacobian with finite differences and I do part of it analytically.

But I want to couple additional ODEs. The new jacobian terms will take a really long time to compute with finite differences.

I’m wondering if autodif would be a lot faster for computing these new jacobian terms.

Thanks

That question is impossible to answer a priori. My intuition is, however, that at least the code will be cleaner with automatic differentiation, but that is partly wishful thinking and partly a matter of taste.

You might find some extra savings by applying a graph coloring approach to your sparse/banded Jacobian calculation using finite differences. See “Software for Estimating Sparse Jacobian Matrices” and citing articles. A newer reference is “What Color Is Your Jacobian?”.

Is your problem a discretization of a PDE e.g. on a 100 by 100 grid? Typically your stencil size will be much smaller than the bandwidth, so storing the Jacobian as a sparse matrix instead of a (block-diagonal) banded version might incur further savings in the Jacobian. Also for regular grids/stencils, the Jacobian will have a predictable structure.

3 Likes

Depends on how the Jacobian structure is exploited. When both using brute force(or in general, exploit the jacobian’s structure to a same degree) the theory says AD should be faster. But when the matrix structure is put consideration brute force AD may not be. Demo below.

Finite difference with banded jacobian.

module func
use fazang
implicit none

contains
function f(x, xl, xr) result(fx)
real(rk), intent(in) :: x, xl, xr
real(rk) :: fx
fx = x*x + xl*xl + xr*xr
end function f

function jac(x, n) result(a)
integer, intent(in) :: n
real(rk), intent(in) :: x(n)
real(rk) :: a(n, n)
integer :: i, j
real(rk), parameter :: h = 0.01d0
a = 0.d0
! neglect the boundaries.
do i = 2, n-1
a(i, i-1) = (f(x(i), x(i-1)+h, x(i+1)) - f(x(i), x(i-1)-h, x(i+1)))/(2*h)
a(i, i) = (f(x(i)+h, x(i-1), x(i+1)) - f(x(i)-h, x(i-1), x(i+1)))/(2*h)
a(i, i+1) = (f(x(i), x(i-1), x(i+1)+h) - f(x(i), x(i-1), x(i+1)+h))/(2*h)
end do
end function jac

end module func

program jac_test
use fazang
use func

implicit none

integer, parameter :: n = 1000
integer i
real(rk) :: fx(n, n), x(n)
do i = 1, n
x(i) = 1.5d0 * i
end do

fx = jac(x, n)

end program jac_test


which takes

1/1 jacobian_fd        OK              0.01s


Now brute force AD using the library I’m working on.

module func
use fazang
implicit none

contains
function f(x, n) result(fx)
integer, intent(in) :: n
type(var), intent(in) :: x(:)
type(var) :: fx(n)
integer :: i
fx(1) = x(1)*x(1) + x(2)*x(2)
fx(n) = x(n)*x(n) - x(n-1)*x(n-1)
do i = 2, n-1
fx(i) = x(i)*x(i) + x(i-1)*x(i-1) + x(i+1)*x(i+1)
end do
end function f

end module func

program jac_test
use fazang
use func

implicit none

integer, parameter :: n = 1000
integer i
real(rk) :: fx(n, n+1), x(n)
do i = 1, n
x(i) = 1.5d0 * i
end do

fx = jacobian(f, n, x)

end program jac_test


which takes

1/1 jacobian_ad        OK              3.31s


The cause of performance gap is that in AD the jacobian is taken w.r.t. every x entry, without considering the sparsity.

2 Likes

Is it possible to use fazang to compute the band jacobian terms only, and exploit the matrix structure? In other words, is it possible to re-write the finite-difference jacobian function you gave in your example, but using AD instead.

For \frac{\partial f_i}{\partial x_j}, one can manually loop i and call the gradient function, something like (assuming each f_i depends on three non-zero x_i's at i-1, i, i+1).

real(kind=8) :: x(10000)
real(kind=8) :: fx(4)
!...
do i = 2, n-1
! fx(1) is f_i(x), i.e., function evaluation
! fx(2:4) is the jacobian(i, (i-1):(i+1))
enddo


See fazang/grad_test.F90 at main · yizhang-yiz/fazang · GitHub for the use of gradient.

1 Like

@yizhang, will the manual AD approach using fazang be faster or about the same speed as finite differences?

That’s equivalent to compare gradient vs a finite difference grad evaluation of a f: \mathbb{R}^n \rightarrow \mathbb{R} function. It depends on things like n, FLOPS per each finite difference evaluation, and the complexity of f. I can only say when n is large enough AD would be faster.

Demo.

module func
use fazang
implicit none

contains
type(var) function f(x)
type(var), intent(in) :: x(:)
f = sum(x)
end function f

real(rk) function fd(x)
real(rk), intent(in) :: x(:)
fd = sum(x)
end function fd

function jac(x) result(res)
real(rk), intent(in) :: x(:)
real(rk) :: res(size(x)), xl(size(x)), xr(size(x))
integer i
real(rk), parameter :: h = 0.01d0
xl = x
xr = x
do i = 2, size(x)-1
xl(i-1) = xl(i-1) - h
xr(i-1) = xr(i-1) + h
res(i) = 0.5d0 * (fd(xr) - fd(xl))/h
xl(i-1) = x(i-1)
xr(i-1) = x(i-1)
end do
end function jac
end module func

use fazang
use func

implicit none

integer, parameter :: n = 10000
integer i
real(rk) :: fx(n+1), x(n)

do i = 1, n
x(i) = 1.5d0 * i
end do



so this gives AD elapsed time

1/1 jacobian_ad        OK              0.01s


Replacing

  fx = gradient(f, x)


with

  fx(1:n) = jac(x)


gives us the finite difference time

1/1 jacobian_ad        OK              0.24s


Decreasing n will eventually bring the performance to same.

1 Like

Thanks. This is very helpful. I’m going to try to use fazang instead of finite differencing some of my jacobian in my photochemical model. I think it will be faster, because I think my n often between 10,000 and 20,000. Some of my jacobian terms will computed analytically.

My NumDiff project may be of interest: jacobwilliams/NumDiff: Modern Fortran Numerical Differentiation Library (github.com)

It supports sparsity for finite diff jacobian computation. I’m wondering if maybe it could be extended to also have an autodiff option…

3 Likes

@yizhang to compute banded jacobian with finite differences people often use the curtis-powell-reid algorithm. If you have a function \mathbb{R^n} \rightarrow \mathbb{R^n}, which has a jacobian of bandwidth m, then you only need to do m function evaluations to compute the banded jacobian with forward finite differences.

Is it possible to do an analogous thing with autodif?

Good point. That’s actually directly related to the linked paper in @ivanpribec 's post and in general sparsity coloring techniques. I was planning it for the future but guess I’ll need to bump it to the top. I’ll see what I can do during the wkend.

1 Like

@yizhang I found this discussion, which makes it sound like only m autodiff passes (bandwidth) are required to compute a banded jacobian.

Note: the NumDiff lib includes a modernized version of the graph coloring algorithm.

1 Like

Where can I find it? Thanks

This is true. In this case the structural orthogonal set i can be obtained by grouping the i th non-zero entry of each row.

@yizhang how would this grouping be done in practice?

My understanding is that forward mode autodif should be more efficient for computing “tall” jacobians because you get one column for each foward pass. A banded jacobian is kind of like a tall jacobian. So is forward mode autodif required for computing banded jacobians with just m forward passes (m = bandwidth)?

You are right that forward mode is more efficient when treating “tall” cases. But this doesn’t mean reverse mode cannot take advantage of algorithms like CPR. Imagine jacobian {\partial f_i/\partial x_j} with structure

J=\begin{bmatrix} f_{1,1}, &f_{1,2},& 0,& 0,& 0\\ 0, &f_{2,2}, &f_{2,3}, &0, &0\\ 0, &0 &f_{3,3}, &f_{3,4}, &0\\ 0, &0 &0, &f_{4,4}, &f_{4,5}\\ 0, &0 &0, &0, &f_{5,5}\\ \end{bmatrix}.

It’s equivalent to calculate

J_r=\begin{bmatrix} f_{1,1}, &f_{1,2},& f_{3,3},& f_{3,4},& f_{5,5}\\ 0, &f_{2,2}, &f_{2,3}, &f_{4,4}, &f_{4,5} \end{bmatrix}.

so instead of run a reverse AD pass for each f_i (corresponding to i th row of J), we can just run two reverse passes, one for each row of J_r: the first row of J_r is the gradient of f_1 + f_3 + f_5, and the second row the gradient of f_2 + f_4. Thus we have reduced the number of AD passes from 5 to 2. Modern coloring algorithms are basically seeking the grouping like above (structurally orthogonal sets).

I’m having a hectic week so allow me to point to https://dl.acm.org/doi/abs/10.1145/1271.319415 for implementations.

2 Likes