Diffstruc: A Fortran library for automatic differentiation

Hi all,

I have just released diffstruc, an open-source Fortran library for performing automatic differentiation (AD) using forward- and reverse-mode methods, enabling differentiation to any order (although, as is common with AD, memory constraints usually limit this to third order for any realistic task).

The library utilises derived types and pointers to build the operation graphs needed to perform differentiation. I’ve spent some time trying to fix all memory errors and leaks, but I’ve probably missed some, so keep an eye out and let me know. The library works well, but isn’t nearly as fast as the likes of PyTorch, so if anyone wants to contribute to making it faster, feel free to get in touch or open a pull request. I am constantly working on this, so expect me to be tinkering away and trying to reduce memory usage further in the coming weeks and months.

The library has core maths, linear algebra, trigonometric, exponential, broadcasting, and reduction operations built in, but it is designed with extensibility in mind: adding new operations that can handle AD does not require modifying the source code, only overloading the operators of the derived type. I have been building this library to enable physics-informed neural networks in athena (my Fortran neural network library), so its functionality and design have been driven by that use case.

For more details, I have written up Read the Docs documentation covering some use cases and a tutorial on how to add custom operators.

I hope someone else finds this useful in some way.

Feel free to ask any questions or share any suggestions!

29 Likes

Really interesting! I had been meaning to write a Fortran implementation of backward differentiation, but (fortunately for me) you beat me to it. Looking at the documentation tutorials, I see that functions are defined with a pointer, for example:

f => 2.*x**2

How could I work with a more complex function, one that requires loops or some kind of conditional? For example:

f = 0

do i = 1, N
   do j = 1, N
      if (n(i) /= 1.0) then
         f = f + n(i) * b(i) * b(j)
      end if
   end do
end do

That is just an example; I work with functions that have a lot of different terms and parameters which are themselves functions of the variables. Also, is it possible to do arithmetic between functions?
Like in:

f => x**2 + 2
g => dot_product(x, y)

h => f + g
1 Like

Thanks @fedebenelli. :slightly_smiling_face: Hopefully the library does what you want so that you don’t have to implement autodiff yourself (or, at least, provide you with a useful starting point).

Yes, your example should be implementable. The form will be something like this (but probably not exactly; it’s the weekend and my mind isn’t up for doing maths right now):

f => sum( mask( n, n /= 1.0 ) * b ) * sum( b )

If it can’t be implemented that easily (or if you want to optimise for memory), I’d recommend following the documentation on creating custom operations and implementing it as a dedicated function (Supported Operations — diffstruc documentation).
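As a plain-Fortran sanity check (no AD involved, and assuming mask behaves like the merge intrinsic), the double loop factorises, because the condition depends only on i:

```fortran
! Loop-free equivalent of the masked double sum:
!   f = (sum over i with n(i) /= 1 of n(i)*b(i)) * (sum over j of b(j))
program masked_sum
  implicit none
  integer, parameter :: nn = 4
  real :: n(nn), b(nn), f_loop, f_vec
  integer :: i, j

  n = [1.0, 2.0, 1.0, 3.0]
  b = [0.5, 1.0, 2.0, 4.0]

  ! Original double loop
  f_loop = 0.0
  do i = 1, nn
     do j = 1, nn
        if (n(i) /= 1.0) f_loop = f_loop + n(i) * b(i) * b(j)
     end do
  end do

  ! Factorised, loop-free form using the merge intrinsic as the mask
  f_vec = sum(merge(n * b, 0.0, n /= 1.0)) * sum(b)

  print *, f_loop, f_vec   ! both 105.0
end program masked_sum
```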

For the arithmetic between functions, you can do exactly what you’ve written (except that I haven’t set up dot_product yet, which I should probably do soon, so for now you’d need to use sum(x*y)).

The list of operations will definitely need to be expanded further over time, but a good number are already implemented.

3 Likes

Very cool!

I have a small question about the forward mode in calculating partial derivatives.

I have used Fortran’s dnad module to compute partial derivatives via dual numbers (a typical implementation of forward-mode AD).

The advantage of using dual number is that, say my function is

f(x_1,x_2,x_3,x_4,x_5)

Once I promote x_1 through x_5 to dual numbers (so all the computations inside f become dual-number computations, which are essentially array operations that compilers may vectorize), I only need to evaluate f(x_1,x_2,x_3,x_4,x_5) once, and I get the function value itself as well as all five partial derivatives, \partial f / \partial x_1 to \partial f / \partial x_5, simultaneously.

In your implementation, does it need to evaluate f(x_1,x_2,x_3,x_4,x_5) five times to get \partial f / \partial x_1 to \partial f / \partial x_5?
Many thanks in advance!

PS.
Usually people say that, as you mentioned in your help documentation, in forward mode, the computation cost (or disadvantage) is:

  • Cost: One forward pass per input variable

However, if we use dual numbers, it is actually just one forward pass (f is computed once) and we get the derivatives with respect to every input variable x_1 to x_5. We do not really do five forward passes for the five variables.
One may object that, even if it looks like the dual-number version of f(x_1,x_2,x_3,x_4,x_5) is computed only once, we are implicitly doing five forward passes. But that is not quite so: the dual-number computations are essentially vectorized, so the cost is definitely less than five independent forward passes.

1 Like

Thanks @CRquantum. :slight_smile:

diffstruc can achieve what you are asking using reverse mode differentiation (i.e. partial derivatives of one output with respect to multiple inputs). See this page of the documentation (but feel free to ask more questions if the documentation isn’t clear): Reverse-Mode Differentiation (Backpropagation) — diffstruc documentation

Optional further discussion provided below.

I believe that diffstruc uses dual numbers for its automatic differentiation, the type is just called array_type instead (I can’t say for certain if my implementation is exactly dual numbers as I haven’t delved into the literature enough to confidently say one way or another).

From my understanding, forward-mode automatic differentiation can only ever produce the partial derivatives of multiple outputs with respect to one input in a single call, i.e. a separate calculation each for df/dx1 through df/dx5. Reverse mode, on the other hand, calculates the partial derivatives of one output with respect to multiple inputs in one call, i.e. df/dx1 through df/dx5 all in one go. diffstruc offers implementations of both forward- and reverse-mode automatic differentiation. Forward mode can more easily be used to calculate higher-order partial derivatives, whilst getting higher-order derivatives using reverse mode is much trickier (and isn’t currently possible in diffstruc).

2 Likes

Thank you!

I have used the dnad module to calculate things like \partial f / \partial x_1 to \partial f / \partial x_5. It seems that just one evaluation of f(x_1,x_2,x_3,x_4,x_5) gives all five partial derivatives simultaneously; there is no need to evaluate f(x_1,x_2,x_3,x_4,x_5) five times to get them one by one.

For example, in the dnad module the dual part of the dual number is dx, a 1-D array with ndv elements, where ndv can be greater than 1:

    type,public:: dual  ! make this private will create difficulty to use the
                        ! original write/read commands, hence x and dx are
                        ! variables which can be accessed using D%x and D%dx in
                        ! other units using this module in which D is defined
                        ! as type(dual).
        sequence
        real :: x  ! functional value
        real :: dx(ndv)  ! derivative
    end type dual

For instance, in f(x_1,x_2,x_3,x_4,x_5), say x_1 to x_5 are 100, 200, 300, 400, 500. You set ndv to 5 and promote each x_i to the dual number type (having defined the dual-number computation rules, which are basically those in the dnad module) so that

x_1%x=100
x_1%dx(1:5)=[1,0,0,0,0]
x_2%x=200
x_2%dx(1:5)=[0,1,0,0,0]
x_3%x=300
x_3%dx(1:5)=[0,0,1,0,0]
x_4%x=400
x_4%dx(1:5)=[0,0,0,1,0]
x_5%x=500
x_5%dx(1:5)=[0,0,0,0,1]

You only need to evaluate the dual-number version of f(x_1,x_2,x_3,x_4,x_5) once; then f%x is the value of f(x_1,x_2,x_3,x_4,x_5) itself, and f%dx is a 5-element array whose elements correspond to \partial f / \partial x_1 to \partial f / \partial x_5.
So there is no need to evaluate f(x_1,x_2,x_3,x_4,x_5) five times to get \partial f / \partial x_1 to \partial f / \partial x_5 one by one.
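To make the mechanism concrete, this seeding can be sketched as a minimal self-contained toy (not the actual dnad module; only + and * are overloaded here, and ndv is fixed at 2):

```fortran
! Toy dual-number forward mode: one evaluation yields the value and
! all partial derivatives at once via the seeded dx arrays.
module dual_mod
  implicit none
  integer, parameter :: ndv = 2
  type :: dual
     real :: x        ! functional value
     real :: dx(ndv)  ! derivatives with respect to each input
  end type dual
  interface operator(*)
     module procedure dual_mul
  end interface
  interface operator(+)
     module procedure dual_add
  end interface
contains
  elemental function dual_mul(a, b) result(c)
    type(dual), intent(in) :: a, b
    type(dual) :: c
    c%x  = a%x * b%x
    c%dx = a%x * b%dx + b%x * a%dx   ! product rule
  end function dual_mul
  elemental function dual_add(a, b) result(c)
    type(dual), intent(in) :: a, b
    type(dual) :: c
    c%x  = a%x + b%x
    c%dx = a%dx + b%dx
  end function dual_add
end module dual_mod

program demo
  use dual_mod
  implicit none
  type(dual) :: x1, x2, f
  x1 = dual(3.0, [1.0, 0.0])   ! seed d/dx1
  x2 = dual(4.0, [0.0, 1.0])   ! seed d/dx2
  f = x1*x2 + x1               ! one evaluation of f
  print *, f%x    ! 15.0
  print *, f%dx   ! 5.0 3.0  (df/dx1 = x2 + 1, df/dx2 = x1)
end program demo
```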

3 Likes

Thanks for the details. Yeah, diffstruc does not use dual numbers and does not calculate derivatives at the same time as the function value itself. However, you can achieve similar results through a two-step process using reverse mode: first calculate the function value, then call the reverse pass on it and retrieve the derivatives of the output with respect to all inputs.

f => x1 + x2 + …
call f%grad_reverse()

! access derivatives using x1%grad and x2%grad

The difference here is that the gradients are stored in each input rather than in the output. diffstruc is developed primarily for neural network training, which requires calculating partial derivatives for functions with many inputs and few outputs; this is something dual numbers are comparatively inefficient for, from my understanding.

2 Likes