Automatic Differentiation Built Into LFortran

@certik, @yizhang and everyone_else_interested:
I think it would be of immense benefit, and forward-thinking, to have automatic differentiation built into the LFortran compiler as a first-class citizen.

Based on:

    1. its usefulness/utility now and in the future (and even in the past)
    2. the trend in (array) programming languages:
    • Julia
    • Dex
    • Futhark
    • JAX (a Python library)
    • a C++ proposal (not yet ratified)
    • etc.
    3. not having to play catch-up with other languages again in the future, as Fortran is doing now.
10 Likes

Here is an issue for this feature:

Feel free to add your comments there.

If anyone is interested in working on this, please let me know.

4 Likes

For LLVM-based compilers (such as LFortran), it should in principle be possible to use Enzyme AD, which performs the differentiation at the level of the LLVM IR.
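As a rough sketch of what the user-facing side might look like, assuming the Enzyme plugin is loaded during compilation: Enzyme’s documented entry point in C is the magic symbol __enzyme_autodiff, which the LLVM pass replaces with generated derivative code. Whether the bind(c) interface below works unmodified from Fortran is an assumption extrapolated from that C usage, not a confirmed recipe:

module f_m
  use iso_c_binding
  implicit none
contains
  function square(x) result(y) bind(c)
    real(c_double), value :: x
    real(c_double) :: y
    y = x * x
  end function square
end module f_m

program enzyme_demo
  use iso_c_binding
  use f_m
  implicit none
  interface
     ! Assumed interface to Enzyme's entry point: the Enzyme LLVM pass
     ! recognizes calls to this symbol and emits the derivative instead.
     function enzyme_autodiff(f, x) result(dfdx) bind(c, name="__enzyme_autodiff")
       import :: c_funptr, c_double
       type(c_funptr), value :: f
       real(c_double), value :: x
       real(c_double) :: dfdx
     end function enzyme_autodiff
  end interface

  print *, enzyme_autodiff(c_funloc(square), 3.0_c_double)  ! expect d(x*x)/dx = 6.0
end program enzyme_demo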

There are issues open on AD at both the stdlib and J3 Fortran proposals repositories:

The C++ proposal can be found here.

Leaving all implementation details aside, the part interesting for the Fortran community is how to embed AD within our source code. Here’s a non-exhaustive list of approaches:

  1. a directive-based formalism that co-exists independently of the base language
  2. by introducing new commands (a differentiate block construct plus jacobian/hessian commands to retrieve the derivatives)
  3. a (parameterized) derived type which supports AD (see the sketch after this list)
  4. as a new (dummy) variable attribute
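To make option 3 concrete, here is a minimal (non-parameterized) sketch of such a derived type with operator overloading; the module and procedure names are made up for illustration:

module dual_m
  implicit none

  type :: dual
     real :: val = 0.0   ! function value
     real :: dx  = 0.0   ! derivative with respect to the seeded variable
  end type dual

  interface operator(*)
     module procedure dual_mul
  end interface

contains

  ! Product rule: d(a*b) = da*b + a*db
  elemental function dual_mul(a, b) result(c)
    type(dual), intent(in) :: a, b
    type(dual) :: c
    c%val = a%val * b%val
    c%dx  = a%dx * b%val + a%val * b%dx
  end function dual_mul

end module dual_m

Seeding x%dx = 1.0 and evaluating an expression in duals then produces the value and the derivative in a single pass.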
6 Likes

Thanks @certik, I will go through that.

2 Likes

I am so glad it is also under J3 consideration. Thanks for the link.

2 Likes

I think it’s probably wildly optimistic to expect that the Fortran Committee would ever add anything like this to the language. Getting it into a compiler (i.e. LFortran) is likely the only hope we have. I agree, Fortran needs this. Differentiation is such a huge component in so many areas of scientific and technical computing, and Fortran has fallen behind in this as in so much else.

4 Likes

Building AD adjoints for BLAS/LAPACK routines is the most needed but also the most daunting task on this roadmap. The problem with compiler-level AD is its lack of knowledge of the non-scalar structure that is ubiquitous in numerical computing: it cannot see the matrix/Jacobian structure in dgtsv, dgbsv, dsysv, etc. So even if we support Enzyme out of the box, we will need to augment it significantly to make it usable/scalable for everyday Fortran applications.
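To make that concrete: for a linear solve $x = A^{-1} b$ (what dgtsv/dgesv compute), the standard reverse-mode result (see e.g. Giles’ matrix derivative notes) is

$$\bar{b} = A^{-T}\,\bar{x}, \qquad \bar{A} = -\,\bar{b}\,x^T$$

so the adjoint reuses the factorization of A instead of differentiating through the individual elimination steps, which is exactly the structure a scalar-level IR tool cannot recover on its own.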

2 Likes

You mean, something like this?

Yes. That’s not something that can be done efficiently by simply using compiler-level AD constructs.

For BLAS, the adjoint routines have already appeared: “Algorithm 1005: Fortran Subroutines for Reverse Mode Algorithmic Differentiation of BLAS Matrix Operations,” ACM Transactions on Mathematical Software, Vol. 46, No. 1.

For other matrix problems, this document may be of help. (Thanks again to @lkedward for posting it in a previous discussion.)

I’m far from being a compiler developer, but one behavior that I couldn’t find a way to mimic in Fortran is obtaining recursive derivatives. Here’s a simple example:

$$f(x, y) = 2x^2 \cdot y \qquad g(x, y) = \left(\frac{df}{dx}(x, y)\right)^2 \cdot \frac{x}{y}$$

And I want to obtain $\frac{dg}{dx}$ automatically (without using the analytical $\frac{df}{dx}$). With the usual forward-diff libraries this is not possible, since they’re usually written like

type :: DualNumber
     real :: value   ! function value
     real :: dx      ! first derivative with respect to x
end type

So when I extract dfdx with something like f%dx inside g(x, y), I lose track of the derivatives and can’t calculate dgdx. In theory I could use recursive types to avoid that (and also get higher-order derivatives easily):

type :: DualNumber
     real :: value
     type(DualNumber), allocatable :: dx   ! the derivative is itself a dual number
end type

I tried implementing this in a simple module just for testing, but the performance was horrible (in both memory and CPU time).

Meanwhile, in Julia I can do this:

using ForwardDiff

function f(x, y)
    return 2*x^2 * y
end

dfdx(x, y) = ForwardDiff.derivative(x -> f(x, y), x) 

function g(x, y)
    return dfdx(x, y)^2 * x/y
end

dgdx(x, y) = ForwardDiff.derivative(x -> g(x, y), x) 

And everything works fine (and fast). I think some simple interface like this for the end user would be astonishing!
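As a sanity check for any such interface: analytically $\frac{df}{dx} = 4xy$, hence $g = 16x^3 y$ and $\frac{dg}{dx} = 48x^2 y$, so dgdx(1.0, 1.0) should return 48.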

I see that clad (https://github.com/vgvassilev/clad) does source-to-source translation using the function’s AST, so maybe this could be possible with LFortran as well?

I had the idea of implementing a compile-time source-to-source translator, but unfortunately that exceeds my current knowledge.

Personally I favor JAX-style autodiff, which uses a boxed type to compute the graph and derivatives from the function definition at runtime. (JAX has an extra compilation step, I believe, but the point is that it’s easy to write arbitrary code, as in your example, and get the derivative.) The JAX predecessor autograd is a good resource: https://github.com/HIPS/autograd

I was working on a C version but abandoned it, at some point I’d be interested in trying the same thing in Fortran.

I’m not aware of how JAX implements autodiff, but doesn’t runtime autodiff (like using dual numbers) always end up slower than source-to-source translation, where the compiler can optimize the specific functions?

Good question. I think that’s why JAX has an extra compilation step, where the graph derived from the program is optimized. I don’t see why that can’t happen at runtime, though I can see implementing the optimization potentially being a lot more work than just letting the compiler handle it.

Oh, didn’t know about that extra step! Sounds smart.

For my specific problem it was unfortunately unusable :(. jit-ing was more painful than using Julia, and I couldn’t achieve decent performance.

I’m sorry, but I am totally opposed to this. It is a bad idea if we are talking about analytical differentiation, and a very, very bad idea if it’s about numerical differentiation.

  • Analytic computation of derivatives is a job for Computer Algebra Systems (CAS), and they are very good at it. I see no reason to bloat a programming language into a semi-CAS, which will never be as good as the many CAS that have existed for decades anyway.
    I personally use Maxima (after ditching Mathematica years ago) for several reasons. Maxima (and pretty much every CAS) even has a command to “export” the resulting derivatives in F90 form, which you can literally copy and paste into your Fortran program. Where is the need to bloat Fortran itself with that feature?
  • Numerical differentiation is not easy; in fact, it is usually much harder than integration. But let’s say it is not. Which method would be used? Finite differences? Why not use a spline: if the function is smooth enough, it’s a better method. But cubic splines also require tridiagonal Gaussian elimination, right? And while we are at it, why not add a shipload of other “LAPACK-ish” methods as well? But then again, isn’t all that what the countless numerical analysis packages available for ages now are for?
    I see no reason to give the novice programmer yet another way to avoid learning anything and just use the built-in differentiation, integration, or whatever else the language itself provides. I can tell from experience that the vast majority of students will stick to that method (because it’s easier) without having the slightest idea of the pros and cons of the specific method, of when the built-in method should be used and when it is not a good idea, or even of its name.

Years ago, I released a Numerical Analysis package for a specific machine. The package offered several ready-made numerical methods. The shooting method for solving boundary-value problems was quite popular in the small community of users. But of course you had to supply a Jacobian, and I made the mistake of adding an “auto” option which could be used to skip the analytic Jacobian by using a (finite-differences) numerical approximation instead. I did add a note, in big fat letters in the manual, saying something like “always supply the analytic form of the Jacobian; its automatic numerical approximation is not recommended unless you know what you are doing”. I assumed this would prevent users from using it except in special cases. Guess what: no one ever bothered computing the Jacobian. Literally no one ever asked anything about it; they just used the built-in “automatic” option, essentially numerically approximating the Jacobian, and I doubt most of them even knew what method the package used for said approximation. They even released their own packages solving specific problems using that numerical approximation. Since then, I have learned my lesson: never give the user too much “convenience”, because they will use it. Every time, all the time.

1 Like

We are talking about neither of the two. Automatic differentiation is a distinct concept. Roughly speaking, any computer program is just a series of arithmetic operations, to which you can recursively apply the chain rule to obtain a derivative.
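For a concrete picture, here is the earlier $f(x, y) = 2x^2 y$ differentiated the way forward-mode AD does it, written out by hand as a plain Fortran sketch: every intermediate value carries a tangent that is updated by the chain rule, with no symbolic algebra involved.

program chain_rule_demo
  implicit none
  real :: x, y, t1, dt1, t2, dt2, f, df

  x = 3.0; y = 2.0   ! differentiate w.r.t. x: seed dx/dx = 1, dy/dx = 0

  t1  = x * x        ! primal:  t1 = x^2
  dt1 = 2.0 * x      ! tangent: d(t1)/dx = 2x

  t2  = 2.0 * t1     ! primal:  t2 = 2 x^2
  dt2 = 2.0 * dt1    ! tangent: constants just scale the tangent

  f   = t2 * y       ! primal:  f = 2 x^2 y
  df  = dt2 * y      ! tangent: y carries a zero tangent here

  print *, f, df     ! 36.0 and 24.0 (= 4xy at x=3, y=2)
end program chain_rule_demo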

3 Likes

I recently played around with implementing an AD package based on parameterized derived types. Most compilers would compile the module(s) containing the AD software. However, when I tried to build and run an actual test program, the only compiler that worked was the most recent (and I guess last) oneAPI 2023 release (both ifort and ifx). All other compilers (including earlier oneAPI releases) failed to compile and/or run the test program due to a variety of errors (segfaults, etc.). I still don’t understand why the compiler developer community can’t get parameterized types to work 100% of the time. I doubt they will ever work as planned (like FORALL). For that reason alone, I would love to see AD built into the language.
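For context, a parameterized-type formulation typically looks something like the sketch below (a hypothetical shape, not the actual package), and it is precisely this kind of declaration that still trips compilers up:

type :: dual(n)
   integer, len :: n    ! number of independent variables
   real :: val          ! function value
   real :: grad(n)      ! gradient; its length is fixed by the type parameter
end type dual

! declared as, e.g.:
type(dual(n=3)) :: u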

1 Like

Well, it seems I misunderstood. I am too busy to go into details right now; it sounds interesting at first glance, but I’m still not sure it is indeed distinct. After all, it is a method to compute derivatives. However sophisticated and interesting it might be, my general point still applies: why does this have to be part of a programming language? Do not bloat a language with features that can be found elsewhere or implemented as a library. There are many ways to touch your left ear, but the simplest is to just raise your left hand a bit…

I see why you cannot easily use a typical AD library. You could expand dg as an expression of df and d2f and use an AD library that supports second-order derivatives/hyperdual numbers, but I’m guessing that’s not what you want, and the generalization can be tricky.