I've found algorithmic (I prefer this to "automatic") differentiation strangely slippery to explain, given the concept is essentially rather simple. I think the main reason for this is the way people often think of what the "derivative" of a function is.
In high school you're taught that the derivative of the function f(x) = x^2 is the function f'(x) = 2x. That is, the derivative of the function is another function that computes its gradient. Algorithmic differentiation is very confusing if you think in these terms.
When learning multivariable calculus, and when getting your head around AD, it's better to think of the derivative as a linear approximation of the original function around a particular point. In the single-variable case that means we interpret the derivative not as (a function that gives you) the gradient of the tangent line at a point, but as the tangent line itself at a particular point.
In the case of f(x) = x^2, then, the derivative at x=3 is the tangent line to the curve at the point x=3, y=9. It's best to define this in terms of offsets from the point where we're evaluating the derivative, so (y - 9) = 6 * (x - 3).
Key points to note:
1. This derivative is a linear function. Specifically, in defining the tangent line we get y-9 as a linear function of x-3. (Incidentally, an alternative and perhaps more obvious representation of this tangent line would give y in terms of x, i.e. y = 6x - 9. This is a less useful representation, though, because y is not a linear function of x, merely an affine one - the line doesn't pass through the origin.)
2. The linear function we get depends on which value of x we evaluate it for. We have a different derivative at each point.
3. When we evaluate this linear function we must give it an offset, a value of (x-3) for which to compute the corresponding value of (y-9).[1]
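To make points 1-3 concrete, here's a minimal sketch in plain Python (the names, like derivative_of_f_at, are just mine for illustration): the derivative of f at a chosen point is returned as a linear function that maps an input offset to an output offset.

    def f(x):
        return x ** 2

    def derivative_of_f_at(x0):
        """The derivative of f at x0, as a linear function of an offset."""
        slope = 2 * x0                 # f'(x0) = 2 * x0
        return lambda dx: slope * dx   # maps (x - x0) to an approximation of (y - f(x0))

    df_at_3 = derivative_of_f_at(3.0)  # a different linear function for each x0 (point 2)
    print(df_at_3(0.1))                # 0.6, given the offset 0.1 (point 3)
    print(f(3.0) + df_at_3(0.1))       # 9.6, the tangent-line estimate of f(3.1) = 9.61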
This concept of derivative generalises naturally to multiple input and output variables. The derivative is still a linear function that approximates the original function around a particular (multidimensional) point. Being a linear function it can be represented as a matrix: the Jacobian matrix. It's worth noting, though, that the Jacobian matrix is merely one representation of the linear function we are calling the derivative, and it is far from the only one possible. In a computer we can (and in AD usually do) represent the derivative as a (computer) function that takes an offset and returns an offset, and call it to evaluate the derivative rather than ever forming a matrix.
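To show what that looks like with more than one variable, here's a toy sketch of my own (the function f(x, y) = (x*y, x + y) and the names are made up for illustration): the derivative at a point is written directly as a function from input offsets to output offsets, and no Jacobian matrix is ever formed.

    def f(x, y):
        return (x * y, x + y)

    def derivative_of_f_at(x0, y0):
        """The linear map (dx, dy) -> (du, dv) approximating f near (x0, y0)."""
        def apply(dx, dy):
            du = y0 * dx + x0 * dy     # what would be the first row of the Jacobian
            dv = dx + dy               # what would be the second row
            return (du, dv)
        return apply

    df = derivative_of_f_at(2.0, 5.0)
    print(df(0.1, -0.2))               # (0.1, -0.1): predicted output offsets
    print(f(2.1, 4.8))                 # (10.08, 6.9), close to f(2.0, 5.0) plus the prediction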
With this way of looking at things, it's relatively easy to understand AD. The key insight is that derivatives of functions compose in the same way as the original functions. If we compose two functions and we know their individual derivatives (also functions), we can evaluate the derivative of the composition by composing the two derivatives in the same way. More generally, if you have a program consisting of a call graph of numeric functions and you can make derivative functions that correspond to those individual functions they will form a call graph with the same structure, allowing you to evaluate the derivative function of the entire program. This is forward mode AD.[2]
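As a rough sketch of forward mode (my own toy code, not any particular library's API): each value is carried together with its offset, and the derivative of each primitive composes through the call graph exactly as the primitive itself does.

    class Dual:
        """A value paired with the offset propagated by the linearised program."""
        def __init__(self, value, tangent):
            self.value, self.tangent = value, tangent

        def __add__(self, other):
            return Dual(self.value + other.value, self.tangent + other.tangent)

        def __mul__(self, other):
            return Dual(self.value * other.value,
                        self.tangent * other.value + self.value * other.tangent)

    def square(x):        # a primitive whose derivative we know how to apply
        return x * x

    def program(x):       # a composition: computes x^2 + x^3
        u = square(x)
        return u + u * x

    # Feed in the point (value=3) and the input offset (tangent=1); the same
    # call graph then evaluates the program and its derivative in one pass.
    result = program(Dual(3.0, 1.0))
    print(result.value, result.tangent)   # 36.0 and 33.0 (2x + 3x^2 at x=3)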
This is related to the chain rule but in AD calling it that somewhat obscures the point and hides its intuitive obviousness. We shouldn't be surprised that derivatives compose in the same way as the original functions given they are linear approximations of those original functions!
Anyway, personally I found this to be the insight that let me get my head round AD. I hope it helps someone!
[1] This input exists in AD code when we evaluate the derivative, and its meaning is often very unclear to beginners. In forward mode AD we actually evaluate this linear approximation function: we choose a point around which to linearly approximate, and an offset from that point at which to evaluate the derivative function. See [2] for what that offset is used for.
[2] Adjoint/reverse mode uses a different linear function which composes the opposite way round to the original function, making it less convenient to compute but bringing big computational complexity benefits in typical real-world cases where you have a large number of inputs but a small number of outputs. In forward mode a single evaluation of the graph of derivatives gives you the sensitivities of every output to a single input. If we want the sensitivities with respect to all the inputs (i.e. to compute the entire Jacobian) we need one evaluation for each input. The offset discussed in [1] is what selects which input we're currently computing sensitivities for. In adjoint mode this is reversed: we need one evaluation of the graph for each output, of which there are usually many fewer than inputs.
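As a rough illustration of the pass-counting in [2], here's a toy sketch of my own (same offset-in/offset-out style as above, nothing from a real library): with two inputs, forward mode needs two evaluations, one per input, each recovering one column of the Jacobian.

    def derivative_of_f_at(x0, y0):   # derivative of f(x, y) = (x*y, x + y), as before
        return lambda dx, dy: (y0 * dx + x0 * dy, dx + dy)

    df = derivative_of_f_at(2.0, 5.0)
    col_x = df(1.0, 0.0)   # sensitivities of both outputs to input x: (5.0, 1.0)
    col_y = df(0.0, 1.0)   # sensitivities of both outputs to input y: (2.0, 1.0)
    print(col_x, col_y)    # two forward passes = the full 2x2 Jacobian, column by column
    # Reverse mode composes the other way round: one pass per output, each pass
    # recovering a row of the Jacobian instead of a column.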
One of the best explanations for me — maybe one of the first I encountered — was a paper posted here on HN about AD applied to abstract types. Getting away from numeric derivatives forced me to think about them differently.
Incidentally, this is also one area where prior experience with lisp can be helpful.
To expand on footnote [2]: in forward mode we need one evaluation for each input to get all the sensitivities, while in reverse/adjoint mode it's one per output. In real-world situations we're often performing calculations on a large set of input numbers to produce a small set of output numbers we're interested in, so there may be orders of magnitude fewer outputs than inputs - which is exactly when adjoint mode pays off.
> I've found algorithmic (I prefer this to "automatic") differentiation
Isn't every differentiation algorithmic? It's typically integration where you rely on intuition (for example in choosing substitutions), unless you're using something like the Risch algorithm. One learns to differentiate in school from the very beginning with what can only be described as an algorithm: at every step there's a clear procedure for deciding what to do next and when to terminate.
There are 3 ways of doing differentiation in a computer: numeric, X and symbolic, where X is the technique we're discussing. Personally I don't think "automatic" really captures what's different about this technique. It's no more "automatic" than the other options.
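For what it's worth, here's a toy illustration of the three routes for f(x) = x^2 at x = 3 (entirely my own sketch; the symbolic rewriting step is stubbed out with the string "2*x"):

    def f(x):
        return x ** 2

    # Numeric: finite differences on the original function; only approximate.
    h = 1e-6
    numeric = (f(3.0 + h) - f(3.0)) / h

    # Symbolic: rewrite the expression x**2 into the expression 2*x, then evaluate it.
    symbolic = eval("2*x", {"x": 3.0})

    # Algorithmic: run the original program on (value, derivative) pairs.
    class Dual:
        def __init__(self, value, tangent):
            self.value, self.tangent = value, tangent
        def __pow__(self, n):
            return Dual(self.value ** n, n * self.value ** (n - 1) * self.tangent)

    algorithmic = f(Dual(3.0, 1.0)).tangent

    print(numeric, symbolic, algorithmic)   # ~6.000001, 6.0, 6.0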
"Algorithmic" differentiation is intended to evoke the idea that we are "differentiating the algorithm" piece by piece, although it's certainly true that it can be misunderstood as saying that the differentiation process itself is an algorithm, which is again no different to the other techniques.
It's a shame the terminology in this area is so fragmented. I assume it's because the technique has been rediscovered many times. In machine learning, for example, what I would call "adjoint algorithmic differentiation" is called "backpropagation".
Ah, I see what you mean, but if you contrast "automatic differentiation" and "algorithmic differentiation" with each other, it's counterintuitive to me that the terms should be "oriented" differently. I can imagine that the differentiation is being automated or algorithmised, or I can imagine that an automaton or algorithm is being differentiated. But it's confusing if one term means the algorithm is being differentiated while the other means the differentiation is being automated. Not sure if that explains sufficiently clearly what confused me.