(1) Despite being classified as separate things in TFA, symbolic differentiation really is the same thing as AD. The apparent explosion in expression size comes not from some innate difference but from the fact that many symbolic differentiators don't let you express term reuse.
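A minimal SymPy sketch of the term-reuse point (the nested-sin function and the use of common-subexpression elimination are my own illustration, not anything from the article): the flat symbolic derivative of a deep composition repeats the inner subexpressions, but once the output is allowed to share terms it collapses into the same chain-rule ladder of intermediates a forward-mode AD pass would build.

    import sympy as sp

    x = sp.symbols('x')

    # Nested composition sin(sin(sin(sin(x)))) as a stand-in for a deep program.
    f = x
    for _ in range(4):
        f = sp.sin(f)

    df = sp.diff(f, x)
    print(sp.count_ops(df))  # size of the "flat" symbolic derivative

    # Letting the result share terms (common-subexpression elimination here)
    # turns the repeated inner sin(...) terms into shared intermediates,
    # mirroring the per-step values a forward-mode AD sweep carries along.
    shared, (reduced,) = sp.cse(df)
    for name, expr in shared:
        print(name, '=', expr)
    print('df =', reduced)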
(2) It's easy to write functions that are locally (in the code) non-differentiable yet compose to something smooth, and AD usually has a bad time with that. Anything that branches on the symbols being differentiated has a decent chance of blowing up this way. Consider x==0 ? x^2+x : x^2. Since the extra x term vanishes at x==0, this is just x^2 everywhere, so the true derivative at 0 is 0. A typical AD framework will either refuse to produce a derivative or will differentiate the branch that was taken and incorrectly report 1 instead of 0. This sort of issue makes it hard to differentiate variable-length algorithms even when they're known to have smooth results (like almost anything with a stable optimization subroutine).
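A quick sketch of that failure mode in JAX (relying on its documented behavior of letting grad trace through whichever Python branch the concrete input selects):

    import jax

    def f(x):
        # Mathematically this is x**2 everywhere: the extra x only appears
        # in the branch taken at x == 0, where it contributes 0, so the
        # true derivative at 0 is 0.
        return x ** 2 + x if x == 0 else x ** 2

    print(jax.grad(f)(0.0))  # 1.0 -- AD differentiates only the branch that ran
    print(jax.grad(f)(1.0))  # 2.0 -- away from the branch point it's fine

Wrapping the same function in jax.jit hits the other failure mode mentioned above: the Python if can't be evaluated on an abstract tracer, so the framework errors out instead of returning a wrong number.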