At the time I worked in Machine Learning (94-95) I was unaware of AD, and my professor, who built the objective function, also derived its analytic derivative by hand. I didn't learn about AD until a few years ago and was amazed, because I spent much of the late 90s learning enough Mathematica to produce my own analytic derivatives.
I think this goes back to "The complex-step derivative approximation" from 2003 by J. Martins, P. Sturdza and J. Alonso. [0] That paper is a great read!
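The trick in that paper fits in one line: for a real-analytic f, the Taylor expansion of f(x + ih) gives f'(x) ≈ Im f(x + ih) / h, and since no subtraction of nearly equal numbers is involved, h can be made far smaller than with finite differences. A rough Python sketch (names are mine; the test function is the one commonly used in the complex-step literature):

    import cmath

    def complex_step_derivative(f, x, h=1e-30):
        # Im f(x + ih) / h ~ f'(x); no subtractive cancellation,
        # so h can sit far below machine epsilon.
        return f(complex(x, h)).imag / h

    # Test function from the complex-step literature:
    # f(x) = exp(x) / sqrt(sin(x)^3 + cos(x)^3), differentiated at x = 1.5.
    f = lambda x: cmath.exp(x) / cmath.sqrt(cmath.sin(x)**3 + cmath.cos(x)**3)
    print(complex_step_derivative(f, 1.5))   # ~4.0534, matching the analytic value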
Autodiff computes a derivative by examining a computational graph (either built up-front all at once, or implicitly by tracing each operation as it runs) and producing a new graph. The person defines the forward pass (the graph), and the computer figures out the backward pass.
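Concretely, here is a toy reverse-mode version (not any particular library, all names made up): each operation records its inputs and the local partial derivatives while the forward pass runs, and backward() replays that record in reverse topological order, so every node's gradient is complete before it is pushed to its parents.

    import math

    class Var:
        def __init__(self, value, parents=()):
            self.value = value        # forward value
            self.parents = parents    # list of (parent Var, local derivative)
            self.grad = 0.0           # filled in by backward()

        def __add__(self, other):
            return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

        def __mul__(self, other):
            return Var(self.value * other.value,
                       [(self, other.value), (other, self.value)])

        def sin(self):
            return Var(math.sin(self.value), [(self, math.cos(self.value))])

        def backward(self):
            # Topologically order the graph, then propagate gradients in
            # reverse so each node is finished before its parents are updated.
            order, seen = [], set()
            def visit(node):
                if node not in seen:
                    seen.add(node)
                    for parent, _ in node.parents:
                        visit(parent)
                    order.append(node)
            visit(self)
            self.grad = 1.0
            for node in reversed(order):
                for parent, local in node.parents:
                    parent.grad += local * node.grad

    x, y = Var(2.0), Var(3.0)
    z = x * y + x.sin()        # forward pass: z = x*y + sin(x)
    z.backward()
    print(x.grad, y.grad)      # dz/dx = y + cos(x), dz/dy = x

This sketch happens to build the graph via operator overloading; the same bookkeeping can instead be done by a source transformation that emits the backward code as text.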
Backprop is what happens when you tell the programmer to do the thing autodiff is doing. You examine the computational graph, write down all the local derivative steps that autodiff would perform, and that new code (hand-written rather than machine-generated) is a function that computes the derivative by backpropagating error terms through each edge of the computational graph.
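For comparison, the hand-written backprop version of the same toy function: the programmer walks the graph and writes out the adjoint statements that a tool would otherwise generate (again just an illustrative sketch of mine, not anything from the article):

    import math

    def forward(x, y):
        a = x * y            # local derivatives: da/dx = y, da/dy = x
        b = math.sin(x)      # db/dx = cos(x)
        z = a + b            # dz/da = 1, dz/db = 1
        return z, (x, y)     # keep whatever the backward pass will need

    def backward(saved, dz=1.0):
        # Push the error term dz back through each edge in reverse order.
        x, y = saved
        da = dz * 1.0            # through z = a + b
        db = dz * 1.0
        dx = da * y              # through a = x * y
        dy = da * x
        dx += db * math.cos(x)   # through b = sin(x)
        return dx, dy

    z, saved = forward(2.0, 3.0)
    print(backward(saved))       # (3 + cos(2), 2), same as the autodiff sketch above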
Many compsci people have been captivated by it and have written introductions trying to put the technique into a wider perspective. Here is mine, including a "poor man's variant" of automatic differentiation that does without operator overloading and uses complex numbers instead:
https://pizzaseminar.speicherleck.de/automatic-differentiati...