Protip here: do it by hand, but when you start getting tired of typing LaTeX you can switch to an IDE and let GitHub Copilot complete it (it will mostly be incorrect), then go in and fix its mistakes; it still saves a bunch of time. For example:
```
The important thing is that the derivative needs to be computed from a number of elements.

logits = h @ W + b

dL/dh = dL/dlogits @ W^T  ----> the final gradient w.r.t. h; note that it is a
                                matrix multiplication, a projection of the logit
                                gradient through the weight layer

dL/dW = h^T @ dL/dlogits  ----> the final gradient w.r.t. W; note that it is a
                                matrix multiplication, a projection of the logit
                                gradient onto the hidden layer
```
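A quick way to check the hand-derived gradients above is to compare them against autograd. Here's a minimal sketch in PyTorch (the shapes are arbitrary, and the sum() loss is just a stand-in chosen so that dL/dlogits is all ones):
```
import torch

h = torch.randn(4, 3, requires_grad=True)   # hidden activations
W = torch.randn(3, 5, requires_grad=True)   # weights
b = torch.randn(5, requires_grad=True)      # bias

logits = h @ W + b
logits.sum().backward()                     # stand-in loss; dL/dlogits is all ones

dlogits = torch.ones_like(logits)
assert torch.allclose(h.grad, dlogits @ W.T)    # dL/dh = dL/dlogits @ W^T
assert torch.allclose(W.grad, h.T @ dlogits)    # dL/dW = h^T @ dL/dlogits
assert torch.allclose(b.grad, dlogits.sum(0))   # dL/db sums over the batch
```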
Nice blog. I'll be provocative/pedantic for no good reason and say that what's described isn't "calculus" per se, because you can't do calculus on discrete objects like a graph. However, you can define the derivative purely algebraically (as a linear operation which satisfies the Leibniz chain/product rule), which is more accurately what is being described.
You’re not doing calculus on a graph; you’re using a graph algorithm to automate the derivative-taking process.
Essentially, you transform your function into a “circuit” or just a graph with edge labels according to the relationship between parts of the expression. The circuit has the nice property that there is an algorithm you can run on it, with very simple rules, which gets you the derivative of the function used to create that circuit.
So taking the derivative becomes:
1. Transform function F into circuit C.
2. Run compute_gradient(C) to get the gradient of F (see the sketch below).
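For concreteness, here's a toy version of that in Python, micrograd-style and heavily simplified (the names `Value` and `compute_gradient` are just illustrative): each node in the circuit records its parents and a local rule for pushing gradients back, and the algorithm walks the circuit in reverse topological order.
```
class Value:
    def __init__(self, data, parents=()):
        self.data, self.grad = data, 0.0
        self._parents = parents
        self._backward = lambda: None   # local rule, set by the op that made us

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def backward():                 # d(x + y) = dx + dy
            self.grad += out.grad
            other.grad += out.grad
        out._backward = backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def backward():                 # Leibniz product rule: d(xy) = y dx + x dy
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = backward
        return out

def compute_gradient(out):
    # Reverse topological order: apply each node's local rule after its consumers.
    order, seen = [], set()
    def visit(v):
        if v not in seen:
            seen.add(v)
            for p in v._parents:
                visit(p)
            order.append(v)
    visit(out)
    out.grad = 1.0
    for v in reversed(order):
        v._backward()

# F(x, y) = x*y + x  =>  dF/dx = y + 1, dF/dy = x
x, y = Value(2.0), Value(3.0)
compute_gradient(x * y + x)
print(x.grad, y.grad)   # 4.0 2.0
```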
If we're being pedantic, then there's also a more general definition of calculus, which is the first definition in Merriam-Webster: "a method of computation or calculation in a special notation (as of logic or symbolic logic)." One example of this is the lambda calculus. Differential and integral calculus are just special cases of this general definition.
I still don’t understand the process of learning ML. Like, sure, we build micrograd, but is it only a didactic exercise, or can we use it to train something serious on our own hardware?
I don’t understand this comment; for one, we’re engineers/hackers and should be curious how this stuff works. It’s exciting. Practically speaking, this is like asking why learn how to write a simple forum or blog when we can’t host Facebook on our own hardware: it’s going to be hard to work on the latest models if you don’t first understand the basics.
You can totally do some visual classification problems (or even object detection) on current consumer hardware, and more. You can also take some smaller existing language models and fine-tune them for a special task; also completely feasible.
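For example, a bare-bones fine-tuning setup can be as small as this sketch (assuming PyTorch/torchvision; the 10-class head, learning rate, and toy batch are placeholders for your own task):
```
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for p in model.parameters():
    p.requires_grad = False                       # freeze the pretrained backbone
model.fc = nn.Linear(model.fc.in_features, 10)    # new trainable head, e.g. 10 classes

opt = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def train_step(x, y):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()                               # gradients flow only into the head
    opt.step()
    return loss.item()

# toy batch, just to show the shape of the loop
print(train_step(torch.randn(8, 3, 224, 224), torch.randint(0, 10, (8,))))
```
Freezing the backbone keeps memory and compute low enough that this runs comfortably on a consumer GPU (or even a CPU, slowly).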
I guess it depends on what you mean by serious. Pre-training a competitive LLM with current methods and consumer hardware is prohibitive for sure. Solving a classification problem could be totally doable depending on the domain.
You'll probably need some hardware acceleration. There's a good course that builds something like micrograd at the beginning and then extends it: https://dlsyscourse.org/lectures/
8 GB of memory is a little restrictive but still usable for many problems, like visual classification. Training would take a little more time, though, as you can't use batches as large as would be possible with more memory.
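One standard workaround for that is gradient accumulation: simulate a larger batch by summing gradients over several small ones before each optimizer step. A minimal sketch in PyTorch (the model, data, and hyperparameters are toy stand-ins):
```
import torch
import torch.nn as nn

model = nn.Linear(20, 2)                          # toy stand-in model
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
loader = [(torch.randn(16, 20), torch.randint(0, 2, (16,)))
          for _ in range(8)]                      # toy stand-in data loader

accum_steps = 4                                   # effective batch = 16 * 4 = 64
opt.zero_grad()
for i, (x, y) in enumerate(loader):
    loss = loss_fn(model(x), y) / accum_steps     # scale so the sum is an average
    loss.backward()                               # gradients accumulate in .grad
    if (i + 1) % accum_steps == 0:
        opt.step()
        opt.zero_grad()
```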
Hardware acceleration should basically be worthwhile on any device that has more TFLOP/s (and memory bandwidth) than your CPU.
It is rather accessible.