To Backprop in 7 days
Confidently wrong, then less wrong, then almost right. Why does a model train? It walks downhill. The hill is a function. The walking is calculus.
Backpropagation is not a single trick. It is four modules — log, derivatives, linearization, vectors — quietly cooperating under one application. This path lets each one earn its place before the application uses it.
the path · 7 days
- Day 1 · application · /ml/confident-wrong · Read once for the trap: a model can be confidently wrong, and softmax doesn't know. (sketch below)
- Day 2 · module · /modules/log · Cross-entropy lives in log space. Read why the product of probabilities is replaced by a sum of their logs — float underflow is not a bug to patch around. (sketch below)
- Day 3 · module · /modules/derivatives · The slope of the loss is the direction of training. The same machine — secant collapsing onto tangent — under a new name (the gradient). (sketch below)
- Day 4 · module · /modules/linearization · Far from a critical point, the loss looks linear. That linear approximation is exactly what every optimizer secretly trusts.
- Day 5 · module · /modules/vectors · The gradient is a vector. Backprop is vector calculus done with care. Read this *before* the application — the abstraction earns the page first.
- Day 6 · application · /ml/gradient-descent · All the tools are now on the bench. Walk downhill on a real loss surface. Watch the step size matter as much as the direction. (sketch below)
- Day 7 · review · /ml/confident-wrong · Re-read with all of the above in hand. The arc-5 trap should now feel like an obvious consequence, not a surprise.
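
A minimal sketch of the day-1 trap, in plain Python with invented logits: softmax hands back a confident distribution no matter what it is fed, and nothing in the output says whether the top class is right.

```python
import math

# Invented logits; the point is only that softmax manufactures confidence.
logits = [4.0, 1.0, 0.5]
exps = [math.exp(z) for z in logits]
total = sum(exps)
probs = [e / total for e in exps]

print(probs)  # ~[0.93, 0.05, 0.03]: 93% "confidence", zero claim to correctness
```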
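A minimal sketch of the day-2 point, assuming nothing beyond the standard library: multiplying a hundred small probabilities underflows float64 to zero, while the sum of their logs stays finite.

```python
import math

# 100 tokens, each assigned probability 1e-4 (an invented but plausible scale).
probs = [1e-4] * 100

product = 1.0
for p in probs:
    product *= p              # 1e-400 is below float64's range: collapses to 0.0

log_sum = sum(math.log(p) for p in probs)

print(product)   # 0.0
print(log_sum)   # about -921.03, perfectly representable
```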
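A minimal sketch of the day-3 machine, using f(x) = x² as a stand-in loss (my choice, not the course's): the secant slope collapses onto the tangent slope as the step h shrinks.

```python
def f(x):
    return x ** 2            # stand-in loss; f'(3) = 6 exactly

x = 3.0
for h in (1.0, 0.1, 0.01, 0.001):
    secant = (f(x + h) - f(x)) / h
    print(h, secant)         # 7.0, 6.1, 6.01, 6.001 -> the tangent slope, 6.0
```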
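A minimal sketch of days 4 and 6 together, on a toy surface f(x, y) = x² + 10y² that the course does not use: each update trusts the local linear approximation for one step, and the step size decides whether that trust pays off.

```python
def grad(x, y):
    return 2 * x, 20 * y                      # analytic gradient of x**2 + 10*y**2

def descend(lr, steps=50):
    x, y = 5.0, 5.0                           # arbitrary start, far from the minimum
    for _ in range(steps):
        gx, gy = grad(x, y)
        x, y = x - lr * gx, y - lr * gy       # one step of trust in the linear model
    return x ** 2 + 10 * y ** 2               # loss left after the walk

print(descend(lr=0.02))   # modest step: loss falls from 275 toward 0
print(descend(lr=0.11))   # too large for the steep axis: the walk diverges
```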