Lemma
math, backwards
journey · 7 days · ml / dl

To Backprop in 7 days

Confidently wrong, then less wrong, then almost right. Why does a model train? It walks downhill. The hill is a function. The walking is calculus.

Backpropagation is not a single trick. It is four modules — log, derivatives, linearization, vectors — quietly cooperating under one application. This path lets each one earn its place before the application uses it.

the path
  1. day 1 · application · /ml/confident-wrong
     Read once for the trap: a model can be confidently wrong, and softmax doesn't know. (sketch below)
  2. day 2 · module · /modules/log
     Cross-entropy lives in log space. Read why the product of probabilities is replaced by a sum of their logs: float underflow is not a bug to patch around. (sketch below)
  3. day 3 · module · /modules/derivatives
     The slope of the loss is the direction of training. The same machine, secant collapsing onto tangent, under a new name (the gradient). (sketch below)
  4. day 4 · module · /modules/linearization
     Far from a critical point, the loss looks linear. That linear approximation is exactly what every gradient-based optimizer secretly trusts. (sketch below)
  5. day 5 · module · /modules/vectors
     The gradient is a vector. Backprop is vector calculus done with care. Read this *before* the application; the abstraction earns the page first. (sketch below)
  6. day 6 · application · /ml/gradient-descent
     All the tools are now on the bench. Walk downhill on a real loss surface. Watch the step size matter as much as the direction. (sketch below)
  7. day 7 · review · /ml/confident-wrong
     Re-read with all of the above in hand. The day-1 trap should now feel like an obvious consequence, not a surprise.
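
the sketches

Day 1, in miniature. A toy sketch, not the page's own example: the three logits below are made up, but they show how softmax can report high confidence on the wrong class.

```python
import math

def softmax(logits):
    # shift by the max for numerical stability, then exponentiate and normalize
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# made-up logits for a 3-class problem; suppose the true label is class 2
logits = [4.0, 1.0, 0.5]
probs = softmax(logits)
predicted = max(range(len(probs)), key=lambda i: probs[i])

print([round(p, 3) for p in probs])               # [0.926, 0.046, 0.028]
print("predicted class:", predicted)               # 0
print("confidence:", round(probs[predicted], 3))   # 0.926
# ~93% confident in class 0 even though the true label is 2:
# softmax only reports relative logit sizes, not correctness
```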
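
Day 2, in miniature. A toy sketch assuming a run of 1000 probabilities near 0.01 (made-up numbers): the raw product underflows to zero, while the sum of logs stays finite.

```python
import math

# 1000 made-up token probabilities of about 0.01 each
probs = [0.01] * 1000

# naive product: underflows to exactly 0.0 long before the loop finishes
product = 1.0
for p in probs:
    product *= p
print(product)                      # 0.0, since the true value 1e-2000 is below float range

# log space: the same quantity as a sum, comfortably representable
log_product = sum(math.log(p) for p in probs)
print(log_product)                  # -4605.17..., i.e. 1000 * log(0.01)

# the average negative log is what cross-entropy reports
print(-log_product / len(probs))    # 4.605... per token
```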
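
Day 3, in miniature. A toy sketch using f(x) = x² at x = 3 (a made-up loss, not the module's): the secant slope collapses onto the tangent slope as h shrinks.

```python
def f(x):
    return x * x        # toy loss f(x) = x^2, so the exact slope is f'(x) = 2x

x = 3.0
for h in [1.0, 0.1, 0.01, 0.001]:
    secant = (f(x + h) - f(x)) / h      # slope of the secant through x and x + h
    print(h, secant)
# 1.0    7.0
# 0.1    6.1000...
# 0.01   6.0100...
# 0.001  6.0010...
# the secant slope collapses onto the tangent slope f'(3) = 6 as h shrinks
```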
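
Day 4, in miniature. The same made-up loss f(x) = x²: the linear approximation f(x) + f'(x)·h tracks f(x + h), and its error shrinks like h².

```python
def f(x):
    return x * x

def df(x):
    return 2 * x        # exact derivative of x^2

x = 3.0
for h in [1.0, 0.5, 0.1, 0.01]:
    actual = f(x + h)
    linear = f(x) + df(x) * h           # first-order (linear) approximation
    print(h, round(actual, 4), round(linear, 4), round(actual - linear, 4))
# the error actual - linear is exactly h^2 here: halving h quarters the error,
# which is why a small enough step can trust the linear picture
```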
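
Day 5, in miniature. A toy two-parameter squared-error loss on one made-up data point: the gradient is the vector of partial derivatives, and a finite-difference check agrees with the hand-derived one.

```python
# one made-up data point and a squared-error loss with two parameters (w, b)
x_data, y_data = 2.0, 1.0

def loss(w, b):
    return (w * x_data + b - y_data) ** 2

def grad(w, b):
    # hand-derived partials: dL/dw = 2*(w*x + b - y)*x, dL/db = 2*(w*x + b - y)
    r = w * x_data + b - y_data
    return (2 * r * x_data, 2 * r)

def numeric_grad(w, b, h=1e-6):
    # central finite differences, one coordinate at a time
    dw = (loss(w + h, b) - loss(w - h, b)) / (2 * h)
    db = (loss(w, b + h) - loss(w, b - h)) / (2 * h)
    return (dw, db)

print(grad(0.5, 0.5))           # (2.0, 1.0)
print(numeric_grad(0.5, 0.5))   # (~2.0, ~1.0), agreeing up to rounding
# the gradient is this whole vector: one partial derivative per parameter
```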
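
Day 6, in miniature. A toy descent on f(x) = x² starting from x = 3: too small a step barely moves, a reasonable step slides toward the minimum, too large a step overshoots and ends up farther away.

```python
def df(x):
    return 2 * x        # gradient of the toy loss f(x) = x^2

def descend(lr, steps=10, x=3.0):
    # plain gradient descent: step against the gradient, scaled by the learning rate
    for _ in range(steps):
        x = x - lr * df(x)
    return x

print(descend(lr=0.01))   # ~2.45, right direction, barely moved
print(descend(lr=0.1))    # ~0.32, closing in on the minimum at 0
print(descend(lr=1.1))    # ~18.6, every step overshoots and the iterate grows
```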