Lemma
math, backwards
journey · 7 days · ml / dl

To Backprop in 7 days

Confidently wrong, then less wrong, then almost right. Why does a model train? It walks downhill. The hill is a function. The walking is calculus.

Backpropagation is not a single trick. It is four modules — log, derivatives, linearization, vectors — quietly cooperating under one application. This path lets each one earn its place before the application uses it.

the path
  1. day 1 · application · /ml/confident-wrong
     Read once for the trap: a model can be confidently wrong, and softmax doesn't know. (sketch below)
  2. day 2 · module · /modules/log
     Cross-entropy lives in log space. Read why the product of probabilities is replaced by a sum of their logs: float underflow is not a bug to patch around. (sketch below)
  3. day 3 · module · /modules/derivatives
     The slope of the loss is the direction of training. The same machine, secant collapsing onto tangent, under a new name (the gradient). (sketch below)
  4. day 4 · module · /modules/linearization
     Far from a critical point, the loss looks linear. That linear approximation is exactly what every gradient-based optimizer secretly trusts. (sketch below)
  5. day 5 · module · /modules/vectors
     The gradient is a vector. Backprop is vector calculus done with care. Read this *before* the application; the abstraction earns the page first. (sketch below)
  6. day 6 · application · /ml/gradient-descent
     All the tools are now on the bench. Walk downhill on a real loss surface. Watch the step size matter as much as the direction. (sketch below)
  7. day 7 · review · /ml/confident-wrong
     Re-read with all of the above in hand. The day-1 trap should now feel like an obvious consequence, not a surprise.
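
the sketches

Day 1, in miniature. A toy sketch, not the page's own example: the three logits below are made up, but they show how softmax can report high confidence on the wrong class.

```python
import math

def softmax(logits):
    # shift by the max for numerical stability, then exponentiate and normalize
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# made-up logits for a 3-class problem; suppose the true label is class 2
logits = [4.0, 1.0, 0.5]
probs = softmax(logits)
predicted = max(range(len(probs)), key=lambda i: probs[i])

print([round(p, 3) for p in probs])               # [0.926, 0.046, 0.028]
print("predicted class:", predicted)               # 0
print("confidence:", round(probs[predicted], 3))   # 0.926
# ~93% confident in class 0 even though the true label is 2:
# softmax only reports relative logit sizes, not correctness
```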
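
Day 2, in miniature. A toy sketch assuming a run of 1000 probabilities near 0.01 (made-up numbers): the raw product underflows to zero, while the sum of logs stays finite.

```python
import math

# 1000 made-up token probabilities of about 0.01 each
probs = [0.01] * 1000

# naive product: underflows to exactly 0.0 long before the loop finishes
product = 1.0
for p in probs:
    product *= p
print(product)                      # 0.0, since the true value 1e-2000 is below float range

# log space: the same quantity as a sum, comfortably representable
log_product = sum(math.log(p) for p in probs)
print(log_product)                  # -4605.17..., i.e. 1000 * log(0.01)

# the average negative log is what cross-entropy reports
print(-log_product / len(probs))    # 4.605... per token
```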
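
Day 3, in miniature. A toy sketch using f(x) = x² at x = 3 (a made-up loss, not the module's): the secant slope collapses onto the tangent slope as h shrinks.

```python
def f(x):
    return x * x        # toy loss f(x) = x^2, so the exact slope is f'(x) = 2x

x = 3.0
for h in [1.0, 0.1, 0.01, 0.001]:
    secant = (f(x + h) - f(x)) / h      # slope of the secant through x and x + h
    print(h, secant)
# 1.0    7.0
# 0.1    6.1000...
# 0.01   6.0100...
# 0.001  6.0010...
# the secant slope collapses onto the tangent slope f'(3) = 6 as h shrinks
```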
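
Day 4, in miniature. The same made-up loss f(x) = x²: the linear approximation f(x) + f'(x)·h tracks f(x + h), and its error shrinks like h².

```python
def f(x):
    return x * x

def df(x):
    return 2 * x        # exact derivative of x^2

x = 3.0
for h in [1.0, 0.5, 0.1, 0.01]:
    actual = f(x + h)
    linear = f(x) + df(x) * h           # first-order (linear) approximation
    print(h, round(actual, 4), round(linear, 4), round(actual - linear, 4))
# the error actual - linear is exactly h^2 here: halving h quarters the error,
# which is why a small enough step can trust the linear picture
```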
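
Day 5, in miniature. A toy two-parameter squared-error loss on one made-up data point: the gradient is the vector of partial derivatives, and a finite-difference check agrees with the hand-derived one.

```python
# one made-up data point and a squared-error loss with two parameters (w, b)
x_data, y_data = 2.0, 1.0

def loss(w, b):
    return (w * x_data + b - y_data) ** 2

def grad(w, b):
    # hand-derived partials: dL/dw = 2*(w*x + b - y)*x, dL/db = 2*(w*x + b - y)
    r = w * x_data + b - y_data
    return (2 * r * x_data, 2 * r)

def numeric_grad(w, b, h=1e-6):
    # central finite differences, one coordinate at a time
    dw = (loss(w + h, b) - loss(w - h, b)) / (2 * h)
    db = (loss(w, b + h) - loss(w, b - h)) / (2 * h)
    return (dw, db)

print(grad(0.5, 0.5))           # (2.0, 1.0)
print(numeric_grad(0.5, 0.5))   # (~2.0, ~1.0), agreeing up to rounding
# the gradient is this whole vector: one partial derivative per parameter
```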
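
Day 6, in miniature. A toy descent on f(x) = x² starting from x = 3: too small a step barely moves, a reasonable step slides toward the minimum, too large a step overshoots and ends up farther away.

```python
def df(x):
    return 2 * x        # gradient of the toy loss f(x) = x^2

def descend(lr, steps=10, x=3.0):
    # plain gradient descent: step against the gradient, scaled by the learning rate
    for _ in range(steps):
        x = x - lr * df(x)
    return x

print(descend(lr=0.01))   # ~2.45, right direction, barely moved
print(descend(lr=0.1))    # ~0.32, closing in on the minimum at 0
print(descend(lr=1.1))    # ~18.6, every step overshoots and the iterate grows
```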