Lemma
math, backwards

softmax

ko · counterpart 소프트맥스

The function that turns a vector of logits `z = (z₁, …, zₙ)` into a vector of positive numbers summing to 1: `softmax(z)ᵢ = exp(zᵢ) / Σⱼ exp(zⱼ)`. Two facts to keep close. (1) It depends only on differences `zᵢ − zⱼ` — adding the same constant to every logit changes nothing. (2) It never outputs exactly 0 or 1, only their limits. The output _looks like_ a probability distribution; it does not guarantee that the assigned probability matches any real-world frequency.
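Both facts can be checked directly. A minimal sketch (the helper name `softmax` is illustrative; subtracting the max before exponentiating is the standard numerical-stability trick, legal precisely because of fact 1):

```python
import math

def softmax(z):
    # Subtract max(z) before exponentiating: safe because softmax
    # depends only on differences z_i - z_j (fact 1), and it prevents
    # overflow for large logits.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

z = [2.0, 1.0, 0.1]
p = softmax(z)
p_shifted = softmax([v + 100.0 for v in z])  # fact 1: shift invariance

print(all(abs(a - b) < 1e-12 for a, b in zip(p, p_shifted)))  # same output
print(abs(sum(p) - 1.0) < 1e-12)                              # sums to 1
print(all(0.0 < q < 1.0 for q in p))                          # fact 2: strictly in (0, 1)
```

Without the max subtraction, a logit of a few hundred already overflows `math.exp`; with it, the same distribution comes out unchanged.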

invented

1868 (Boltzmann) / 1989 (Bridle, ML) · Ludwig Boltzmann (physics) → John S. Bridle (ML) · Vienna → RSRE Malvern, UK

Boltzmann (1868) wrote `e^(-E/kT) / Z` for the probability of a physical state at temperature T — statistical mechanics. 121 years later, Bridle (1989) repurposed the same formula for neural-network output layers, naming it 'softmax' because in the limit temperature → 0 it becomes a hard argmax. Same equation, two completely separate problems.
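The temperature → argmax limit is easy to watch numerically. A sketch (the helper `softmax_t` is a hypothetical name; it divides the logits by T, the logit-space analogue of Boltzmann's `-E/kT` with energies replaced by negated logits):

```python
import math

def softmax_t(z, temp):
    # Boltzmann-style weights: p_i proportional to exp(z_i / temp).
    # Max subtraction keeps exp() from overflowing at small temp.
    m = max(z)
    exps = [math.exp((v - m) / temp) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

z = [2.0, 1.0, 0.1]
for temp in (1.0, 0.1, 0.01):
    print(temp, [round(p, 4) for p in softmax_t(z, temp)])
# As temp shrinks, the probability mass piles onto argmax z:
# the "soft" max hardens into a one-hot argmax.
```

At `temp = 1.0` the largest logit gets roughly two thirds of the mass; at `temp = 0.01` it gets essentially all of it, which is the limit Bridle's name alludes to.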

en.wikipedia.org/wiki/Softmax_function ↗

used on · 3