softmax
The function that turns a vector of logits `z = (z₁, …, zₙ)` into a vector of positive numbers summing to 1: `softmax(z)ᵢ = exp(zᵢ) / Σⱼ exp(zⱼ)`. Two facts to keep close. (1) It depends only on the differences `zᵢ − zⱼ`: adding the same constant to every logit changes nothing. (2) It never outputs exactly 0 or 1; it only approaches them in the limit. The output _looks like_ a probability distribution, but nothing guarantees that the assigned probabilities match any real-world frequency.
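A minimal NumPy sketch (the function name `softmax` and the example logits are illustrative): fact (1) licenses the standard stability trick, since subtracting `max(z)` from every logit leaves the output unchanged while keeping `exp` from overflowing.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax: shift invariance (fact 1) lets us
    subtract max(z) so no exponent exceeds 0."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())   # same output, bounded exponents
    return e / e.sum()

p = softmax([1000.0, 1001.0, 1002.0])  # naive exp(1000.0) would overflow
print(np.round(p, 4))   # [0.09   0.2447 0.6652]
print(p.sum())          # 1.0
```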
1868 (Boltzmann) / 1989 (Bridle, ML) · Ludwig Boltzmann (physics) → John S. Bridle (ML) · Vienna → RSRE Malvern, UK
Boltzmann (1868) wrote `e^(−E/kT) / Z` for the probability of a physical state at temperature T, a cornerstone of statistical mechanics. 121 years later, Bridle (1989) repurposed exactly the same formula for neural-network output layers, calling it "softmax" because it is a softened argmax: send the temperature → 0 and it hardens into a true argmax. Same equation, two completely separate problems.
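To see the temperature claim concretely, a hedged sketch (the helper `softmax_t` and the sample logits are made up for illustration): dividing the logits by a temperature `T` before normalizing recovers Boltzmann's form with `E = −z`, and shrinking `T` toward 0 drives the output toward a one-hot argmax.

```python
import numpy as np

def softmax_t(z, T=1.0):
    """Softmax with temperature: exp(z/T) normalized.
    T=1 is the standard softmax; T -> 0 concentrates all mass on argmax."""
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())   # same stability trick as above
    return e / e.sum()

z = [2.0, 1.0, 0.5]
for T in (1.0, 0.5, 0.1):
    print(T, np.round(softmax_t(z, T), 4))
# 1.0 [0.6285 0.2312 0.1403]   soft: every entry gets some mass
# 0.5 [0.8438 0.1142 0.042 ]   sharper
# 0.1 [1.     0.     0.    ]   effectively a hard argmax
```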