Why is the model so confident about a wrong answer?
A model does not know it is right. It has scores.
Softmax does not check truth. It only compares scores, and it returns something that merely looks like a probability distribution.
Scores aren't probabilities
The last layer of a classifier outputs three numbers — one per class. They are not constrained to be positive. They are not constrained to sum to anything. They are just logits: raw, unbounded scores.
We need a function that maps three real numbers to three positive numbers summing to 1. Many such functions exist. The one we use is softmax — and the reason it’s that one, not another, is the point of the next step.
Softmax — exponentiate, then normalize
Softmax is two steps. First, raise $e$ to each logit: $e^{z_i}$. Now they’re all positive. Second, divide by the sum so they sum to 1: $p_i = e^{z_i} / \sum_j e^{z_j}$. Done. Three numbers, all positive, sum to 1 — looks exactly like a probability distribution.
The exp step is not arbitrary. It guarantees positivity ($e$ raised to any real power is positive) and it makes softmax depend only on differences of logits — adding the same constant to every logit changes nothing. The downstream effect is profound: softmax doesn’t know the absolute scale of your scores. It only sees who’s ahead, and by how much.
import numpy as np
# Three logits — raw scores. No truth check anywhere.
z = np.array([2.0, 1.0, 0.1])
# Numerically stable softmax: subtract max before exp.
def softmax(z, T=1.0):
    s = z / T
    s = s - s.max()
    e = np.exp(s)
    return e / e.sum()
p = softmax(z)
# p ≈ [0.659, 0.242, 0.099] (sums to 1)
# Same logits + 100 give the same p — softmax depends only on differences.
softmax(z + 100)
# → [0.659, 0.242, 0.099]
Cross-entropy — punish the probability you gave the correct answer
Now there’s a probability vector. Suppose the truth is class 0. The number we care about is $p_0$ — what the model gave to the right answer. The training signal we want should be 0 when $p_{\text{true}} = 1$ (perfect) and large when $p_{\text{true}}$ is small (confidently wrong). The simplest function that does this: $-\log p_{\text{true}}$. That’s cross-entropy with a one-hot target.
Why the log?
Because $-\log p_{\text{true}}$ is 0 when $p_{\text{true}} = 1$ and grows without bound as $p_{\text{true}} \to 0$: mild mistakes cost a little, confident wrongness costs a lot.
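A few values make the shape concrete (the probabilities below are arbitrary, chosen only to trace the curve):
import numpy as np
# -log(1) = 0: a perfect answer costs nothing.
# Below 1, the loss grows slowly at first, then explodes as p -> 0.
for p in [0.9, 0.5, 0.1, 0.01]:
    print(f"p_true = {p:4.2f} -> loss = {-np.log(p):.2f}")
# p_true = 0.90 -> loss = 0.11
# p_true = 0.50 -> loss = 0.69
# p_true = 0.10 -> loss = 2.30
# p_true = 0.01 -> loss = 4.61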
# In PyTorch, log_softmax + nll_loss is the numerically stable cross-entropy.
import torch
import torch.nn.functional as F
z = torch.tensor([[2.0, 1.0, 0.1]]) # logits, batch of 1
y = torch.tensor([0]) # true class — 'cat' at index 0
log_p = F.log_softmax(z, dim=1) # avoids log(softmax) overflow
loss = F.nll_loss(log_p, y)
# loss ≈ 0.417 = −log p_true = −log(0.659)
#
# Why log_softmax not log(softmax)?
# log(softmax) computes exp() first → overflow when logits are large.
# log_softmax keeps things in log-space the whole way through.
Temperature — the confidence dial
Replace $z_i$ with $z_i / T$. At $T = 1$ nothing changes. Lower $T$ and the gaps between logits grow, so the winner soaks up probability and the distribution sharpens toward certainty. Raise $T$ and the gaps shrink, so the bars flatten toward uniform. Dividing by a positive constant preserves the ranking; only the confidence changes.
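A short sweep makes the dial visible. This reuses the softmax helper from the first snippet (repeated so it runs on its own); the three temperatures are arbitrary picks:
import numpy as np
# Same helper as above, repeated for a standalone run.
def softmax(z, T=1.0):
    s = z / T
    s = s - s.max()
    e = np.exp(s)
    return e / e.sum()
z = np.array([2.0, 1.0, 0.1])
# One set of scores, three temperatures: ranking fixed, confidence moves.
for T in (0.5, 1.0, 5.0):
    print(T, np.round(softmax(z, T=T), 3))
# 0.5 [0.864 0.117 0.019]  <- sharper: the winner soaks up mass
# 1.0 [0.659 0.242 0.099]  <- the original distribution
# 5.0 [0.4   0.327 0.273]  <- flatter: drifting toward uniform
Class 0 wins at every temperature; only the margin moves.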
Confidence ≠ truth — the trap, made explicit
Set logits to $[5.0, 1.0, -0.5]$ — the model has decided hard for class 0. Softmax says $p_0 \approx 0.978$: 97.8% confidence. But truth is independent of this calculation. If the real label is class 1, then $p_{\text{true}} \approx 0.018$ and the loss is $-\log(0.018) \approx 4.0$. The model is confident *and* wrong. Lowering temperature makes it more confident, and the loss rises faster.
This is why “the model said 97% so it must be right” is a category error. *Looks like a probability* and *matches reality* are two unrelated claims. Aligning them is its own field — calibration — and it requires data the model never sees during training.
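A standard starting point in that field is temperature scaling: freeze the trained model, then refit only $T$ on held-out labeled data to minimize the average cross-entropy. A minimal sketch, assuming a tiny made-up validation set of logits and labels (all values here are invented for illustration):
import numpy as np
# Batched softmax with temperature.
def softmax(z, T=1.0):
    s = z / T
    s = s - s.max(axis=-1, keepdims=True)
    e = np.exp(s)
    return e / e.sum(axis=-1, keepdims=True)
# Hypothetical held-out logits and true labels -- invented for illustration.
val_logits = np.array([[5.0, 1.0, -0.5],
                       [0.3, 2.1, 1.9],
                       [1.5, 1.4, 0.2]])
val_labels = np.array([1, 1, 0])
def avg_nll(T):
    p = softmax(val_logits, T=T)
    return -np.log(p[np.arange(len(val_labels)), val_labels]).mean()
# Crude grid search over T: enough to show the idea.
Ts = np.linspace(0.1, 10.0, 200)
best_T = Ts[np.argmin([avg_nll(T) for T in Ts])]
# best_T > 1 would mean the raw model was overconfident on this data.
Because dividing by $T$ never reorders the logits, this reshapes confidence without changing a single prediction.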
# The trap: a wrong score can produce a confident probability.
import numpy as np
z = np.array([5.0, 1.0, -0.5]) # model is sure of class 0
true_idx = 1 # but truth is class 1
p = softmax(z) # softmax() as defined in the first snippet
# p ≈ [0.978, 0.018, 0.004]
# 97.8% confidence — and wrong.
loss = -np.log(p[true_idx])
# loss ≈ 4.0 (huge — log explodes as p_true → 0)
#
# Lower the temperature, and confidence rises further while truth is unchanged.
softmax(z, T=0.5)[0] # ≈ 0.99964 (even more sure)
Cross-entropy can disagree with accuracy in both directions. Try near-uniform logits, say $[0.1, 0.0, 0.0]$, with true class 0: argmax is correct, but $p_0 \approx 0.36$, loss $\approx 1.03$ — a “right” answer with terrible loss. Now flip to $[1.0, 0.9, -2.0]$ with true class 1: argmax is wrong, but loss $\approx 0.77$, lower than the “right” answer above. Selecting the model with the lowest training loss is not the same as selecting the model that gets the most answers right. The metric you optimize is not the metric you care about; the gap is where shipped models hide.
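The same two cases, checked in code (these logits are one arbitrary choice among many that exhibit the gap):
import numpy as np
def softmax(z, T=1.0):
    s = z / T
    s = s - s.max()
    e = np.exp(s)
    return e / e.sum()
def xent(z, true_idx):
    # Cross-entropy with a one-hot target: -log of the true class's probability.
    return -np.log(softmax(z)[true_idx])
# Argmax correct, loss bad: the model barely prefers the right class.
xent(np.array([0.1, 0.0, 0.0]), 0)   # ≈ 1.03
# Argmax wrong, loss better: a near-miss between the top two classes.
xent(np.array([1.0, 0.9, -2.0]), 1)  # ≈ 0.77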
Softmax doesn’t check truth. It compares scores, exponentiates, normalizes. Cross-entropy reads off the bar that happens to belong to the correct answer. Confidence and rightness are two different things; the model only computes the first.
In the widget, set logits to $[2.0, 1.0, 0.1]$ and lower the temperature from 1.0 to 0.1. Which class wins at each temperature? How does the winning probability change? Now raise T to 5.0 — what happens to the bars? Why does the winner never change just because of T?
Estimate $p_0$ for logits $[2.0, 1.0, 0.1]$ at T = 1. Use $e^{2} \approx 7.39$, $e^{1} \approx 2.72$, and $e^{0.1} \approx 1.11$. Round and check the bars match.
The loss is $-\log p_{\text{true}}$. Compute it for $p_{\text{true}} = 0.659$ and for $p_{\text{true}} = 0.018$ (natural log). What does the gap between $0.42$ and $4.0$ say about how cross-entropy treats *confident wrongness*?
A model gives the wrong class a probability of $0.99$, leaving $0.01$ for the truth. What is the loss? Now imagine someone asks: “but it was 99% sure — isn’t that close to right?” Write a one-sentence reply that distinguishes *looks like a probability* from *is actually likely*.
ML courses usually present softmax and cross-entropy back-to-back as the “classification recipe” — exponentiate, normalize, take the log of the right one, done. Lemma keeps them apart. Softmax compresses scores into something that looks like probability; cross-entropy punishes the probability the model assigned to the right answer. They are two unrelated jobs glued together by custom. Treating them as one obscures the trap shown in arc 5: a model can be confidently wrong, and the recipe gives no signal that it is.