Lemma
math, backwards
the hook · ×→+

Engineers carried wooden rulers until 1972.

They used them to multiply by adding — sliding two log-spaced scales past each other, reading the product off the meeting point. The trick was 400 years old by then and is still the trick: log(a·b) = log(a) + log(b). The same one-line identity is why a modern language model can train at all. Multiply twenty probabilities of 0.1 in half precision and the result rounds silently to zero — every gradient that touched it dies with it. A training loop never multiplies them. It takes logs and adds twenty negative numbers. The model trains because somebody decided to live in log-space.

The whole module is one equation. Everything else is consequence.

Widget A — Two Stacks
log₁₀(a) = 0.30
log₁₀(b) = 0.48
log₁₀(a) + log₁₀(b) = 0.78
a · b = 6.00
[Interactive widget: a linear scale (0–1000) stacked over a log₁₀ scale (1–1000), with draggable markers a = 2.00 and b = 3.00 and a marker at a·b = 6.00]
Drag a = 2 and b = 3. The marker lands on 6 — but you never multiplied. You added two log-distances. Drag b to 5. The marker jumps to 10. Same trick.
the arc
1

The identity that does all the work

Log is defined by one rule: log(a·b) = log(a) + log(b). Pick any base. The rule is the same. Every other property falls out of that line. log(a/b) = log(a) − log(b): take the rule, replace b with 1/b, done. log(aⁿ) = n·log(a): apply the rule n times to a · a · … · a. log(1) = 0: from log(1·a) = log(1) + log(a). There is no fourth rule because there is no fourth way to combine multiplications. Practically: the log of a number tells you how many factors of the base it is built from. log₁₀(1000) = 3 because 1000 is three tens, multiplied. Counting factors. That's it.
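The derivations above can be checked numerically in a few lines — a quick sketch using only the standard library, one assertion per property:

```python
import math

a, b, n = 2.0, 3.0, 5

# The one rule:
assert math.isclose(math.log(a * b), math.log(a) + math.log(b))

# Everything else falls out of it:
assert math.isclose(math.log(a / b), math.log(a) - math.log(b))  # quotient
assert math.isclose(math.log(a ** n), n * math.log(a))           # power
assert math.log(1) == 0.0                                        # identity element

# Counting factors of the base: 1000 is three tens, multiplied
assert math.isclose(math.log10(1000), 3.0)
```

Any base works the same way — swap `math.log` for `math.log10` or `math.log2` and every assertion still holds.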

2

Napier and the slide rule (×→+ embodied)

John Napier published the first log tables in 1614 because astronomers were dying inside, multiplying nine-digit numbers by hand to predict eclipses. His tables let them look up log(a) and log(b), add the two, and look up what number had that log — the answer to a·b with no multiplication anywhere. Three centuries later, every engineer carried a slide rule: a wooden ruler with two log-spaced scales that slid past each other. Aligning 2 on one against 3 on the other physically performed log(2) + log(3) and showed 6 at the meeting point. The slide rule is the identity from § 1, made into furniture. Apollo got to the moon on these.
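Napier's workflow — look up two logs, add them, look up the anti-log of the sum — is easy to mimic. A toy sketch, where the table resolution and the `table_multiply` helper are illustrative inventions, not Napier's actual layout:

```python
import math

# A toy "log table": log₁₀ of 1.00, 1.01, …, 9.99 — one printed page's worth
table = {round(x / 100, 2): math.log10(x / 100) for x in range(100, 1000)}

def table_multiply(a, b):
    """Multiply two numbers in [1, 10) using only lookups and one addition."""
    s = table[round(a, 2)] + table[round(b, 2)]          # log(a) + log(b)
    # inverse lookup ("anti-log"): the entry whose log is closest to the sum
    return min(table, key=lambda x: abs(table[x] - s))

print(table_multiply(2.0, 3.0))   # → 6.0, and no multiplication ever ran
```

Products of 10 or more fall off the end of the table — the same wraparound problem a slide rule has, solved the same way: factor out the power of ten first.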

3

Underflow — and why log-space saves your model

A float32 can hold numbers down to about 10⁻³⁸. Multiply forty probabilities of 0.1 and you've crossed it — the result rounds to zero, silently. No exception. No warning. Every gradient that depended on it dies with it. This isn't a numerical-analysis curiosity; it's why every deep-learning library reports loss as a sum, not a product. The fix is the identity from § 1, applied mechanically: take logs the moment a product would otherwise form. log(p₁·p₂·…·pₙ) = Σ log(pᵢ). Each log(pᵢ) is a comfortable negative number; their sum is a comfortable larger negative number. No underflow can reach you. This is what a log-likelihood is doing, and what logsumexp was built to do. Live in log-space; sums replace products; floats stop lying.
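The same failure reproduces in plain Python, whose floats are float64 — the cliff sits near 10⁻³⁰⁸ instead of 10⁻³⁸, so it takes more factors, but the mechanics are identical. A minimal sketch:

```python
import math

probs = [0.1] * 400            # 400 token probabilities of 0.1

# Naive product: 10⁻⁴⁰⁰ is below float64's smallest subnormal (~5 · 10⁻³²⁴)
product = 1.0
for p in probs:
    product *= p
print(product)                  # 0.0 — silent underflow, no exception raised

# Log-space: the identity log(∏ pᵢ) = Σ log(pᵢ), applied mechanically
log_product = sum(math.log(p) for p in probs)
print(log_product)              # ≈ −921.03 (= 400 · ln 0.1), comfortably representable
```

The product is unrecoverably zero; the log-sum is an ordinary negative number that a float holds without complaint.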

4

Back to the application

This module is consumed by the Bitcoin Pizza module. There you're asked to compute 10⁹ · 2.89²⁰ by hand. You can't, until you take log₁₀ of both sides — and then you're adding 9 + 20·log₁₀(2.89), two numbers a human can manage. That hand-computation is only possible because of the identity in § 1. Same trick as Napier's, same trick as the slide rule. Different decade, different stakes, identical mechanism.
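The hand computation is easy to sanity-check in code — a sketch assuming nothing beyond the standard library:

```python
import math

# By hand: log₁₀(10⁹ · 2.89²⁰) = log₁₀(10⁹) + 20·log₁₀(2.89) = 9 + 20·log₁₀(2.89)
by_hand = 9 + 20 * math.log10(2.89)
print(by_hand)                            # ≈ 18.22 → the product has 19 digits

# Cross-check: this particular product still fits in float64, so compare directly
direct = math.log10(1e9 * 2.89 ** 20)
print(math.isclose(by_hand, direct))      # True
```

The direct computation only works here because the product happens to fit in a float; the log₁₀-of-both-sides route works no matter how large the product gets.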

log(a·b) = log(a) + log(b). The whole module. Everything else — the digit-count rule, the slide rule, the underflow fix, hand-computing 10⁹ × 2.89²⁰ — is a corollary.
exercises · solve by hand
1 · read the graph
On the Two Stacks widget, set a = 4. What value of b makes a·b land exactly on 100? Read it off the log axis without computing.
2 · compute by hand · the digit rule · no calculator
Without a calculator, give log₁₀(2,000,000) using only log₁₀(2) ≈ 0.301.
3 · write the equation · sequence probability
You evaluate a 50-token sequence; each token has probability ~0.05. Write the formula your code should compute, and the formula it should avoid. Use log(0.05) ≈ −3.00.
4 · compute by hand · Stirling on a napkin · no calculator
Stirling's approximation: log₁₀(n!) ≈ n·log₁₀(n) − n·log₁₀(e), with log₁₀(e) ≈ 0.434. Estimate log₁₀(100!). How many digits does 100! have?
5 · read the graph · equal log-distance = equal ratio
On Two Stacks, drag a and b so that the gap log(b) − log(a) is exactly the gap from log(1) to log(10). What does b/a always equal, regardless of where you placed them?
6 · write the equation · logsumexp
You're given two probabilities p and q, but you only know log p and log q (not p, q themselves — they'd underflow). Derive a numerically stable expression for log(p + q). (This is the logsumexp trick.)
7 · the evil one · 'just multiply'
A junior says: "Log-space is just a perf optimization. Mathematically you could just multiply the probabilities — switch to float64 if you're worried." Write a one-paragraph rebuttal that holds for both float32 and float64. Then state the single equation that makes log-space work.
module: The Logarithm. Consumed by Bitcoin Pizza. Future modules build on this one: log-likelihood + cross-entropy, decibels, half-life, information / entropy.
CC BY 4.0 (content) · MIT (code)