Most equations are hard. Their tangent line is easy.
Every time you say “for small x, sin x ≈ x,” or “near here, things grow linearly,” or “to a first approximation,” you are linearizing.
In the widget below, drag a (the anchor) and x (where to evaluate). With a = 0, the tangent line of sin x is exactly y = x — the small-angle approximation, in pictures. Drag x away from a and watch the error grow as the square of the distance.
Why we approximate
The functions that govern the world are mostly nonlinear: a pendulum’s sin θ, a transistor’s exponential current, a gravitational force’s 1/r², a neural network’s softmax. Solving them in closed form is, in most cases, impossible. So we trade away exactness in a controlled way: pick a point we care about, replace the nonlinear function with the closest linear function in a neighbourhood of that point, and accept that the answer is correct only “near enough.”
The tangent line at a point
At any smooth point a, the function has both a value f(a) and a slope f'(a) (from the derivatives module). The unique line passing through (a, f(a)) with that slope is the tangent line:
L_a(x) = f(a) + f'(a) · (x − a)
That is the linearization of f at a. It matches the function in two ways: L_a(a) = f(a) (same value at the anchor) and L_a'(a) = f'(a) (same slope at the anchor). No other line can claim both. The widget draws this line with the dashed brown stroke; for sin at a = 0, the line is y = x.
import math

# Linearization of f at a: L_a(x) = f(a) + f'(a) · (x − a).
# Choose anchor a, compare with the true value over a range of x.
def linearize(f, fprime, a):
    fa, slope = f(a), fprime(a)
    return lambda x: fa + slope * (x - a)

L0 = linearize(math.sin, math.cos, a=0)   # tangent at 0 is y = x
[(round(x, 2), round(math.sin(x), 4), round(L0(x), 4))
 for x in (0.05, 0.2, 0.5, 1.0)]
# → [(0.05, 0.0500, 0.0500),   # < 0.0001 error
#    (0.2,  0.1987, 0.2000),   # 0.001
#    (0.5,  0.4794, 0.5000),   # 0.02
#    (1.0,  0.8415, 1.0000)]   # 0.16 — visibly bad

Error grows as a square
For any smooth function, the error f(x) − L_a(x) behaves like a quadratic in the deviation x − a. Doubling the deviation roughly quadruples the error; halving it cuts the error to a quarter. This is *quadratic, not linear* — and it is the reason linearization is useful: the gap closes very quickly as you approach the anchor.
Concretely for sin at a = 0: the leading error term is −x³/6, so the ratio error/(x − a)² drifts slowly with x rather than staying constant — the cubic term dominates here because the second derivative of sin vanishes at 0. Other functions (eˣ or cos at a = 0, say) have a constant ratio because their second derivative at the anchor is nonzero. Either way, the rule of thumb is the same: “small” deviations make linear approximation fine; “large” deviations make it wrong, fast.
# Error scales as (x − a)², not as (x − a). Quadratic, not linear.
# Doubling the deviation quadruples the error.
def error_ratio(f, L, a, x):
    return (f(x) - L(x)) / (x - a) ** 2 if x != a else None

[error_ratio(math.sin, L0, 0, x) for x in (0.05, 0.1, 0.2, 0.4, 0.8)]
# → [-0.0083, -0.0167, -0.0333, -0.0661, -0.1291] ≈ −x/6
# The leading Taylor remainder for sin near 0 is −x³/6, so dividing by
# (x − a)² gives roughly −x/6, drifting slowly with x. The shape "error
# = constant·deviation²" is the dominant term in every linearization;
# all you have to read off is the constant.

Where this shows up — one tool, three pillars
Linearization is the first honest lie: replace a curved thing by the line that tells the truth nearby. The lie shows up under different names in different pillars; the math is the same.
- physics: sin θ ≈ θ near zero
- ml: calibration curve ≈ tangent near one bin
- finance: ΔPV ≈ −D · PV · Δr near the current rate (bond duration)
The pendulum clock runs on a single linearization: sin θ ≈ θ for small angles. The nonlinear ODE θ'' + (g/L)·sin θ = 0 becomes θ'' + (g/L)·θ = 0 — a linear oscillator with a closed-form sinusoidal solution and a constant-period swing. The whole 17th-century clock technology lives inside the small-angle regime where the lie holds, and the page’s widget makes that regime visible.
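A quick numeric check of that regime — a sketch, with assumed values g = 9.81 m/s², L = 1 m, and an integrator and step size of my own choosing, not the page’s:

```python
import math

# θ'' = −(g/L)·sin θ, integrated with semi-implicit Euler (assumed
# g = 9.81, L = 1). Starting from rest at θ0, the quarter period is the
# time to reach θ = 0; by symmetry the full period is four times that.
def pendulum_period(theta0, g=9.81, L=1.0, dt=1e-4):
    theta, omega, t = theta0, 0.0, 0.0
    while theta > 0:
        omega -= (g / L) * math.sin(theta) * dt
        theta += omega * dt
        t += dt
    return 4 * t

T_linear = 2 * math.pi * math.sqrt(1.0 / 9.81)  # small-angle prediction
pendulum_period(0.1)   # very close to T_linear: the lie holds
pendulum_period(1.0)   # noticeably longer: leaving the small-angle regime
```

The linear prediction is amplitude-independent; the true period is not, and the gap is exactly the error the small-angle linearization hides.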
The damped oscillator extends the same lie one step further: m·x'' + c·x' + k·x = F(t) is the small-amplitude linearization of every physical system that oscillates with friction and forcing. Car suspensions, building sway, RLC circuits, vocal cords driving glass — the same equation runs all of them inside the regime the linearization holds.
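You can watch the regime boundary directly by integrating a damped pendulum next to its linearization — a sketch with arbitrary damping and stiffness (γ = 0.5, g/L = 9.81), not any particular physical system:

```python
import math

# Damped pendulum θ'' = −(g/L)·sin θ − γ·θ' versus its linearization
# θ'' = −(g/L)·θ − γ·θ', integrated with semi-implicit Euler.
def swing(theta0, linear, gL=9.81, gamma=0.5, dt=1e-4, T=5.0):
    theta, omega = theta0, 0.0
    for _ in range(int(T / dt)):
        restoring = theta if linear else math.sin(theta)
        omega += (-gL * restoring - gamma * omega) * dt
        theta += omega * dt
    return theta

abs(swing(0.1, False) - swing(0.1, True))  # tiny: linearization honest
abs(swing(2.0, False) - swing(2.0, True))  # large by comparison
```

Small amplitude keeps the two trajectories locked together; large amplitude lets them drift apart in both frequency and shape.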
Model calibration does the same trick on a curve a model produces. Click any bin in the reliability diagram and the widget draws the tangent to the calibration curve at that bin’s center: locally, accuracy ≈ slope · confidence + intercept. Two numbers — slope and intercept — describe the gap between confidence and frequency near that bin. Slope ≈ 1 means the curve is parallel to the diagonal there (a constant shift); slope ≠ 1 means the gap changes with confidence, which a global rotation (temperature scaling) can fix. Same tangent-line tool, completely different use.
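The two numbers can be read off numerically. A minimal sketch — the calibration curve here is a made-up closed form (c^1.5, an overconfident region where accuracy lags confidence), standing in for the widget’s binned curve:

```python
import math

# Toy calibration curve: observed accuracy as a function of confidence.
def calib(c):
    return c ** 1.5

# Tangent line at a bin center c0, via central finite difference.
def tangent_at(f, c0, h=1e-6):
    slope = (f(c0 + h) - f(c0 - h)) / (2 * h)
    return slope, f(c0) - slope * c0   # (slope, intercept)

slope, intercept = tangent_at(calib, 0.7)   # tangent at the 0.7 bin
# slope ≠ 1 here: the confidence–accuracy gap widens with confidence,
# so a constant shift won't fix it — a rotation (temperature) might.
```

The analytic slope at 0.7 is 1.5·√0.7 ≈ 1.25, so this bin’s tangent is visibly steeper than the diagonal.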
Present value — the bond market’s working approximation. PV is a nonlinear function of the interest rate (an integral of e^(−rt)-discounted cash flows), but for small rate moves it linearizes to ΔPV ≈ −D · PV · Δr, where D is the modified duration. Traders quote duration instead of recomputing the integral after every rate tick. The approximation is honest inside a small Δr neighbourhood; outside it, the same page’s convexity correction (the second-order term) catches what duration misses. Linearization first, second-order correction second — the standard pattern.
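A sketch with an invented bond — 5-year, 4% annual coupon, 100 face, continuous discounting; none of these numbers come from the page:

```python
import math

# PV(r) = Σ CF_t · e^(−r·t) for a hypothetical 5-year, 4%-coupon bond.
def pv(r, coupon=4.0, face=100.0, years=5):
    flows = [(t, coupon) for t in range(1, years)] + [(years, coupon + face)]
    return sum(cf * math.exp(-r * t) for t, cf in flows)

# Modified duration D = −PV'(r)/PV(r), by central finite difference.
def duration(r, h=1e-6):
    return -(pv(r + h) - pv(r - h)) / (2 * h) / pv(r)

r0, dr = 0.03, 0.005                    # a 50 bp rate move
linear = -duration(r0) * pv(r0) * dr    # ΔPV ≈ −D · PV · Δr
exact = pv(r0 + dr) - pv(r0)
# linear is slightly more negative than exact: PV is convex in r, and
# the gap is what the second-order (convexity) correction recovers.
```

One number, D, stands in for the whole discounting integral — until Δr leaves the neighbourhood where the tangent is honest.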
The same machine also drives Newton’s method (linearize, find the line’s zero, repeat) and gradient descent (the first-order Taylor approximation says L(θ − η∇L) ≈ L(θ) − η‖∇L‖²; if η is small enough that the linearization is trustworthy, the loss decreases — push η past the ceiling and the linearization lies, which is exactly the explosion in that page’s widget). Newton, gradient descent, the pendulum, calibration, and bond duration all run the same step: replace the curve with its tangent for as long as the tangent is honest.
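The ceiling can be seen on a one-parameter quadratic — a sketch; the curvature a = 4 is arbitrary, not the value from that page’s widget:

```python
# Gradient descent on L(θ) = ½·a·θ², so ∇L = a·θ and each update is
# θ ← (1 − η·a)·θ. The iteration converges iff |1 − η·a| < 1, i.e. η < 2/a.
def descend(eta, a=4.0, theta=1.0, steps=50):
    for _ in range(steps):
        theta -= eta * a * theta   # θ ← θ − η·∇L
    return abs(theta)

descend(0.4)   # below the ceiling 2/a = 0.5: |θ| shrinks toward 0
descend(0.6)   # above it: every step overshoots and |θ| explodes
```

Below the ceiling each step lands inside the region where the tangent was honest; above it, each step overshoots into territory where the linearization already lied.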
# Newton's method: solve f(x) = 0 by repeatedly linearizing at the
# current guess, then finding where THAT line crosses zero.
def newton(f, fprime, x0, steps=5):
    x = x0
    for _ in range(steps):
        x = x - f(x) / fprime(x)   # the root of the tangent line
    return x

# Fixed point of cos: solve cos(x) − x = 0 starting near 0.5.
newton(lambda x: math.cos(x) - x,
       lambda x: -math.sin(x) - 1,
       x0=0.5)
# → 0.7390851332151607 (the Dottie number)
#
# Each Newton step IS a linearization step. Gradient descent is the
# same recipe applied to ∇L instead of f, with a fixed-size step (η)
# instead of the exact root of the tangent.

The catch — only 'almost'
Every linearization is wrong outside its anchor’s neighbourhood. The discipline of using the tool well is the discipline of measuring and respecting that neighbourhood:
- Quote a bound on the deviation for which the linear answer is good enough — not just a slogan that “linearization works.”
- Compute or estimate the next-order term and check that it is small (or that its sign won’t bite you).
- Stay inside the regime by design: clock escapements force small arcs; ML training schedules shrink the learning rate; control systems stay near operating points; circuit designers bias transistors into the linear region. When you can’t stay there, switch to a higher-order method or a nonlinear solver — and accept the cost.
That phrase — “the regime where the lie holds” — is the same one the pendulum page closes on. It is not coincidence; it is the pattern this module names. Every applied-math discipline has a private inventory of such regimes, kept by people who know exactly how far they can lean.
The tangent line is the cheapest answer that still gets the slope right. Linearization replaces a hard problem with an easy one — valid in a regime, wrong outside it, always. The discipline is the regime.
Linearize at . Use the linearization to estimate . The true value (3 decimals) is . What is the error? About what fraction of is it?
Linearize at . Estimate . The true value is about . Compare with (true: ). What does the relative error pattern look like as the deviation grows?
Suppose you want to solve f(x) = 0 for some nonlinear f. You have a guess x₀. Linearize f at x₀ and find where that line crosses zero. Show that the next guess is x₁ = x₀ − f(x₀)/f'(x₀). What breaks this iteration?
The first-order Taylor approximation of the loss is honest only for small steps ‖Δθ‖. The gradient descent step chooses Δθ = −η∇L. Use the second-order term to argue why the toy quadratic loss in /ml/gradient-descent has a hard ceiling on the usable η.