Lemma
math, backwards
the hook · the everyday move

Most functions are hard. Their tangent line is easy.

Every time you say “for small θ, sin θ ≈ θ” or “near here, things grow linearly” or “to a first approximation,” you are linearizing. The pattern is so common we stop noticing it. The pendulum clock works because of it. Newton’s method is built on it. Gradient descent is built on it. Most engineering is the discipline of staying in the regime where it holds.

In the widget below, drag a (the anchor) and x (where to evaluate). With a = 0, the tangent line of sin is exactly y = x — the small-angle approximation, in pictures. Drag x away from 0 and watch the error grow as the square of the distance.

Widget — Tangent approximation

    f(x) — true          0.7174
    approx — tangent     0.8000
    error               -0.0826
    error / (x − a)²    -0.129

With the anchor at a = 0, the tangent line is y = x — the famous sin x ≈ x. Drag x: at x = 0.1 the error is below 0.001; at x = 1 it's near −0.16; at x = π/2 the tangent says π/2 ≈ 1.57 while the true value is 1. The "error / (x − a)²" row is roughly constant near the anchor — the "error grows as the square of the deviation" rule, made directly visible.
the arc
1

Why we approximate

The functions that govern the world are mostly nonlinear: a pendulum’s sin θ, a transistor’s exponential current, a gravitational force’s 1/r², a neural network’s softmax. Solving them in closed form is, in most cases, impossible. So we trade away exactness in a controlled way: pick a point we care about, replace the nonlinear function with the closest linear function in a neighbourhood of that point, and accept that the answer is correct only “near enough.”

2

The tangent line at a point

At any smooth point a, the function f has both a value f(a) and a slope f'(a) (from the derivatives module). The unique line passing through (a, f(a)) with that slope is the tangent line:

L_a(x)  =  f(a)  +  f'(a) · (x − a)

That is the linearization of f at a. It matches the function in two ways: L_a(a) = f(a) (same value at the anchor) and L_a'(a) = f'(a) (same slope at the anchor). No other line can claim both. The widget draws this line with the dashed brown stroke; for f(x) = sin x at a = 0, the line is L_0(x) = 0 + 1 · (x − 0) = x.

import math

# Linearization of f at a:  L_a(x) = f(a) + f'(a) · (x − a).
# Choose anchor a, compare with the true value over a range of x.
def linearize(f, fprime, a):
    fa, slope = f(a), fprime(a)
    return lambda x: fa + slope * (x - a)

L0 = linearize(math.sin, math.cos, a=0)    # tangent at 0 is y = x
[(round(x, 2), round(math.sin(x), 4), round(L0(x), 4))
 for x in (0.05, 0.2, 0.5, 1.0)]
# → [(0.05, 0.0500, 0.0500),    # < 0.0001 error
#    (0.2,  0.1987, 0.2000),    # 0.001
#    (0.5,  0.4794, 0.5000),    # 0.02
#    (1.0,  0.8415, 1.0000)]    # 0.16  — visibly bad
3

Error grows as a square

For any smooth function, the error f(x) − L_a(x) behaves like a quadratic (or better) in the deviation x − a. Doubling the deviation roughly quadruples the error; halving it cuts the error to a quarter. This is *quadratic, not linear* — and it is the reason linearization is useful: the gap closes very quickly as you approach the anchor.

Concretely for sin at 0: the second derivative vanishes at the anchor, so the leading error term is the cubic −x³/6, and error / (x − a)² drifts slowly with x rather than staying constant — the cubic term dominates here. Other functions (e^x, √(1 + x), 1/(1 − x)) have a roughly constant error / (x − a)² ratio near the anchor because their second derivative there is nonzero. Either way, the rule of thumb is the same: “small” deviations make linear approximation fine; “large” deviations make it wrong, fast.

# Error scales as (x − a)², not as (x − a). Quadratic, not linear.
# Doubling the deviation quadruples the error.
def error_ratio(f, L, a, x):
    return (f(x) - L(x)) / (x - a) ** 2 if x != a else None

[error_ratio(math.sin, L0, 0, x) for x in (0.05, 0.1, 0.2, 0.4, 0.8)]
# → [-0.0083, -0.0167, -0.0333, -0.0661, -0.1291]   — roughly −x/6
# The leading Taylor remainder for sin near 0 is −x³/6, so dividing by
# (x − a)² gives roughly −x/6, drifting slowly with x. The shape "error
# = constant·deviation²" is the dominant term in every linearization
# with f''(a) ≠ 0; all you have to read off is the constant.
4

Where this shows up — one tool, three pillars

Linearization is the first honest lie: replace a curved thing by the line that tells the truth nearby. The lie shows up under different names in different pillars; the math is the same.

physics : sin θ ≈ θ near zero
ml      : calibration curve ≈ tangent near one bin
finance : ΔPV ≈ -D · PV · Δr near the current rate (bond duration)

The pendulum clock runs on a single linearization: sin θ ≈ θ for small angles. The nonlinear ODE θ̈ = −(g/L) sin θ becomes θ̈ = −(g/L) θ — a linear oscillator with a closed-form sinusoidal solution and a constant-period swing. The whole 17th-century clock technology lives inside the small-angle regime where the lie holds, and the page’s widget makes that regime visible.
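A numerical sketch of that claim. The units are an assumption — g/L = 1, so the linearized period is exactly 2π — and the integrator is a plain semi-implicit Euler step, not anything from the page's widget:

```python
import math

# Compare the nonlinear pendulum θ̈ = −sin θ with its small-angle
# linearization θ̈ = −θ (units with g/L = 1, so linear period = 2π).
# Semi-implicit Euler; one quarter-swing detected by the zero crossing.
def pendulum_period(theta0, linear=False, dt=1e-4):
    theta, omega, t = theta0, 0.0, 0.0
    while theta > 0:                            # integrate to first zero
        acc = -theta if linear else -math.sin(theta)
        omega += acc * dt                       # update velocity first
        theta += omega * dt                     # then position
        t += dt
    return 4 * t                                # quarter swing × 4

pendulum_period(0.1, linear=True)   # ≈ 6.283 — 2π, amplitude-independent
pendulum_period(0.1)                # ≈ 6.287 — the lie is barely visible
pendulum_period(2.0)                # ≈ 8.35  — large swing: lie exposed
```

At 0.1 rad the nonlinear clock drifts by well under 0.1%; at 2 rad (about 115°) the period is a third longer than the linear prediction — the regime boundary, in numbers.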

The damped oscillator extends the same lie one step further: ẍ + 2γẋ + ω₀²x = F(t) is the small-amplitude linearization of every physical system that oscillates with friction and forcing. Car suspensions, building sway, RLC circuits, a singer's voice driving a wine glass — the same equation runs all of them inside the regime where the linearization holds.
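A sanity check on the closed form, with illustrative constants γ = 0.3 and ω₀ = 2 (assumptions, not values from the text): the underdamped solution x(t) = e^(−γt) cos(ω_d t), with ω_d = √(ω₀² − γ²), should make the unforced left-hand side vanish.

```python
import math

# Verify that x(t) = e^(−γt)·cos(ω_d·t) solves ẍ + 2γẋ + ω₀²x = 0
# by plugging it into the ODE with finite-difference derivatives.
gamma, omega0 = 0.3, 2.0                      # illustrative constants
omega_d = math.sqrt(omega0**2 - gamma**2)     # damped frequency

def x(t):
    return math.exp(-gamma * t) * math.cos(omega_d * t)

def residual(t, h=1e-4):
    xdd = (x(t + h) - 2 * x(t) + x(t - h)) / h**2   # ẍ, central difference
    xd = (x(t + h) - x(t - h)) / (2 * h)            # ẋ, central difference
    return xdd + 2 * gamma * xd + omega0**2 * x(t)

max(abs(residual(t)) for t in (0.5, 1.0, 2.0, 5.0))
# → close to zero (finite-difference noise only): the closed form holds
```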

Model calibration does the same trick on a curve a model produces. Click any bin in the reliability diagram and the widget draws the tangent to the calibration curve at that bin’s center: locally, actual(p) ≈ m·p + c. Two numbers — slope and intercept — describe the gap between confidence and frequency near that bin. Slope ≈ 1 means the curve is parallel to the diagonal there (a constant shift); slope ≠ 1 means the gap changes with confidence, which a global rotation (temperature scaling) can fix. Same tangent-line tool, completely different use.
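The same finite-difference move works on any calibration curve. The curve below is synthetic — an over-confident model of the form actual(p) = σ(logit(p)/T) with temperature T = 1.5 — a stand-in assumption, not data from the page's widget:

```python
import math

# Tangent line to a (synthetic) calibration curve at a bin center.
# actual(p) = σ(logit(p)/T): a temperature-miscalibrated model, T = 1.5.
T = 1.5

def actual(p):
    z = math.log(p / (1 - p))           # logit of the model's confidence
    return 1 / (1 + math.exp(-z / T))   # true frequency at confidence p

def tangent_at(p0, h=1e-6):
    m = (actual(p0 + h) - actual(p0 - h)) / (2 * h)   # local slope
    c = actual(p0) - m * p0                            # local intercept
    return m, c

m, c = tangent_at(0.8)
# Locally, actual(p) ≈ m·p + c. Here m < 1: the confidence–frequency
# gap widens with confidence — the case temperature scaling can fix.
```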

Present value — the bond market’s working approximation. PV is a nonlinear function of the interest rate r (an integral of e^(−rt) discount factors), but for small rate moves it linearizes to ΔPV ≈ −D · PV · Δr, where D is the modified duration. Traders quote duration instead of recomputing the integral after every rate tick. The approximation is honest inside a small Δr neighbourhood; outside it, the same page’s convexity correction (the second-order term) catches what duration misses. Linearization first, second-order correction second — the standard pattern.
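A sketch with a hypothetical bond — 5% annual coupon on face 100, 5 years, continuous discounting — every number here is illustrative:

```python
import math

# Duration as the linearization of PV in the rate r.
# Hypothetical bond: coupons of 5 at t = 1..4, coupon + face = 105 at t = 5.
cashflows = [(t, 5.0) for t in range(1, 5)] + [(5, 105.0)]

def pv(r):
    return sum(c * math.exp(-r * t) for t, c in cashflows)

def duration(r):
    # Modified duration under continuous compounding: D = −PV'(r) / PV(r)
    return sum(t * c * math.exp(-r * t) for t, c in cashflows) / pv(r)

r0, dr = 0.04, 0.005                       # a 50 bp rate move
exact = pv(r0 + dr) - pv(r0)               # recompute the whole sum
linear = -duration(r0) * pv(r0) * dr       # ΔPV ≈ −D · PV · Δr

round(exact, 3), round(linear, 3)
# → ≈ (-2.343, -2.371): duration captures the move; the small gap
#   between them is the convexity (second-order) term
```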

The same machine also drives Newton’s method (linearize, find the line’s zero, repeat) and gradient descent (the first-order Taylor approximation says L(w − η·∇L) ≈ L(w) − η·‖∇L‖²; if η is small enough that the linearization is trustworthy, the loss decreases — past the ceiling the linearization lies, which is exactly the explosion at η = 0.27 in that page’s widget). Newton, gradient descent, the pendulum, calibration, and bond duration all run the same step: replace the curve with its tangent for as long as the tangent is honest.

# Newton's method: solve f(x) = 0 by repeatedly linearizing at the
# current guess, then finding where THAT line crosses zero.
def newton(f, fprime, x0, steps=5):
    x = x0
    for _ in range(steps):
        x = x - f(x) / fprime(x)         # the root of the tangent line
    return x

# Fixed point of cos:  solve cos(x) − x = 0 starting near 0.5.
newton(lambda x: math.cos(x) - x,
       lambda x: -math.sin(x) - 1,
       x0=0.5)
# → 0.7390851332151607     (the Dottie number)
#
# Each Newton step IS a linearization step. Gradient descent is the
# same recipe applied to ∇L instead of f, with a fixed-size step (η)
# instead of the exact root of the tangent.
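The η ceiling is easy to reproduce on a toy quadratic. Assume L(w) = ½·k·w² with k = 7.5 — an illustrative curvature chosen so the ceiling 2/k lands near the quoted 0.27; the page's widget uses its own constant. The gradient step multiplies w by (1 − ηk), so the iteration converges exactly when η < 2/k:

```python
# Gradient descent on L(w) = ½·k·w², where ∇L = k·w and L''(w) = k.
# Each update is w ← w − η·k·w = (1 − ηk)·w: geometric shrink or blow-up.
k = 7.5                                # illustrative curvature, 2/k ≈ 0.267

def gd(eta, w0=1.0, steps=50):
    w = w0
    for _ in range(steps):
        w -= eta * k * w               # the first-order (tangent) step
    return abs(w)

gd(0.20)   # → ≈ 8.9e-16 — below the ceiling, converges to the minimum
gd(0.28)   # → ≈ 117    — above 2/k, every step overshoots and grows
```

Below 2/k the linearization is trustworthy at each step and the loss falls; above it, the second-order term the tangent ignores flips the step into an overshoot that compounds.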
5

The catch — only 'almost'

Every linearization is wrong outside its anchor’s neighbourhood. The discipline of using the tool well is the discipline of measuring and respecting that neighbourhood:

  • Quote a bound on the deviation for which the linear answer is good enough — not just a slogan that “linearization works.”
  • Compute or estimate the next-order term and check that it is small (or that its sign won’t bite you).
  • Stay inside the regime by design: clock escapements force small arcs; ML training schedules shrink the learning rate; control systems stay near operating points; circuit designers bias transistors into the linear region. When you can’t stay there, switch to a higher-order method or a nonlinear solver — and accept the cost.

That phrase — “the regime where the lie holds” — is the same one the pendulum page closes on. That is no coincidence; it is the pattern this module names. Every applied-math discipline has a private inventory of such regimes, kept by people who know exactly how far they can lean.

The tangent line is the cheapest answer that still gets the slope right. Linearization replaces a hard problem with an easy one — valid in a regime, wrong outside it, always. The discipline is the regime.

exercises · work by hand
1 · small-angle by hand · no calculator

Linearize f(x) = sin x at a = 0. Use the linearization to estimate sin 0.1. The true value (to 4 decimals) is 0.0998. What is the error? About what fraction of x is it?

2 · exponential at zero · no calculator

Linearize f(x) = e^x at a = 0. Estimate e^0.2. The true value is about 1.2214. Compare with e^0.5 (true: 1.6487). What does the relative error pattern look like as the deviation grows?

3 · Newton step as linearization

Suppose you want to solve f(x) = 0 for some nonlinear f. You have a guess x_n. Linearize f at x_n and find where that line crosses zero. Show that the next guess is x_(n+1) = x_n − f(x_n) / f'(x_n). What breaks this iteration?

4 · why gradient descent's η has a ceiling

The first-order Taylor approximation of the loss, L(w + d) ≈ L(w) + ∇L(w) · d, is honest only for small ‖d‖. The gradient-descent step chooses d = −η · ∇L. Use the second-order term to argue why the toy quadratic loss in /ml/gradient-descent has a hard η ceiling at 2/L''(w).

glossary · used on this page · 2
linearization·선형화
Replacing a nonlinear function near a chosen point by its tangent line — keeping only the constant and first-derivative terms of the Taylor expansion. Near `x = 0`, `sin x ≈ x`, `cos x ≈ 1 − x²/2`, `e^x ≈ 1 + x`. The approximation is excellent for small `|x|` and grows wrong as `|x|` increases. Mechanical clocks, electrical circuit analysis, control systems, and most of "the engineering equations" are linearized versions of much harder nonlinear ones, valid in the small-deviation regime where everything in the system is supposed to live.
tangent line·접선
A straight line that touches a curve at a single point and matches the curve's direction there. Its slope at the point of contact equals the derivative of the function at that point: `m_tangent = f'(a)`. The tangent is what the secant becomes in the limit as its two intersection points merge — the curve's _instantaneous direction_ made visible as a line. Distinct from the trigonometric tangent; same word, different concept.