Lemma
math, backwards
the hook · the arrow the trail wants to be

What is a curve doing right now?

A moving point leaves a trail. The derivative is not the trail. It is the arrow the trail wants to become at this instant. Average speed knows two points and divides; instantaneous speed knows only one point, but somehow has a number anyway. That number — what every rate in physics, ML, and engineering ultimately is — comes from forcing the secant line through two points to collapse onto the tangent line at one.

In the widget below: the orange secant through two points on $y = x^2$. Drag h toward 0 and the secant rotates onto the brown tangent. The slope it converges to is the derivative at that point: the curve’s instantaneous direction, made into a number.

tool spec
what

The instantaneous rate of change: $f'(a) = \lim_{h \to 0} \frac{f(a+h) - f(a)}{h}$. The arrow the trail wants to be at this instant: secant slopes squeezed onto a single tangent slope as the second point collapses into the first.

applies when

Wherever a rate shows up. Velocity (position → speed), gradient descent (loss → step direction), small-angle approximation (sin θ ≈ θ near 0), local linearization in physics, ML, control. One machine, four names: slope, velocity, rate, gradient.

breaks when

The limit must exist. Sharp corners (|x| at 0), jumps (step functions), and pathological wiggles (Weierstrass nowhere-differentiable function) have no derivative. In ML, hidden non-differentiabilities — ReLU at 0, max-pooling, indicator functions — are patched by convention, not by mathematics; the gradient you compute through them is a choice, not a fact.
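
A minimal sketch of what "the limit must exist" means in practice, assuming nothing beyond plain Python: at a corner, the secant slope taken from the right and the one taken from the left settle on different numbers, so there is no single value to call the derivative. The helper name `one_sided_slopes` is made up for this illustration.

```python
# Hypothetical helper (not part of the page's widget): compare the secant
# slope taken from the right (step +h) with the one from the left (step -h).
def one_sided_slopes(f, a, h):
    right = (f(a + h) - f(a)) / h
    left = (f(a - h) - f(a)) / (-h)
    return left, right

steps = (0.1, 0.001, 0.00001)

# |x| has a corner at 0: the two sides disagree no matter how small h gets.
corner = [one_sided_slopes(abs, 0.0, h) for h in steps]
# Every entry is (-1.0, 1.0): no single limit, so no derivative at 0.

# x*x is smooth at 0: both sides shrink toward the same number, 0.
smooth = [one_sided_slopes(lambda x: x * x, 0.0, h) for h in steps]
```

The same two-sided comparison is a cheap first test for a suspected kink in any one-variable function.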

Widget — Secant → tangent
secant slope: (f(a+h) − f(a))/h = 3.0000
tangent slope: f'(1.00) = 2·1.00 = 2.0000
difference (= h): 1.0000
Drag the orange Q toward the green P, or shrink h on the slider. The orange secant rotates and collapses onto the brown dashed tangent; the secant slope 2a + h approaches the tangent slope 2a, and the difference shown above is exactly h, so it shrinks to 0 along with h. That number, what the secant slope *converges to* as the two points merge, is the derivative f'(a). Drag P and the slope changes at a rate of 2 per unit: that is d/da [2a] = 2, one more derivative.
the arc
1

Δ over Δ — average rate

You drove from $x = 1\,\text{km}$ at $t = 0\,\text{s}$ to $x = 9\,\text{km}$ at $t = 4\,\text{s}$. Your average speed is $(9 - 1) / (4 - 0) = 2\,\text{km/s}$. That number describes the trip as a whole, but it does not tell you what you were doing at $t = 2\,\text{s}$. The line through $(0, 1)$ and $(4, 9)$ on the position graph is a secant, and its slope is the average. You can compute it without any limit.

# Average rate of change — the secant slope. Two points, one division.
def average_rate(f, a, b):
    return (f(b) - f(a)) / (b - a)

f = lambda t: t * t            # x(t) = t² — toy "position"
average_rate(f, 1, 3)          # → 4.0   (position went 1 → 9 in 2 seconds)
2

Shrink the interval — instantaneous rate

“What was I doing at $t = 2\,\text{s}$?” requires the secant’s two points to merge into one. Pick a fixed anchor $a = 2$ and a small interval $h$; compute $(f(a+h) - f(a)) / h$ as $h$ shrinks. For $f(t) = t^2$:

(f(2+h) − f(2)) / h
 = ((2+h)² − 4) / h
 = (4 + 4h + h² − 4) / h
 = 4 + h            ← exact for every h ≠ 0; only the +h tail depends on h

As $h \to 0$, the expression converges to $4$. Not “approaches” in some hand-wavy sense: the value is exactly $4 + h$, and $h$ can be made as small as you want. The number that survives in the limit, 4, is the derivative of $t^2$ at $t = 2$: $f'(2) = 4$. Geometrically, it is the slope of the tangent line through $(2, 4)$.

# Instantaneous rate — shrink the interval and watch the secant slope
# converge to the tangent slope. No epsilon-delta; just shrink and look.
def secant_slope(f, a, h):
    return (f(a + h) - f(a)) / h

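f = lambda t: t * t            # same toy position x(t) = t² as before
[round(secant_slope(f, 3, h), 6) for h in (1, 0.1, 0.01, 0.0001)]
# → [7.0, 6.1, 6.01, 6.0001]   (rounded to 6 places; raw floats carry round-off)
# The pattern: 6 + h. The limit as h → 0 is 6.
# That's f'(3) for f(t) = t².  In general, f'(t) = 2t.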
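
One caveat when doing "shrink and look" on a computer rather than on paper: floating-point arithmetic cannot shrink h forever. A rough sketch, assuming standard 64-bit Python floats; the exact digits vary by step size, so no outputs are quoted.

```python
def secant_slope(f, a, h):
    return (f(a + h) - f(a)) / h

f = lambda t: t * t

# Error of the secant slope against the exact derivative f'(3) = 6.
errors = {h: abs(secant_slope(f, 3, h) - 6) for h in (1e-2, 1e-5, 1e-17)}

# For moderate h the error tracks h itself (the "+h tail").
# At h = 1e-17, 3 + h rounds back to exactly 3.0, so the numerator is 0
# and the computed slope is 0: the subtraction has lost all information.
```

On paper the limit is clean; in code there is a practical floor below which a smaller h makes the answer worse, not better.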
3

One machine, three names — slope, velocity, rate

The recipe (pick two points, compute the secant slope, shrink the interval, take the limit) gives the same kind of number whatever you plug in. If $f$ is a position-vs-time function, the derivative is velocity. If $f$ is a graph of revenue vs price, the derivative is “marginal revenue.” If $f$ is a curve drawn on paper, the derivative is the tangent’s slope at each point. Same operation; different physical or geometric interpretation depending on what was on the axes.

The standard pattern: the derivative of $x^n$ is $n \cdot x^{n-1}$. The widget shows it for $n = 2$: drag the anchor and read off $2a$. The same machinery, applied to $\sin x$, gives $\cos x$; applied to $e^x$, gives $e^x$ back; applied to a constant, gives 0. The names of these, “differentiation rules,” are bookkeeping. The single underlying operation is the limit of secant slopes.
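
As a sanity check on that bookkeeping, each rule can be compared against raw secant slopes. A small sketch, assuming a symmetric (central) difference, which is a swapped-in variant of the one-sided recipe chosen because its error shrinks faster; `central_slope`, the test point, and the step size are illustrative choices.

```python
import math

# Central difference: a secant through a-h and a+h, straddling the point.
# This is a variant of the page's one-sided recipe, not the same formula.
def central_slope(f, a, h=1e-5):
    return (f(a + h) - f(a - h)) / (2 * h)

a = 1.3
checks = {
    "x^3 -> 3x^2": (central_slope(lambda x: x**3, a), 3 * a**2),
    "sin -> cos": (central_slope(math.sin, a), math.cos(a)),
    "exp -> exp": (central_slope(math.exp, a), math.exp(a)),
    "const -> 0": (central_slope(lambda x: 7.0, a), 0.0),
}
for rule, (numeric, exact) in checks.items():
    print(f"{rule}: numeric={numeric:.6f} exact={exact:.6f}")
```

Each printed pair should agree to the digits shown, which is the rules and the limit definition saying the same thing.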

4

Differentiate twice — acceleration

The derivative of a function is itself a function. You can differentiate it again. For $f(t) = t^2$: $f'(t) = 2t$ (a line); $f''(t) = 2$ (a constant). Two derivatives of position give acceleration: the rate of change of velocity, which for free fall is the constant $-g$.

That tower — position, velocity, acceleration — is the entire content of “Newton’s second law” once you have the derivative as a tool: force is mass times the second derivative of position. Most introductory physics is the algebraic and geometric consequences of this single fact.
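
A sketch of the same tower on sampled data, assuming positions recorded every half second from f(t) = t² (the sampling interval and the helper name `differences` are illustrative). Differencing once gives average velocities between neighbouring samples; differencing again gives the constant 2.

```python
def differences(samples, dt):
    # Average rate of change over each adjacent pair of samples.
    return [(b - a) / dt for a, b in zip(samples, samples[1:])]

dt = 0.5
times = [i * dt for i in range(6)]           # 0.0, 0.5, ..., 2.5
position = [t * t for t in times]            # f(t) = t*t
velocity = differences(position, dt)         # slopes between neighbours
acceleration = differences(velocity, dt)     # slopes of the slopes

print(velocity)       # [0.5, 1.5, 2.5, 3.5, 4.5]
print(acceleration)   # [2.0, 2.0, 2.0, 2.0]
```

Each velocity entry equals 2t at the midpoint of its interval, and the second differences recover f''(t) = 2 exactly because the position is quadratic.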

5

Where this shows up: same operation, three pillars

A derivative is not just a slope. It is local change: how one quantity responds when another is nudged. The same operation shows up under different names in different pillars; the algebra stays the same.

physics : position changes into velocity; velocity into acceleration;
        forces decide those changes.
ml      : loss changes when a parameter moves; the gradient tells
        which way the loss falls.
finance : a price changes when a rate moves; the derivative measures
        the sensitivity — duration for bonds, the Greeks for options.

Projectile motion: the equations $x(t) = v_0 \cos\theta \cdot t$ and $y(t) = v_0 \sin\theta \cdot t - \tfrac{1}{2} g t^2$ have derivatives $v_x = v_0 \cos\theta$ (constant) and $v_y(t) = v_0 \sin\theta - g t$ (linear). Differentiating the position gives the velocity directly; one more derivative gives the constant acceleration $-g$.

The pendulum clock: the equation of motion $\ddot{\theta} = -(g/L) \sin\theta$ has two derivatives of $\theta$ on the left and a function of $\theta$ on the right. Replace $\sin\theta$ with $\theta$ (the linearization trick) and you get $\ddot{\theta} = -(g/L)\,\theta$, which the derivative tool can recognize and solve.
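
The linearization trick is itself a derivative statement: near 0, sin θ is replaced by its tangent line, whose slope is cos 0 = 1. A quick sketch of how far that replacement can be trusted; the sample angles and the helper name are chosen here for illustration.

```python
import math

# Relative error of replacing sin(theta) by theta at a given amplitude.
def small_angle_error(degrees):
    theta = math.radians(degrees)
    return (theta - math.sin(theta)) / math.sin(theta)

errors = {deg: small_angle_error(deg) for deg in (1, 5, 10, 20, 45)}
for deg, err in errors.items():
    print(f"{deg:>2} deg: relative error {err:.4%}")

# The error grows roughly like theta**2 / 6: negligible at a few degrees,
# above ten percent by 45 degrees.
```

This is why the linearized pendulum is a good clock model for small swings and a poor one for large swings.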

Terminal velocity: the differential equation $dv/dt = g - k v$ has one derivative on the left and the net force per unit mass on the right. The fixed point (the terminal speed) is found by setting the derivative to zero: $g - k v_t = 0$, so $v_t = g/k$. Force balance is exactly “the derivative vanishes here.”
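
To watch the fixed point emerge rather than just solve for it, the equation can be stepped forward in time. A minimal sketch using forward Euler steps; the values of g and k, the step size, and the function name are assumptions for illustration.

```python
def simulate_velocity(g=9.81, k=0.5, dt=0.01, steps=5000):
    # dv/dt = g - k*v, stepped with forward Euler starting from rest.
    v = 0.0
    for _ in range(steps):
        dvdt = g - k * v      # the derivative at the current state
        v += dvdt * dt        # follow the tangent for a short time dt
    return v

v_end = simulate_velocity()   # velocity after 50 simulated seconds
v_terminal = 9.81 / 0.5       # where g - k*v = 0
```

After enough steps `v_end` sits on `v_terminal`: the derivative has shrunk to zero, so each further step changes nothing.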

Damped oscillator: the equation $\ddot{x} + 2\gamma\dot{x} + \omega_0^2 x = F(t)$ carries two derivatives on the left, a velocity term and an acceleration term. Damping is exactly the term that multiplies the first derivative; resonance is the place where forcing matches the second-derivative geometry. The whole page is the derivative ladder, twice.
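
A sketch of how the first-derivative coefficient shapes the response, using the steady-state amplitude formula listed in this page's glossary under resonance. It assumes sinusoidal forcing F cos(ωt) with unit strength and ω₀ = 1; the grid search and the two damping values are illustrative.

```python
import math

def amplitude(omega, omega0=1.0, gamma=0.05, F=1.0):
    # Steady-state amplitude for x'' + 2*gamma*x' + omega0**2 * x = F*cos(omega*t).
    return F / math.sqrt((omega0**2 - omega**2) ** 2 + (2 * gamma * omega) ** 2)

def peak(gamma):
    # Scan driving frequencies from 0.001 to 2.0 and keep the largest response.
    grid = [i / 1000 for i in range(1, 2001)]
    best = max(grid, key=lambda w: amplitude(w, gamma=gamma))
    return best, amplitude(best, gamma=gamma)

light = peak(0.05)   # tall, sharp peak very close to omega0
heavy = peak(0.30)   # lower, broader peak shifted below omega0
```

Comparing `light` and `heavy` shows the glossary's rule of thumb: lighter damping gives a sharper, taller peak.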

Gradient descent: the loss $L(w)$ is a multivariable function; its gradient $\nabla L$ is the vector of partial derivatives. The descent step $w \leftarrow w - \eta \cdot \nabla L$ uses one derivative per parameter, all at once. To first order in $\eta$, each step lowers the loss by $\eta \|\nabla L\|^2$, a derivative-derived quantity that determines whether learning is making progress.
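
A minimal sketch of the descent step on a toy loss, assuming a two-parameter quadratic L(w) = w₁² + 3w₂² chosen only for illustration, with its gradient written out by hand. It also checks that, for a small step, the drop in loss is close to η‖∇L‖².

```python
def loss(w):
    return w[0] ** 2 + 3 * w[1] ** 2

def grad(w):
    # Partial derivatives of the toy loss, one per parameter.
    return [2 * w[0], 6 * w[1]]

def step(w, eta):
    return [wi - eta * gi for wi, gi in zip(w, grad(w))]

w = [1.0, -2.0]
eta = 1e-3

grad_norm_sq = sum(gi * gi for gi in grad(w))
predicted_drop = eta * grad_norm_sq          # first-order estimate
actual_drop = loss(w) - loss(step(w, eta))   # what the step really did

# Repeating the step walks w toward the minimum at (0, 0).
for _ in range(2000):
    w = step(w, 0.05)
final_loss = loss(w)
```

The prediction and the actual drop agree to within about one percent here; the gap grows with larger η because the estimate ignores curvature.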

Present value: the same operation, in finance vocabulary. The price of a bond is $PV(r) = \int_0^T c(t) \cdot e^{-rt}\,dt$, and the modified duration $D = -(\partial PV / \partial r) / PV$ is a derivative, normalized: it predicts how much the bond price moves when rates shift. Option Greeks are the same family: $\Delta = \partial \text{Price} / \partial S$, $\Gamma$ the second derivative, and so on. A financial derivative takes its name from deriving its value from an underlying asset, not from the calculus operation, but its sensitivities are mathematical derivatives in the strict sense.
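
A small sketch of duration as a normalized derivative, assuming the simplest possible bond: a single payment of 1 at time T with continuous discounting, so PV(r) = e^(−rT). That special case is an assumption made here to keep the answer checkable by hand; for it the duration is exactly T.

```python
import math

def present_value(r, T=5.0):
    # One cash flow of 1 at time T, continuously discounted (assumed toy bond).
    return math.exp(-r * T)

def duration(pv, r, h=1e-6):
    # -(dPV/dr) / PV, with the derivative taken as a central secant slope.
    dpv_dr = (pv(r + h) - pv(r - h)) / (2 * h)
    return -dpv_dr / pv(r)

r = 0.03
D = duration(present_value, r)               # close to T = 5

# Use: the relative price change for a small rate shift dr is about -D * dr.
dr = 0.001
predicted = -D * dr
actual = present_value(r + dr) / present_value(r) - 1
```

The small gap between `predicted` and `actual` is the second-order (convexity) effect that a single derivative leaves out.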

Same idea, different nouns: velocity in physics, gradient in ML, sensitivity in finance. In all three the derivative is the local rule for change.

# Position → velocity → acceleration. Same machine, applied twice.
# Projectile: y(t) = v₀ sin θ · t − ½ g t²    (from /physics/projectile-motion)
#   dy/dt = v₀ sin θ − g t                    (vertical velocity)
#   d²y/dt² = −g                              (vertical acceleration — constant)
#
# Pendulum (small-angle): θ(t) = θ₀ cos(ω t),  ω = √(g/L)
#   dθ/dt  = −θ₀ ω sin(ω t)
#   d²θ/dt² = −θ₀ ω² cos(ω t) = −ω² · θ(t)   ← simple harmonic motion
#
# Each application's equation of motion is one or two derivatives applied
# to the position function. The derivative is the shared tool.
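
The lines above are commentary rather than executable code. A runnable sketch of the same two checks, assuming illustrative values for v₀, θ, g, L and θ₀ and using central differences in place of symbolic derivatives; the helper names `d1` and `d2` are hypothetical.

```python
import math

def d1(f, t, h=1e-3):
    # First derivative as a central secant slope.
    return (f(t + h) - f(t - h)) / (2 * h)

def d2(f, t, h=1e-3):
    # Second derivative as a central second difference.
    return (f(t + h) - 2 * f(t) + f(t - h)) / (h * h)

g, v0, angle = 9.81, 20.0, math.radians(40)      # assumed values
y = lambda t: v0 * math.sin(angle) * t - 0.5 * g * t * t

L, theta0 = 1.0, 0.1                              # assumed pendulum values
omega = math.sqrt(g / L)
theta = lambda t: theta0 * math.cos(omega * t)

t = 0.7
vy_numeric = d1(y, t)
vy_formula = v0 * math.sin(angle) - g * t         # dy/dt from the comments
ay_numeric = d2(y, t)                             # should sit near -g
shm_residual = d2(theta, t) + omega**2 * theta(t) # should sit near 0
```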

Δ knows the interval. d knows the instant. The derivative is what Δ becomes when the interval shrinks to nothing — and the limit that survives is everything physics calls a rate.

exercises · solve by hand
1 · average rate by hand · no calculator

For $f(t) = t^2$, compute the average rate of change over the interval $[1, 3]$. Then over $[1, 1.1]$. Then over $[1, 1.001]$. What pattern do you see?

2 · instantaneous rate at t = 3 · no calculator

For $f(t) = t^2$, derive $f'(3)$ from the secant-slope definition. Show every step of the algebra.

3 · tangent slope from the curve

The widget shows $y = x^2$. Without using the formula, just by sliding a and shrinking h, read off the tangent slope at $x = 0.5$ and at $x = 1.5$. State a rule that fits both numbers.

4 · differentiate the projectile · no calculator

The vertical position of a projectile is $y(t) = v_0 \sin\theta \cdot t - \tfrac{1}{2} g t^2$. Use the same $(f(a+h) - f(a)) / h$ recipe to derive $v_y(t) = dy/dt$. Confirm the answer matches the projectile widget’s v_y readout.

glossary · used on this page · 10
secant·할선
A straight line passing through two points on a curve. Its slope is the _average_ rate of change of the function over the interval between those two points: `(f(b) − f(a)) / (b − a)`. The secant is the quantity you can compute without taking any limit — two points, one division. The tangent is what the secant becomes when the two points are forced to merge.
derivative·미분
The _instantaneous_ rate of change of a function at a point — defined as the limit of secant slopes as the interval between the two sample points shrinks to zero: `f'(a) = lim_{h→0} (f(a+h) − f(a)) / h`. The derivative of position is velocity; of velocity, acceleration; of mass-with-respect-to-time, mass flow. Geometrically, the slope of the tangent line. Algebraically, the operation that turns `x²` into `2x` and `sin x` into `cos x`. Almost every quantity called a _rate_ anywhere in physics, ML, and engineering is some derivative.
tangent line·접선
A straight line that touches a curve at a single point and matches the curve's direction there. Its slope at the point of contact equals the derivative of the function at that point: `m_tangent = f'(a)`. The tangent is what the secant becomes in the limit as its two intersection points merge — the curve's _instantaneous direction_ made visible as a line. Distinct from the trigonometric tangent; same word, different concept.
velocity·속도
The rate of change of position. A vector — both a magnitude (speed) and a direction. In one dimension, `v = dx/dt`; in two, the velocity vector has components `(dx/dt, dy/dt)`. Velocity that does not change in time is called _uniform_; the position then increases linearly. Velocity that changes corresponds to nonzero acceleration. The distinction between speed (a number) and velocity (a vector) is the difference between "fast" and "fast in which direction."
acceleration·가속도
The rate of change of velocity. Also a vector. In one dimension, `a = dv/dt = d²x/dt²`; constant acceleration produces velocity that is linear in time and position that is quadratic. Newton's second law `F = ma` says force and acceleration are proportional, with mass as the conversion factor. On Earth's surface, near the ground, the acceleration of any object in free fall is approximately constant — a single number that does not depend on the object's mass.
simple harmonic motion·단순조화운동
The motion governed by `ẍ = −ω²x` for some constant `ω > 0`. Its solutions are sinusoidal: `x(t) = A cos(ωt + φ)`, with constant period `T = 2π/ω` independent of amplitude `A`. A spring obeying Hooke's law produces it exactly. A pendulum produces it _only_ in the small-angle limit, after `sin θ` has been replaced by `θ`. Most "oscillator" intuitions in physics, engineering, and signal processing live inside this single equation.
fixed point·고정점
A value that stays put when fed through a function or rule. For an iteration `x ← f(x)`, a fixed point `x*` satisfies `f(x*) = x*` — apply the rule and nothing changes. For a differential equation `dx/dt = g(x)`, a fixed point is where `g(x*) = 0` and the system stops evolving. _Equilibrium_ in physics, _minimum_ in optimization, and _steady state_ in numerical methods all name the same object: the place where the next step lands you back on yourself. Two flavours matter — _stable_ (small disturbances die out, system returns) and _unstable_ (small disturbances grow, system escapes); the difference is the sign of the local derivative `g'(x*)`.
damping·감쇠
Any mechanism that drains energy from an oscillator and brings it eventually to rest. The simplest model is _velocity-proportional_ — a force `−c · ẋ` opposing motion — which makes the equation `m ẍ + c ẋ + k x = 0` linear and easy to solve. Three regimes by how `c` compares to `√(4mk)`: _underdamped_ (oscillates while shrinking), _critically damped_ (returns to rest fastest with no oscillation), _overdamped_ (returns slowly without oscillation). Damping is more general than _drag_ — drag is one specific source (fluid resistance), while damping covers friction, viscous flow, electrical resistance, structural hysteresis, anything that takes the oscillator's energy and never gives it back.
resonance·공명
What happens when an external driving force pushes an oscillator at (or near) its _natural frequency_. Each push lands when the oscillator is already moving in the same direction, so energy goes in coherently and amplitude grows. Off-resonance, pushes alternately add and subtract energy, and the response stays small. The steady-state amplitude as a function of driving frequency `ω` is `A(ω) = F / √((ω₀² − ω²)² + (2γω)²)`, which peaks near `ω = ω₀` for light damping. _Lighter damping → sharper, taller peak._ The same identity governs an opera singer breaking glass, MRI nuclei flipping, and the radio dial picking out one station from a sky full. (The Tacoma Narrows bridge collapse is often listed here too, but it is now generally attributed to aeroelastic flutter rather than simple forced resonance.) Resonance is _not_ a special force; it is energy injected on the beat the system already keeps.
gradient·그래디언트
The slope of a function in many directions at once. For a function f(x, y, z, ...), the gradient is the vector of partial derivatives — it points in the direction of steepest ascent. In machine learning, this vector tells the optimizer which way to step the parameters to reduce loss.
⚠ In ML: the vector of partial derivatives of the loss with respect to every parameter. Optimization moves opposite the gradient ("gradient descent"). If a value used in the gradient becomes 0 from underflow, the entire chain collapses — that's why log-space matters.