What is a curve doing right now?
A moving point leaves a trail. The derivative is not the trail. It is the arrow the trail wants to become at this instant. Average speed knows two points and divides; instantaneous speed knows only one point, but somehow has a number anyway. That number — what every rate in physics, ML, and engineering ultimately is — comes from forcing the secant line through two points to collapse onto the tangent line at one.
In the widget below: the orange secant through two points on . Drag h toward 0 and the secant rotates onto the brown tangent. The slope it converges to is the derivative at that point — the curve’s instantaneous direction, made into a number.
- what
The instantaneous rate of change. . The arrow the trail wants to be at this instant — secant slopes squeezed onto a single tangent slope as the second point collapses into the first.
- applies when
Wherever a rate shows up. Velocity (position → speed), gradient descent (loss → step direction), small-angle approximation (sin θ ≈ θ near 0), local linearization in physics, ML, control. One machine, four names: slope, velocity, rate, gradient.
- breaks when
The limit must exist. Sharp corners (|x| at 0), jumps (step functions), and pathological wiggles (Weierstrass nowhere-differentiable function) have no derivative. In ML, hidden non-differentiabilities — ReLU at 0, max-pooling, indicator functions — are patched by convention, not by mathematics; the gradient you compute through them is a choice, not a fact.
Δ over Δ — average rate
You drove from at to at . Your average speed is . That number describes the trip as a whole — but it does not tell you what you were doing at . The line through and on the position graph is a
# Average rate of change — the secant slope. Two points, one division.
def average_rate(f, a, b):
return (f(b) - f(a)) / (b - a)
f = lambda t: t * t # x(t) = t² — toy "position"
average_rate(f, 1, 3) # → 4.0 (position went 1 → 9 in 2 seconds)Shrink the interval — instantaneous rate
“What was I doing at ?” requires the secant’s two points to merge into one. Pick a fixed anchor and a small interval ; compute as shrinks. For :
(f(2+h) − f(2)) / h = ((2+h)² − 4) / h = (4 + 4h + h² − 4) / h = 4 + h ← independent of how big h is, except for the +h tail
As , the expression converges to . Not “approaches” in some hand-wavy sense — the value is , and can be made as small as you want. The number that survives in the limit, 4, is the
# Instantaneous rate — shrink the interval and watch the secant slope
# converge to the tangent slope. No epsilon-delta; just shrink and look.
def secant_slope(f, a, h):
return (f(a + h) - f(a)) / h
[secant_slope(f, 3, h) for h in (1, 0.1, 0.01, 0.0001)]
# → [7.0, 6.1, 6.01, 6.0001]
# The pattern: 6 + h. The limit as h → 0 is 6.
# That's f'(3) for f(t) = t². In general, f'(t) = 2t.One machine, three names — slope, velocity, rate
The recipe — pick two points, compute the secant slope, shrink the interval, take the limit — gives the same kind of number whatever you plug in. If is a position-vs-time function, the derivative is
The standard pattern: derivative of is . The widget shows it for : drag the anchor and read off . The same machinery, applied to , gives ; applied to , gives back; applied to a constant, gives 0. The names of these — “differentiation rules” — are bookkeeping. The single underlying operation is the limit of secant slopes.
Differentiate twice — acceleration
The derivative of a function is itself a function. You can differentiate it again. For : (a line); (a constant). Two derivatives of position give
That tower — position, velocity, acceleration — is the entire content of “Newton’s second law” once you have the derivative as a tool: force is mass times the second derivative of position. Most introductory physics is the algebraic and geometric consequences of this single fact.
Where this shows up — same operation, two pillars
A derivative is not just a slope. It is local change: how one quantity responds when another is nudged. The same operation shows up under different names in different pillars; the algebra stays the same.
physics : position changes into velocity; velocity into acceleration;
forces decide those changes.
ml : loss changes when a parameter moves; the gradient tells
which way the loss falls.
finance : a price changes when a rate moves; the derivative measures
the sensitivity — duration for bonds, the Greeks for options.Projectile motion — the equations and have derivatives (constant) and (linear). Differentiating the position gives the velocity directly; one more derivative gives the constant acceleration.
The pendulum clock — the equation of motion is two derivatives of θ on the left, equals a function of θ on the right. Replace with (the linearization trick) and you get —
Terminal velocity — the differential equation is one derivative on the left, the net force per unit mass on the right. The
Damped oscillator — the equation runs two derivatives on the left: a velocity term and an acceleration term.
Gradient descent — the loss is a multivariable function; its
Present value — the same operation, in finance vocabulary. The price of a bond is , and the modified duration is a derivative, normalized — it predicts how much the bond price moves when rates shift. Option Greeks are the same family: , the second derivative, and so on. Financial derivative and mathematical derivative are the same word for a reason.
Same idea, different nouns: velocity in physics, gradient in ML, sensitivity in finance. In all three the derivative is the local rule for change.
# Position → velocity → acceleration. Same machine, applied twice.
# Projectile: y(t) = v₀ sin θ · t − ½ g t² (from /physics/projectile-motion)
# dy/dt = v₀ sin θ − g t (vertical velocity)
# d²y/dt² = −g (vertical acceleration — constant)
#
# Pendulum (small-angle): θ(t) = θ₀ cos(ω t), ω = √(g/L)
# dθ/dt = −θ₀ ω sin(ω t)
# d²θ/dt² = −θ₀ ω² cos(ω t) = −ω² · θ(t) ← simple harmonic motion
#
# Each application's equation of motion is one or two derivatives applied
# to the position function. The derivative is the shared tool.Δ knows the interval. d knows the instant. The derivative is what Δ becomes when the interval shrinks to nothing — and the limit that survives is everything physics calls a rate.
For , compute the average rate of change over the interval . Then over . Then over . What pattern do you see?
For , derive from the secant-slope definition. Show every step of the algebra.
The widget shows . Without using the formula, just by sliding a and shrinking h, read off the tangent slope at and at . State a rule that fits both numbers.
The vertical position of a projectile is . Use the same recipe to derive . Confirm the answer matches the projectile widget’s v_y readout.