Lemma
math, backwards
the hook · two risks, one safer mix

Why does combining two risky assets sometimes make them safer?

Two stocks, each with the same average return and the same volatility. Hold one — risky. Hold the other — equally risky. Hold half of each — and somehow the mix can be less risky than either alone. Sometimes a lot less. The mechanism isn’t a fluke or a fee structure; it’s a single cross-term in a quadratic, and that quadratic is the same one vectors writes for the squared length of a sum.

Risk doesn’t add — it composes through correlation. Risk starts with a of possible returns, not a single return. Finding the minimum-variance weights is optimization with a closed-form answer for two assets and an iterative one in higher dimensions.

Widget — Portfolio mixer
ρ0.00
weight w (A)50.0%
σ_p (current)0.707
σ_p (min @ w*)0.707
joint returns · A vs B
A →B ↑
portfolio variance σ²_p as a function of w
0.250.500.751.000.00.51.0w (fraction in A)
w* = 0.50
Two assets, both with mean return 5% and standard deviation σ = 1. The mean of any mix is the same 5% — but the variance isn't. The dashed line is the no-mix reference (σ² = 1); the green dot is the minimum-variance weight w*; the orange dot is the mix you've chosen. At ρ = +1 the curve is a flat line at 1 — mixing buys nothing. At ρ = 0 the curve sags to 0.5 at w = 0.5. At ρ = −1 it touches 0 — risk disappears entirely. Same expected return, wildly different risk; the cross-term 2 w (1−w) ρ σ_A σ_B is the lever.
the arc
1

Risk as variance — what 'risky' actually measures

In finance, risk is shorthand for how much the outcome jitters around its average, not how much you might lose worst-case. The standard tool is : Var(r)=E[(rμ)2]Var(r) = E[(r − μ)²] — the expected squared deviation. Its square root, the standard deviation σσ, has the same units as the return and is what most people mean by “volatility.” Doubling σ is “twice as risky” in this sense; halving it is “half as risky.” It’s an imperfect proxy (a chart that crashes once doesn’t have the same character as one that wobbles every day, even at matched σ), but it’s the coordinate that quadratic math lives in. Modern Portfolio Theory, capital asset pricing, Sharpe ratios — all sit on top of variance.

2

Two assets, four numbers

Pick two assets, A and B. Each has a mean return and a variance: μA,σA2,μB,σB2μ_A, σ_A², μ_B, σ_B². Mix them in proportion ww in A and 1w1−w in B. The mean of the mix is the easy part — averaging:

μp=wμA+(1w)μBμ_p = w μ_A + (1−w) μ_B

If you only cared about means, mixing assets with the same μ is pointless: you get the same μ. But the variance of the mix has three terms, not two:

σp2=w2σA2+(1w)2σB2+2w(1w)Cov(A,B)σ_p² = w² σ_A² + (1−w)² σ_B² + 2 w (1−w) Cov(A, B)

The first two are what you’d guess: each asset’s own variance, scaled by its share squared. The third — the — is the surprise. It’s the cross-term, and it’s where mixing earns or loses you something. When A and B move together (positive covariance), mixing piles risk on. When they move opposite (negative covariance), mixing cancels risk.

import numpy as np

# Two assets: same mean return 5%, same standard deviation σ = 1.
# We'll compute portfolio mean and variance for any (weight, correlation).
mu_A, mu_B = 0.05, 0.05
sig_A, sig_B = 1.0, 1.0

def portfolio(w, rho):
    """w: fraction in A; (1-w) in B. rho: correlation of returns."""
    mu  = w * mu_A + (1 - w) * mu_B
    var = (w**2 * sig_A**2
         + (1 - w)**2 * sig_B**2
         + 2 * w * (1 - w) * rho * sig_A * sig_B)
    return mu, var

# Same expected return regardless of weight (because mu_A == mu_B).
# Variance is what moves.
[(rho, *portfolio(w=0.5, rho=rho)) for rho in (1.0, 0.0, -1.0)]
# → [(1.0,  0.05, 1.00),    perfect lockstep — no diversification benefit
#    (0.0,  0.05, 0.50),    independent — variance halves at 50/50
#    (-1.0, 0.05, 0.00)]    perfect anti — risk literally vanishes
3

Correlation — covariance, normalized

Covariance has awkward units (the product of A’s units and B’s units), so people quote the unit-free version: ρ=Cov(A,B)/(σAσB)ρ = Cov(A, B) / (σ_A σ_B), which lives in [1,1][−1, 1]. Now the variance formula reads:

σp2=w2σA2+(1w)2σB2+2w(1w)ρσAσBσ_p² = w² σ_A² + (1−w)² σ_B² + 2 w (1−w) ρ σ_A σ_B

Three regimes. ρ=+1ρ = +1 — perfect lockstep — and σpσ_p is just the weighted average of σAσ_A and σBσ_B. Mixing buys nothing. ρ=0ρ = 0 — independent — and the cross-term vanishes; σp2σ_p² falls strictly below the weighted average of the variances. ρ=1ρ = −1 — perfect anti-lockstep — and the cross-term is negative, large; with σA=σBσ_A = σ_B at w=1/2w = 1/2 the variance hits exactly zero. Risk doesn’t just shrink; it disappears. The widget above is a one-line dial through these three regimes.

# Minimum-variance weight, closed form.
# Take d/dw of σ²_p(w) and set to zero:
#   w* = (σ_B² − ρ σ_A σ_B) / (σ_A² + σ_B² − 2 ρ σ_A σ_B)
def min_var_weight(rho, sig_A=1.0, sig_B=1.0):
    num = sig_B**2 - rho * sig_A * sig_B
    den = sig_A**2 + sig_B**2 - 2 * rho * sig_A * sig_B
    return num / den if abs(den) > 1e-12 else 0.5

[(rho, min_var_weight(rho)) for rho in (-1, -0.5, 0, 0.5, 0.99)]
# → [(-1,    0.50),    A and B equally weighted — perfect cancellation
#    (-0.5, 0.50),    still 50/50 because σ_A = σ_B
#    (0,    0.50),
#    (0.5,  0.50),
#    (0.99, 0.50)]
# With σ_A = σ_B the answer is always 0.5; differing σ's would tilt w*.
4

It's the same algebra as |a + b|² in vectors

The portfolio variance formula isn’t a finance invention — it’s the formula for the squared length of a vector sum, dressed in finance words. Treat returns as random vectors: each asset is one component of a 2D vector, and the portfolio is a weighted combination. The squared length identity from the vectors module:

a+b2=a2+b2+2ab|a + b|² = |a|² + |b|² + 2 a · b

Replace a2|a|² with w2σA2w² σ_A², b2|b|² with (1w)2σB2(1−w)² σ_B², and the dot product aba · b with w(1w)Cov(A,B)w (1−w) Cov(A, B) — and you’ve recovered portfolio variance. Covariance is just a dot product on random variables. The cosine of the angle between them is correlation. “Two assets at right angles” means uncorrelated; “two assets pointing the same way” means correlated; “two assets pointing opposite” means anti-correlated, and that pair cancels.

This is why the vectors module is consumed by physics, graphics, ML, and finance: the same two operations (add and scale) that move a Bezier control handle also build a portfolio. The applications don’t share theorems — they share the algebra.

5

The minimum-variance mix

Set d/dw[σp2]=0d/dw [σ_p²] = 0: a one-line calculus problem. The answer is

w=(σB2ρσAσB)/(σA2+σB22ρσAσB)w* = (σ_B² − ρ σ_A σ_B) / (σ_A² + σ_B² − 2 ρ σ_A σ_B)

When σA=σBσ_A = σ_B, this reduces to w=1/2w* = 1/2 for any ρρ: the safest mix of two equally-volatile assets is always half-and-half. When σAσBσ_A ≠ σ_B, the formula tilts: the lower-volatility asset gets more weight, but the tilt depends on ρρ too. The widget marks ww* as the green dot on the variance curve; drag ρρ and watch the curve flatten or sharpen around it. The full Modern Portfolio Theory generalization — N assets, an N×N covariance matrix, a quadratic to minimize subject to constraints — is just this same calculation written in vector-matrix form. One pair, three terms, one cross-term. Everything else is bookkeeping.

# Sanity check via Monte Carlo: simulate joint returns at given ρ,
# form the 50/50 portfolio, measure realized variance, compare to formula.
import numpy as np

def simulate(rho, n=200_000, seed=0):
    rng = np.random.default_rng(seed)
    z1 = rng.standard_normal(n)
    z2 = rng.standard_normal(n)
    a = mu_A + sig_A * z1
    b = mu_B + sig_B * (rho * z1 + np.sqrt(1 - rho**2) * z2)
    p = 0.5 * a + 0.5 * b
    return p.mean(), p.var()

formula = portfolio(w=0.5, rho=-0.5)
sim     = simulate(rho=-0.5)
formula, sim
# → ((0.05, 0.25), (0.0501, 0.2503))
# Closed form and simulation agree to three decimals.
# The whole MPT machinery from Markowitz onward sits on top of this single
# scalar identity; the rest is generalizing to N assets and adding
# constraints (no shorts, sector caps, etc.).
now break it

ρ = 0 doesn’t mean independent — it means linearly uncorrelated. A and B can be tightly bound by a non-linear relationship (e.g., B = A²) and still report ρ ≈ 0 over a symmetric range. The portfolio variance formula uses ρ directly, so it underestimates risk in this case; pre-2008 quant strategies that ranked pairs by historical ρ alone learned this expensively. Variance is a second-moment quantity; everything else lives in the higher moments and the copula. The math here is exact under the assumptions; the assumptions are where the danger hides.

Risk doesn’t add — it composes through correlation. Portfolio variance is w2σA2+(1w)2σB2+2w(1w)ρσAσBw² σ_A² + (1−w)² σ_B² + 2 w (1−w) ρ σ_A σ_B — three terms, one cross-term. The cross-term is what mixing buys you, and it can be negative. That’s why two risky things can be safer together than either alone.

exercises · 손으로 풀기
1compute by hand · the 50/50 meanno calculator

Asset A has mean return μA=8%μ_A = 8\%; asset B has μB=12%μ_B = 12\%. You hold the 50/50 mix. What is the portfolio’s mean return? Now you tilt to 25% A, 75% B — what changes?

2ρ = 1 · why mixing doesn't helpno calculator

A and B both have σ=1σ = 1, and ρ=1ρ = 1 (perfect lockstep). Compute σpσ_p at w=0.5w = 0.5. Compare to w=0w = 0 (all B) and w=1w = 1 (all A).

3ρ = −1 · the magic cancellation

Same σ = 1 each, now ρ=1ρ = −1. Show that there exists a ww at which σp=0σ_p = 0 exactly. Find the value of ww. Why can’t real portfolios achieve this?

4the evil one · maximize return only

Asset A: μA=12%μ_A = 12\%, σA=0.20σ_A = 0.20. Asset B: μB=8%μ_B = 8\%, σB=0.10σ_B = 0.10, ρ=0.0ρ = 0.0. A “common sense” investor says: A has the higher expected return and the same horizon — go all-in on A. Show why a mix with some B can give a better risk-adjusted return (lower σ per unit μ above the riskless rate). Compute σ_p for w = 0.5 and compare to σ_A.

why this isn't taught this way

Finance courses introduce portfolio variance as a new identity, with a covariance matrix that arrives without provenance. Linear-algebra courses introduce dot products and squared lengths without mentioning that a dot product is what makes a portfolio more than the sum of its parts. The two are the same piece of math. Lemma puts the formula a+b2=a2+b2+2ab|a + b|² = |a|² + |b|² + 2 a · b next to σp2=w2σA2+(1w)2σB2+2w(1w)Cov(A,B)σ_p² = w² σ_A² + (1−w)² σ_B² + 2 w (1−w) Cov(A, B) so the sameness is visible. Modern Portfolio Theory, the Sharpe ratio, the efficient frontier — all are this identity with more dimensions and constraints stacked on. None require new math, only patience with the bookkeeping.

glossary · used on this page · 4
distribution·분포
A _shape of uncertainty_: how probability is allocated across possible outcomes. A discrete distribution assigns a number to each outcome — `P(X = "cat") = 0.6, P(X = "dog") = 0.3, P(X = "bird") = 0.1`. A continuous distribution assigns _density_ across an interval — there is no probability at any single point, only over a range. The numbers must sum (discrete) or integrate (continuous) to 1, because _something_ must happen. A single probability is one number; a distribution is the whole shape behind it. Most quantities a model predicts, an asset can return, or a pixel can take, are not single numbers but distributions — and the _spread_ of those distributions is often what matters more than the center.
variance·분산
The expected square-distance of a random quantity from its mean. For returns `r` with mean `μ`, `Var(r) = E[(r − μ)²]`; the units are the _square_ of whatever you started with, which is why people usually quote the square root, _standard deviation_, instead. Variance is the workhorse measure of _risk_ in finance: not "how much could I lose worst-case?" but "how much does the outcome jitter from the average?". Two assets with the same mean return can have wildly different variances, and that gap is the entire reason a portfolio is more than the sum of its parts.
covariance·공분산
A two-variable cousin of variance. For two random quantities `X` and `Y`, `Cov(X, Y) = E[(X − μ_X)(Y − μ_Y)]`: the expected product of their joint deviations. Positive when they tend to move together (both above-average together, both below-average together), negative when they move apart, zero when their joint motion averages out. Variance is just `Cov(X, X)` — a special case. Covariance is what couples two assets in a portfolio: the cross-terms in the variance formula are `2 w_A w_B Cov(A, B)`, and that's the door diversification walks through.
correlation·상관계수
Covariance, normalized to live on `[−1, 1]`. `ρ(X, Y) = Cov(X, Y) / (σ_X · σ_Y)`. The advantage over raw covariance: it's _unit-free_ and _scale-free_, which makes "more correlated" mean the same thing whether you're comparing daily returns or yearly returns. `ρ = 1`: the two variables move in lockstep (same direction, fixed ratio). `ρ = -1`: perfect anti-lockstep. `ρ = 0`: linearly uncorrelated, though _not_ necessarily independent — correlation only sees the linear part of joint motion. The whole point of diversification is to find `ρ < 1` pairs.