JPEG throws information away. Why does the picture still look like a picture?
The previous page closed at the entropy bound: the hard floor below which no lossless coder can shrink a stream. This page is about what JPEG buys by giving up on losslessness.
Lossy compression isn’t compression — it’s prioritization.
Cut the image into 8×8 blocks
JPEG starts by chopping the image into independent 8×8 pixel tiles. Why blocks? Two reasons. Locality: real pictures aren’t statistically uniform across the whole frame — a face has different structure from sky. Working in small windows lets the coder adapt without modeling global structure. Tractability: an 8×8 block is 64 numbers, small enough that direct algebra (an 8×8 transform) is fast and exact. The trade-off is blockiness — at low quality, the 8×8 tile boundaries become visible because each block was rounded independently. The widget’s “checkerboard” preset is one block; everything below scales to the rest of the picture by repetition.
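In code, the tiling step is nothing more than slicing. A minimal sketch, assuming a grayscale image whose sides are multiples of 8 (real JPEG pads partial edge blocks; to_blocks is an illustrative helper, not part of any codec):

import numpy as np

def to_blocks(img):
    # Slice an H×W image into independent 8×8 tiles, row-major.
    H, W = img.shape
    return [img[i:i+8, j:j+8]
            for i in range(0, H, 8)
            for j in range(0, W, 8)]

len(to_blocks(np.zeros((16, 24))))  # → 6 tiles, each coded on its own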
See the block in a different basis
A 64-entry pixel block is a vector in 64-dimensional space. The standard basis for that space is the pixel basis: each basis vector lights up exactly one pixel, and a block's coordinates in it are its 64 raw brightness values.
Crucially: switching basis is lossless. The information in the block doesn’t change; only the labels do. If the new basis is orthonormal, switching is just a matrix multiplication, and switching back is multiplying by the transpose. So the question is: is there a basis that’s better than pixels for what JPEG wants to do?
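That losslessness is easy to check numerically. A minimal sketch, using a random orthonormal basis rather than the DCT, since the property holds for any orthonormal basis:

import numpy as np

rng = np.random.default_rng(0)
B, _ = np.linalg.qr(rng.normal(size=(64, 64)))   # random orthonormal basis

x = rng.integers(0, 256, size=64).astype(float)  # a flattened pixel block
y = B @ x                   # new coordinates: different numbers, same vector
np.allclose(x, B.T @ y)     # → True: the transpose undoes the change
np.isclose(x @ x, y @ y)    # → True: energy (squared length) is preserved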
DCT — a basis where natural images are sparse
JPEG's chosen basis is the discrete cosine transform (DCT): 64 basis vectors, each a 2D cosine pattern with a fixed horizontal and vertical frequency, running from the constant (DC) pattern in one corner to the fastest oscillation in the opposite corner.
The empirical claim that makes JPEG work: natural images are sparse in the DCT basis. Most of the energy in any given 8×8 block of a typical photograph concentrates in maybe 5–10 of the 64 DCT coefficients, almost always the low-frequency ones. The widget makes this readable. Toggle to gradient: nearly all energy sits at the DC corner and a couple of its immediate low-frequency neighbors. Toggle to flat: literally one coefficient (the DC term) carries everything. Toggle to checkerboard: an artificial extreme — most of the energy lands in a single high-frequency cell, but the sparsity is still there.
import numpy as np
# 8x8 DCT-II in matrix form. The cosine matrix M is the same one JPEG uses;
# applying it twice (rows then columns) gives the 2D DCT.
N = 8
def dct_matrix(N=8):
    M = np.zeros((N, N))
    for k in range(N):
        for n in range(N):
            M[k, n] = np.cos((2*n + 1) * k * np.pi / (2*N))
    M[0, :] *= 1 / np.sqrt(N)     # DC row normalization
    M[1:, :] *= np.sqrt(2 / N)    # AC rows: makes M orthonormal
    return M

M = dct_matrix(N)

def dct2d(block):
    return M @ block @ M.T   # rows, then columns

def idct2d(coef):
    return M.T @ coef @ M    # inverse: just transpose
# DCT itself is lossless. Round-trip an 8x8 block and the error is zero
# (up to floating point).
block = np.random.default_rng(0).integers(0, 256, size=(8, 8)).astype(float)
coef = dct2d(block)
back = idct2d(coef)
np.allclose(block, back)  # → True: the transform alone loses nothing

Quantization — drop what doesn't matter
The compression happens here. After the DCT, each coefficient is divided by an integer from a quantization table and rounded to the nearest whole number. The quantization table is hand-tuned (and standardized) to divide more aggressively in high-frequency cells than in low-frequency ones, because the human visual system is less sensitive to high-frequency error. Small high-frequency coefficients round straight to zero; the kept coefficients lose precision but survive.
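Real JPEG's version of this step is only a few lines. A sketch, using the example luminance table from Annex K of the JPEG standard; quality scaling of the table, and the level shift of pixels by −128 before the DCT, are omitted here:

# Example luminance quantization table from the JPEG standard (Annex K).
# Divisors grow toward the bottom-right: high frequencies are hit hardest.
Q = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
])

def quantize(coef):
    return np.round(coef / Q).astype(int)   # divide and round: the lossy step

def dequantize(q):
    return q * Q                            # the decoder multiplies back

grad = np.add.outer(*[np.linspace(0, 255, 8)]*2) / 2   # the gradient preset
quantize(dct2d(grad))   # a handful of non-zeros in the top-left; the rest is 0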
The widget uses a simplified version: keep the top k coefficients by magnitude, zero the rest. Real JPEG quantization is per-coefficient with a fixed table per quality setting, but the qualitative effect is identical. Drop the slider to 4 on the texture preset and you'll see the reconstruction is still recognizable — most of what you saw was carried by those four numbers. Drop to 1 and you get a flat block at the average brightness; the reconstruction has zero error on the flat preset because flat was one number's worth of information.
This is also where lossy earns its name: rounding is irreversible. Once a coefficient has been rounded to zero, no decoder can bring it back.
# Keep top K coefficients by magnitude; zero the rest. JPEG's quantization
# step is more elaborate (a per-coefficient divisor table), but the
# qualitative effect — kill small / high-frequency entries — is the same.
def keep_top_k(coef, k):
    flat = coef.flatten()
    if k >= flat.size:
        return coef.copy()
    threshold = np.sort(np.abs(flat))[-k]
    out = coef.copy()
    out[np.abs(out) < threshold] = 0
    return out

def reconstruct(coef, k):
    return idct2d(keep_top_k(coef, k))
# Compare three block types: how many of 64 coefficients does each one need?
def kept_to_target_error(block, target_mae=2.0):
    coef = dct2d(block)
    for k in range(1, 65):
        err = np.mean(np.abs(reconstruct(coef, k) - block))
        if err <= target_mae:
            return k, err
    return 64, np.mean(np.abs(reconstruct(coef, 64) - block))
[(name, *kept_to_target_error(b()))
 for name, b in (("flat", lambda: np.full((8, 8), 128.0)),
                 ("gradient", lambda: np.add.outer(*[np.linspace(0, 255, 8)]*2) / 2),
                 ("checker", lambda: 130 + 100*((np.indices((8, 8)).sum(0) % 2)*2 - 1)))]
# → roughly [('flat', 1, 0.0),        DC alone reconstructs it perfectly
#            ('gradient', ~4, ~1.5),  a handful of low-frequency entries
#            ('checker', 17, 0.0)]    DC plus all 16 odd-odd cells
# The checker result is the instructive one: most of its energy lands in the
# single highest-frequency cell (7,7), but (-1)^n is not exactly a DCT basis
# vector, so an exact round-trip needs the whole tail of odd-frequency cells.
# (Exact k values can shift by one because keep_top_k keeps ties.)
# Same data, very different sparsity in the DCT basis.

Reconstruct — inverse DCT brings the picture back
To decode, JPEG runs the inverse DCT on the (now mostly zero) coefficient grid. The inverse is the same matrix machinery as the forward DCT, just with the cosine matrix transposed. The output is no longer the original block — it’s a projection of the original onto the subspace spanned by the kept basis vectors. That projection is the closest approximation to the original under the L² metric, given that you’re only allowed to use the kept directions.
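Because the basis is orthonormal, the projection claim is checkable with the helpers above: the reconstruction's squared error in pixel space equals exactly the energy of the coefficients that were dropped (Parseval), which is the least any reconstruction confined to the kept directions can achieve.

coef = dct2d(block)                      # `block` and the helpers from above
kept = keep_top_k(coef, 8)
err_pixels = np.sum((block - idct2d(kept))**2)   # error energy, pixel side
err_coefs = np.sum((coef - kept)**2)             # energy of dropped coefficients
np.isclose(err_pixels, err_coefs)                # → True: Parseval's theorem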
Two failure modes show up here. Blocking: each 8×8 tile was rounded independently, so adjacent tiles can disagree along their shared edge. Ringing: dropping high-frequency coefficients near a sharp edge causes oscillations because the remaining basis vectors can’t reproduce a step function. Both are visible at low quality settings. They’re the price of the trade.
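Ringing is easy to provoke with the same helpers. A sketch: a block holding a hard vertical edge, reconstructed from only its three largest coefficients:

step = np.zeros((8, 8))
step[:, 4:] = 255.0                    # hard 0 → 255 edge down the middle
np.round(reconstruct(dct2d(step), k=3)[0])
# → each row overshoots and ripples around 0 and 255 instead of jumping
#   cleanly: the kept cosines cannot build a step, and the residue is ringing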
Entropy coding — the final wrap
After quantization, each block is a stream of integers, mostly zero, with the kept values arranged in a zigzag scan order (low-frequency first). That stream goes into Huffman or arithmetic coding — the entropy module's Shannon-bound business — and that's where the file actually shrinks on disk. JPEG's contribution isn't the entropy coder; that's standard machinery. JPEG's contribution is producing a stream the entropy coder can pack tightly. Long runs of zeros compress to almost nothing; small integers carry few bits each.
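The scan itself is short enough to sketch. The (run, value) pairing below shows the flavor of JPEG's run-length symbols, not the exact bit-level format; zigzag and run_lengths are illustrative helpers, and grad is the gradient block from the quantization sketch above:

def zigzag(coef):
    # Walk anti-diagonals, alternating direction, so low frequencies come
    # first and the zeros bunch up at the end of the scan.
    idx = sorted(((i, j) for i in range(8) for j in range(8)),
                 key=lambda ij: (ij[0] + ij[1],
                                 ij[0] if (ij[0] + ij[1]) % 2 else -ij[0]))
    return [int(round(coef[i, j])) for i, j in idx]

def run_lengths(stream):
    pairs, run = [], 0
    for v in stream:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append((run, "EOB"))   # end-of-block stands in for the zero tail
    return pairs

run_lengths(zigzag(keep_top_k(dct2d(grad), 5)))
# → a few (run, value) pairs up front, then one EOB covering the long zero run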
So the file size has three multiplicative savings: fewer non-zero coefficients (most rounded to zero), smaller magnitudes for the kept ones, and runs of zeros that entropy-code beautifully. The page in three bullets:
- Change basis (DCT) so the picture’s information concentrates in a few coordinates.
- Quantize (round) the small coordinates to zero — that’s the lossy step.
- Entropy-code (Huffman) the resulting sparse integer stream — that’s where the bytes are saved.
# Why the file actually shrinks: after quantization, the coefficient
# stream has lots of zeros and small ints; entropy coding (Huffman or
# arithmetic) packs that stream tightly. Same entropy module that bounds
# tf-idf and the lossless image-compression page — JPEG just feeds it a
# stream that's already been pre-sparsified by DCT + quantization.
from collections import Counter
from math import log2
def entropy(symbols):
    counts = Counter(symbols)
    N = len(symbols)
    return sum(-(c / N) * log2(c / N) for c in counts.values() if c > 0)
# Pretend a small image strip. Compare the entropy of the raw pixel stream
# to the entropy of the kept-DCT-coefficient stream after rounding.
rng = np.random.default_rng(1)
img = rng.integers(50, 200, size=(8, 32))  # 8 high × 32 wide = 4 blocks of 8x8
# This is illustrative; real JPEG quantizes per coefficient (zigzag table).
raw_h = entropy(img.flatten().tolist())
print(f"raw pixel H ≈ {raw_h:.2f} bits/symbol")
# After DCT + top-8-of-64 + integer rounding, most symbols are zero.
coef_stream = []
for bj in range(4):
    block = img[:, bj*8:(bj+1)*8].astype(float)
    kept = keep_top_k(dct2d(block), k=8)
    coef_stream.extend(np.round(kept).astype(int).flatten().tolist())
sparse_h = entropy(coef_stream)
print(f"kept-8 DCT stream H ≈ {sparse_h:.2f} bits/symbol")
# Typical run: raw ~7 bits/symbol, sparse-DCT ~1-2 bits/symbol.
# Same entropy bound, very different alphabet — the gap is what JPEG
# saves in file size on top of what discarding coefficients already saved.

Two images can have the same DCT-coefficient histogram (same multiset of values) and very different perceptual quality after quantization, because perceptual quality depends on which cell a coefficient sits in — high-frequency error is hidden, low-frequency error is glaring. Histogram entropy can't tell them apart; the human eye can. JPEG's quantization table encodes this asymmetry: smaller divisors for low-frequency cells, larger for high-frequency. The entropy of the rounded stream tells you the file size; the quantization table tells you the perceptual quality. They are different objectives, both stacked on the same DCT basis.
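That asymmetry is concrete enough to sketch with the pieces above: put the same 60-unit error in a low-frequency cell and in a high-frequency cell. The value histograms match, so entropy cannot tell the blocks apart; dividing by the Annex K table Q from the quantization sketch (a crude stand-in for perceptual weighting) can:

a = np.zeros((8, 8)); a[0, 1] = 60.0   # 60 units of error, low frequency
b = np.zeros((8, 8)); b[7, 7] = 60.0   # same magnitude, high frequency
entropy(a.flatten().tolist()) == entropy(b.flatten().tolist())   # → True
np.isclose(np.sum(idct2d(a)**2), np.sum(idct2d(b)**2))           # → True
60 / Q[0, 1], 60 / Q[7, 7]   # table-weighted severity: ≈5.5 vs ≈0.6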
Lossy compression isn’t compression — it’s prioritization. JPEG changes basis (DCT) so the picture becomes sparse, throws away the coordinates that don’t matter (quantization), and Huffman-packs the rest (entropy coding). Three steps, one savings: fewer coefficients, smaller values, longer zero runs. The math doesn’t beat entropy — it picks a different alphabet.
In the widget, pick the flat preset and slide down to 1. The reconstruction is still exactly the original. Why does keeping just one coefficient suffice? Which coefficient is it, and what does it carry?
In the widget, pick checkerboard and look at the DCT panel. Most of the energy concentrates at one cell. Where? Why is it that cell, and what does it tell you about how JPEG would handle a real-image patch full of fine texture?
In the widget, pick texture and slide k from 64 down to 1. The reconstruction degrades as k falls. Which coefficients are dropped first, and why does that match what JPEG actually does?
Two compressed images of the same scene end up with byte streams that have identical entropy. One looks fine; the other has visible blocky artifacts. How is that possible? In one sentence, distinguish what entropy bounds and what it doesn’t.
Image-processing courses introduce DCT and quantization as JPEG-specific machinery. Information theory courses introduce entropy and entropy coding as Shannon-specific. The bridge between them — change of basis is what makes the entropy bound survivable on real signals — gets left implicit. Lemma puts the three steps (basis change, quantization, entropy coding) in one arc so the reader can see what each one does and what it doesn't. The hard part of lossy compression isn't the entropy coder (that's standard) and isn't the transform (that's reversible). The hard part is the quantization table — the hand-tuned weights that decide which information humans don't notice. That table is where psychophysics meets information theory, and it's also where every codec since 1992 has competed.