Why do some images compress and others don’t?
Two photos, same dimensions, same number of pixels — same data in the trivial sense. Save them as PNG: one is 18 KB, the other is 312 KB. What makes the second one resist compression? The answer is one quantity from the entropy module, applied twice — once to the histogram, once to the structure the histogram throws away.
A file’s size is bounded by its information, not its pixel count. A color histogram is a count of how many pixels sit at each value — it records what appears, not where.
All three patterns share this histogram — each level appears 16 times.
Same data, different information
A 16×16 grayscale image is 256 pixels — 256 bytes if you store one byte per pixel. That’s the data. But how many bits of information does the image actually contain? That’s a different question, and the answer can be much less than 8 bits/pixel. If every pixel is the same shade, you need almost no bits to describe the picture: “all of them, value 7.” If every pixel is independent random noise, you need close to 8 bits each, because there’s no pattern to exploit. Real photos sit between, and the gap between data and information is exactly what a compressor harvests.
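The two extremes are easy to check directly. A self-contained sketch (the names here are ours, not the widget’s):

import numpy as np
from collections import Counter
from math import log2

def bits_per_pixel(img):
    flat = img.flatten()
    n = len(flat)
    return sum(-(c / n) * log2(c / n) for c in Counter(flat).values())

flat_gray = np.full((16, 16), 7)                             # "all of them, value 7"
noise = np.random.default_rng(0).integers(0, 256, (16, 16))  # independent 8-bit noise

bits_per_pixel(flat_gray), bits_per_pixel(noise)
# → (0.0, ≈7.2) — the flat image costs nothing per pixel; the noise sits a
# little under the 8-bit ceiling only because 256 samples can't exhibit all
# 256 levels.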
The histogram bound
The first place to look is the pixel value distribution — the histogram (we’ll put it in code shortly). Count how many pixels are at each level, divide by the total to get probabilities, and plug into $H = -\sum_i p_i \log_2 p_i$. The number you get is the average number of bits per pixel needed by any code that treats pixels as independent draws.
For our 16-level patch where each level appears exactly 16 times, every level has probability $1/16$, so $H = \log_2 16 = 4$ bits/pixel. That’s the histogram bound — the Shannon limit for symbol-by-symbol encoding. Halving 8 bits/pixel to 4 already explains why a 16-shade image compresses to half the size of a 256-shade one.
import numpy as np
from collections import Counter
from math import log2

# A 16x16 patch with each of 16 grayscale levels appearing exactly 16 times.
# We'll build three layouts that share this histogram and compare entropies.

def gradient():
    img = np.zeros((16, 16), dtype=np.int32)
    for i in range(16):
        img[i, :] = i  # row i is constant level i
    return img

def blocks():
    img = np.zeros((16, 16), dtype=np.int32)
    for i in range(16):
        for j in range(16):
            img[i, j] = (i // 4) * 4 + (j // 4)  # 4x4 tiles, levels 0..15
    return img

def scrambled(seed=0):
    rng = np.random.default_rng(seed)
    flat = gradient().flatten()
    rng.shuffle(flat)
    return flat.reshape(16, 16)

# Histogram entropy — bits/pixel if you encoded each pixel independently.
def histogram_entropy(img):
    flat = img.flatten()
    counts = Counter(flat)
    N = len(flat)
    return sum(-(c / N) * log2(c / N) for c in counts.values() if c > 0)

[histogram_entropy(p()) for p in (gradient, blocks, scrambled)]
# → [4.0, 4.0, 4.0] — identical: same histogram, same H.

What the histogram doesn't see
Toggle between gradient, blocks, and scrambled in the widget. The pixels are exactly the same multiset — same 16 copies of each level. Histogram entropy: 4.0 bits/pixel for all three. Identical.
But the pictures don’t look the same, and they don’t compress the same. The gradient is smooth: every adjacent pair is either equal or differs by 1. The scrambled patch has neighbors that fight each other — adjacent levels are essentially independent. Same histogram, very different spatial structure, and that structure is where most of the real compression happens.
# Neighbor-difference entropy — what a real compressor sees.
# For each pair of adjacent pixels (right + down), count abs(diff). The
# resulting distribution carries the spatial structure the histogram missed.
def neighbor_diff_entropy(img):
    diffs = []
    H, W = img.shape
    for i in range(H):
        for j in range(W):
            if j + 1 < W: diffs.append(abs(int(img[i, j]) - int(img[i, j+1])))
            if i + 1 < H: diffs.append(abs(int(img[i, j]) - int(img[i+1, j])))
    counts = Counter(diffs)
    N = len(diffs)
    return sum(-(c / N) * log2(c / N) for c in counts.values() if c > 0)

[neighbor_diff_entropy(p()) for p in (gradient, blocks, scrambled)]
# → [1.0, ≈0.92, ≈3.7] — gradient: every difference is 0 or 1, exactly 1 bit;
# blocks: mostly zeros with jumps at tile borders; scrambled: differences
# spread over the whole 0..15 range.
# Histograms saw three identical sources; neighbor differences see three
# very different ones. That gap is the entire point of spatial compression.

Neighbor differences — entropy of a different alphabet
The trick: instead of encoding each pixel as a value 0..15, encode each pixel as the difference from its left or top neighbor. The first pixel still costs full bits, but every subsequent one is usually a small number — often zero. Now run $H$ on the distribution of differences. For the gradient, a per-pixel delta stream is almost all zeros, and its entropy lands well under 1 bit/pixel — the sketch below makes that concrete. For scrambled, differences spread across the full range and the entropy is back around 4. The widget shows both numbers side by side; the ratio is the speedup a structure-aware compressor delivers over a histogram-only one.
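Here is a minimal sketch of that encoder — not the widget’s code, just an illustration of the scheme (raster order, left-neighbor prediction, top neighbor in the first column, top-left pixel sent raw), reusing the helpers defined above:

# Minimal delta encoder: one residual per pixel.
def delta_stream(img):
    h, w = img.shape
    residuals = []
    for i in range(h):
        for j in range(w):
            if i == 0 and j == 0:
                pred = 0                    # top-left pixel sent raw
            elif j == 0:
                pred = int(img[i - 1, j])   # first column: predict from above
            else:
                pred = int(img[i, j - 1])   # otherwise: predict from the left
            residuals.append(int(img[i, j]) - pred)
    return residuals

def stream_entropy(residuals):
    counts = Counter(residuals)
    n = len(residuals)
    return sum(-(c / n) * log2(c / n) for c in counts.values())

[round(stream_entropy(delta_stream(p())), 2) for p in (gradient, scrambled)]
# Gradient: 241 zeros and 15 ones → ≈ 0.32 bits/pixel. Scrambled: signed
# residuals span -15..15, a wider alphabet than the pixels themselves —
# differencing buys nothing without structure.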
This is the page in one sentence: entropy is a property of the description you choose, not of the pixels alone — same histogram, same $H$, three different pictures, three different file sizes.
Lossless vs lossy — two ways out of the histogram bound
Histogram entropy is a hard floor for any encoder that treats pixels as independent symbols; no lossless code that looks at one pixel at a time can beat it. There are two ways out. One — keep exactness but change the alphabet: predict each pixel from its neighbors and encode the residuals, as PNG’s filters do. Two — give up exactness: discard detail the eye won’t miss before encoding anything, as JPEG does, so there is genuinely less information left to store.
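A toy version of the second route — nothing like JPEG’s transform machinery, just coarse quantization applied to the functions above — shows how discarding precision lowers the histogram bound itself:

# Toy lossy step (illustration only; JPEG quantizes frequency coefficients,
# not raw pixel values). Drop 16 levels to 4 by rounding.
def quantize(img, keep_levels=4):
    step = 16 // keep_levels
    return (img // step) * step   # exactness is gone — values are rounded

histogram_entropy(scrambled()), histogram_entropy(quantize(scrambled()))
# → (4.0, 2.0). Even the scrambled patch now fits in 2 bits/pixel: less
# information survives, so there is less to encode.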
# Real-world check: encode all three with PNG and compare file sizes.
# PNG uses filter prediction (essentially neighbor differences) followed by
# DEFLATE (LZ77 + Huffman). Same pixel multiset, very different output.
import io
from PIL import Image

def png_size(img):
    buf = io.BytesIO()
    Image.fromarray((img * 16).astype('uint8'), mode='L').save(buf, 'PNG')
    return len(buf.getvalue())

sizes = {p.__name__: png_size(p()) for p in (gradient, blocks, scrambled)}
# → {'gradient': ~120, 'blocks': ~140, 'scrambled': ~310}
#
# The histograms predicted 4 bits × 256 pixels = 1024 bits = 128 bytes of
# *symbol payload*; PNG's framing overhead is fixed. The interesting number
# is the ratio of payloads: gradient ≈ 1×, scrambled ≈ 2-3× larger. Same
# histogram, different file size — the signature of spatial compression.

Histogram entropy is a lower bound for symbol-independent encoding, not an upper bound on what a real codec achieves. Subtract PNG’s fixed framing and the gradient patch’s payload comes in far under the histogram bound of 4 bits/pixel × 256 pixels = 128 bytes — because PNG also uses LZ77, which finds the literal repeated runs that neighbor-difference entropy doesn’t quite capture. The histogram says 4 bits; the delta stream says ~0.3; PNG with LZ77 on top says less still. Each step down corresponds to a richer model of what the picture actually is. The same image is more or less compressible depending on the question you ask about it.
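The same ladder is visible without any image framing. zlib is the DEFLATE engine PNG uses internally, so compressing the raw pixel bytes isolates the LZ77 + Huffman step (a sketch; exact sizes vary a little by zlib version):

import zlib

def deflate_size(img):
    return len(zlib.compress(img.astype('uint8').tobytes(), 9))

{p.__name__: deflate_size(p()) for p in (gradient, blocks, scrambled)}
# Gradient: LZ77 finds the sixteen repeated rows and crushes the payload to
# a few dozen bytes. Scrambled: there is no structure to find, so DEFLATE
# falls back to Huffman-coding the symbols and lands near the histogram
# bound — about 4 bits/pixel, ≈ 128 bytes.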
A file’s size is bounded by its information, not its pixel count. Histograms see what values appear; neighbor differences see structure; transform codecs see frequency content. Each view exposes a smaller information budget than the last — and that’s exactly the budget the encoder has to spend.
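To make the last rung concrete — a sketch of what “transform codecs see frequency content” means, assuming SciPy is available (the DCT and the 99% threshold are illustrative choices, not any particular codec):

from scipy.fft import dctn  # assumption: SciPy is installed

def coeffs_for_99pct_energy(img):
    energy = dctn(img.astype(float), norm='ortho') ** 2
    ranked = np.sort(energy.flatten())[::-1]
    cumulative = np.cumsum(ranked) / ranked.sum()
    return int(np.searchsorted(cumulative, 0.99)) + 1

coeffs_for_99pct_energy(gradient()), coeffs_for_99pct_energy(scrambled())
# The gradient packs 99% of its energy into a handful of low-frequency
# coefficients; the scrambled patch spreads it over most of the 256.
# A transform codec keeps the big coefficients and rounds the rest away.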
A 4-color image has 100 pixels: 50 red, 25 green, 15 blue, 10 yellow. Compute the histogram entropy in bits per pixel (if your calculator only has natural log, recall $\log_2 x = \ln x / \ln 2$; for this one just use the rough values $\log_2(1/0.5) = 1$, $\log_2(1/0.25) = 2$, $\log_2(1/0.15) \approx 2.74$, $\log_2(1/0.10) \approx 3.32$).
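After you’ve done it by hand, a two-liner to check your arithmetic (plain Python):

from math import log2
probs = [0.50, 0.25, 0.15, 0.10]   # red, green, blue, yellow
sum(-p * log2(p) for p in probs)
# → ≈ 1.74 bits/pixel — under the 2 bits a fixed-length 4-color code spends.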
You have two 100×100 images. Both contain the exact same 10,000 pixel values: 5,000 black and 5,000 white. Image A is a clean checkerboard pattern; image B is the same pixels randomly scattered. Their histogram entropies are identical (1 bit/pixel, since p(black) = p(white) = 0.5). Which compresses smaller as a PNG, and why? Predict the rough ratio.
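To check your prediction with the png_size helper from above (the construction here is one hypothetical way to build the two images; levels 0 and 15 because png_size scales by 16):

rng = np.random.default_rng(1)

# Image A: a clean checkerboard. Image B: the same multiset, scattered.
checker = (np.indices((100, 100)).sum(axis=0) % 2) * 15
scatter = checker.flatten()
rng.shuffle(scatter)
scatter = scatter.reshape(100, 100)

png_size(checker), png_size(scatter)
# Checkerboard: filters + LZ77 reduce it to a tiny file — the whole image
# is two alternating row patterns. Scattered: no structure, so PNG can't
# beat the histogram bound of 1 bit/pixel ≈ 1,250 bytes of payload.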
In the widget, switch between gradient and scrambled. Both report 4.0 bits/pixel. So a naive engineer concludes: “Both images need 256 × 4 = 1024 bits = 128 bytes of payload, plus PNG overhead.” Run the actual PNG sizes (the code in arc 4 measures it). Why does the scrambled file end up roughly twice as large — despite the histogram saying they should be the same?
A JPEG of a photograph is often smaller than the histogram bound would allow for any lossless encoding of that photograph. How is that possible? In one sentence, distinguish what entropy bounds from what it doesn’t.
Information theory courses define entropy on i.i.d. sources and stop there — pixels, audio samples, and language tokens get treated as independent draws. Real signals never are. The gap between symbol entropy and contextual entropy (what real codecs exploit) is the whole reason a 4 MB photograph fits in 400 KB without visible loss; it’s also why “compression” is a ladder, not a single number. Lemma keeps the histogram and the neighbor-difference views in the same widget so the gap is visible from the start: same histogram, same H, three different pictures, three different file sizes. The entropy module names the bound; this page shows what the bound isn’t bounding.