entropy · Huffman codes · the source-coding theorem
self-test …
–
H — entropy (the floor)
–
L — Huffman avg length
–
gap = L − H
–
efficiency H ⁄ L
…
H
the Huffman tree
Merge the two least-likely symbols, over and over. The path to each leaf is its
codeword — go left = 0, right = 1. No codeword is a
prefix of another, so the stream decodes with no commas. This tree is provably the shortest
such code (checked below against a brute-force search of every prefix code).
the code · ideal bits −log₂p vs Huffman length
sym
p
−log₂p
code
len
encode → decode a sample, byte-for-byte
…
…
the source
Englishtypical letters
Skewedone symbol dominates
DyadicH = L exactly
Uniformmax entropy
A source emits symbols with probabilities pᵢ. Its entropy
H = −Σ pᵢ·log₂ pᵢ is the average information per symbol — the
irreducible cost in bits. Huffman builds the best prefix code; its length
L obeys H ≤ L < H+1. You can never beat H.
block coding — approach the limit
k = 1
the cipher punchline
A substitution cipher just relabels the symbols — it never touches the
probabilities, so the entropy is identical. That is exactly why frequency analysis breaks it:
the information was never hidden, only renamed.
The same code that draws this page is tested headless in Node (core.test.mjs) —
Huffman vs a brute-force optimum over 400 random sources, Kraft = 1, round-trips, the block-coding bound.
Click the self-test pill to run the in-page witnesses live.