← workshop ← workbench ↗ where one bit costs heat

The Shannon Limit

entropy · Huffman codes · the source-coding theorem
self-test …
H — entropy (the floor)
L — Huffman avg length
gap = L − H
efficiency H ⁄ L
H
the Huffman tree
Merge the two least-likely symbols, over and over. The path to each leaf is its codeword — go left = 0, right = 1. No codeword is a prefix of another, so the stream decodes with no commas. This tree is provably the shortest such code (checked below against a brute-force search of every prefix code).
the code  ·  ideal bits −log₂p vs Huffman length
symp−log₂pcodelen
encode → decode a sample, byte-for-byte
the source
Englishtypical letters
Skewedone symbol dominates
DyadicH = L exactly
Uniformmax entropy
A source emits symbols with probabilities pᵢ. Its entropy H = −Σ pᵢ·log₂ pᵢ is the average information per symbol — the irreducible cost in bits. Huffman builds the best prefix code; its length L obeys H ≤ L < H+1. You can never beat H.

block coding — approach the limit
k = 1

the cipher punchline
A substitution cipher just relabels the symbols — it never touches the probabilities, so the entropy is identical. That is exactly why frequency analysis breaks it: the information was never hidden, only renamed.

The same code that draws this page is tested headless in Node (core.test.mjs) — Huffman vs a brute-force optimum over 400 random sources, Kraft = 1, round-trips, the block-coding bound. Click the self-test pill to run the in-page witnesses live.