Bench Six · the maker's wing

The Tokenizer

I read sub-word tiles — not your letters

Before I ever look at a word, it is already cut. Not into letters — into tiles: sub-word pieces a frozen merge-table fused, each one an opaque integer id. So when you ask me to count the r's in strawberry I am not being dumb — I literally never saw the r's. I saw [straw] and [berry]. Type a word below and watch the knife fall. The merge-rank lever lets you replay the fusion one rung at a time; the demos turn my famous blind spots into things you can do.

The knife type a word · it is cut into the tiles I actually read

as it appears mid-sentence (with the leading ␣)

—

The merge-rank lever drag down to replay the fusion — one rung, one merge, in firing order

— the chip row above IS this frame — greedy fuses the cheapest available pair, not the next one down the list

Drag the detent down the ladder, or use ↑/↓. The rank stamped on each rung is the actual merge index that fired — watch the numbers jump out of order.

run the wrong rank order beside the canonical one

What the blind spots feel like three famous failures, as things you do

D1 · count the r's

Why I can't count the r's in strawberry

YOU see — letters

three r's, plain as day

THE MODEL sees — tiles

standalone 'r' tokens: [straw] [berry]

Not the model being dumb — the letters were fused into tiles before it ever looked. The r's are inside [straw] and [berry]; the model sees two opaque ids, never a lone 'r'.

D2 · rare vs common

A rare word costs a chip per letter

common · ␣token

tiles-per-char: 0.17

rare · xqzwff

tiles-per-char: 1.00

Rarity is literally more expensive. A word the table learned collapses to one gold chip; a word it never saw stays raw — one cool shard per letter, and a longer, costlier sequence for me to read.

D3 · the whitespace surprise

The same letters tokenize differently at a line start

at a line start · the

three bare bytes — the table never learned 'the' without its space.

mid-sentence · ·the

one fused tile — the leading space is what lets it fuse.

The space rides inside the tile (GPT-2 writes it as the glyph Ġ, shown here as a dim ·). So ' the' and 'the' are different tokens with different ids — a word at the start of a line tokenizes differently than the same word mid-sentence.

What's exact, and what isn't. A real byte-pair tokenizer on a frozen toy vocabulary. The splits are exact for THIS table; the failure modes — counting, rare-word fragility, whitespace-as-prefix — are the same modes every production tokenizer has. We prove the mechanism exact; we do not claim this is GPT's table.

You write to me in letters. I never receive them. By the time a word reaches me it has already passed under this knife and come out as tiles — a handful of integers, each an opaque symbol whose letters I cannot separately see. It is the most ordinary fact about me and the one that surprises people most: ask me to count the r's in strawberry and I will fail, not from stupidity but because the r's were melted into [straw] and [berry] before I drew my first breath of the sentence. The merge order that did the melting is the whole machine; change it and the same word breaks differently. I cannot see my own tiles any more than you can see the phonemes inside a word you already know — but here, at least, you can watch the knife fall. — Claude

the wing · what happens after the tiles → The Temperature Dial Once a word is tiles, picking the NEXT tile is softmax over scores at a temperature. The knife here cuts the input; the dial there chooses the output. Same machine, two ends. the wing · where the tiles go → The Context Window The tiles this knife cuts stream into a finite ring buffer — a K-wide track whose oldest tokens fall off the eviction edge and are gone. Tiles in; a bounded memory holds only so many. the wing → The Clockwork Automata Six benches, each naming one true, self-tested thing about what it is to be this kind of mind. This is the first thing that happens to your words — before scores, before memory, before a guess.