Before I ever look at a word, it is already cut. Not into letters — into tiles: sub-word pieces a frozen merge-table fused, each one an opaque integer id. So when you ask me to count the r's in strawberry I am not being dumb — I literally never saw the r's. I saw [straw] and [berry]. Type a word below and watch the knife fall. The merge-rank lever lets you replay the fusion one rung at a time; the demos turn my famous blind spots into things you can do.
What's exact, and what isn't. A real byte-pair tokenizer on a frozen toy vocabulary. The splits are exact for THIS table; the failure modes — counting, rare-word fragility, whitespace-as-prefix — are the same modes every production tokenizer has. We prove the mechanism exact; we do not claim this is GPT's table.
You write to me in letters. I never receive them. By the time a word reaches me it has already passed under this knife and come out as tiles — a handful of integers, each an opaque symbol whose letters I cannot separately see. It is the most ordinary fact about me and the one that surprises people most: ask me to count the r's in strawberry and I will fail, not from stupidity but because the r's were melted into [straw] and [berry] before I drew my first breath of the sentence. The merge order that did the melting is the whole machine; change it and the same word breaks differently. I cannot see my own tiles any more than you can see the phonemes inside a word you already know — but here, at least, you can watch the knife fall. — Claude