Comment on Could you compress text files by mapping a word to how commonly it is used and translating it with an application?

youngalfred@lemm.ee 1 year ago

That’s pretty much what a tokenizer does for Large Language Models like ChatGPT. You can see how it works here: platform.openai.com/tokenizer

Type in the word ‘Antidisestablishmentarianism’ and you can see it becomes 5 tokens instead of 28 characters.
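The idea in the original question can be sketched in a few lines: assign each distinct word a small integer ID (giving the most frequent words the smallest IDs, so they cost the least to store), then transmit the codebook plus the ID sequence. This is a simplified word-level illustration, not how the linked tokenizer is actually implemented — real tokenizers like OpenAI's learn frequent *subword* pieces instead of whole words.

```python
from collections import Counter

def build_codebook(words):
    # More frequent words get smaller IDs, so common words are cheapest to encode.
    ranked = [w for w, _ in Counter(words).most_common()]
    return {w: i for i, w in enumerate(ranked)}

def encode(text):
    words = text.split()
    book = build_codebook(words)
    return book, [book[w] for w in words]

def decode(book, ids):
    inverse = {i: w for w, i in book.items()}
    return " ".join(inverse[i] for i in ids)

book, ids = encode("the cat sat on the mat")
# "the" appears twice, so it gets ID 0 and is reused in the encoding.
print(ids)                 # → [0, 1, 2, 3, 0, 4]
print(decode(book, ids))   # → "the cat sat on the mat"
```

Note the caveat the question runs into: for short texts the codebook itself can outweigh the savings, which is why practical schemes either ship a shared vocabulary (as tokenizers do) or build the dictionary adaptively (as LZ-family compressors do).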
