youngalfred@lemm.ee 1 year ago
That’s pretty much what a tokenizer does for Large Language Models like ChatGPT. You can see how it works here: platform.openai.com/tokenizer
Type in the word ‘Antidisestablishmentarianism’ and you can see it becomes 5 tokens instead of 28 characters.
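If you want to poke at the idea without the web page, here is a rough toy sketch. To be clear about the assumptions: the vocabulary below is made up by hand for this one word, and the greedy longest-match lookup is a simplified stand-in, not the byte-pair-encoding merges the real OpenAI tokenizer uses. So the pieces and the count it gives will not match what the site shows; it only illustrates how a long word collapses into a handful of subword tokens.

```python
# Hypothetical hand-picked vocabulary, NOT the real OpenAI token list.
TOY_VOCAB = {"anti", "dis", "establish", "ment", "arian", "ism"}


def tokenize(word, vocab):
    """Greedy longest-match split: at each position take the longest
    vocabulary piece, falling back to a single character if none fits."""
    word = word.lower()
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest candidate first and shrink until one matches.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            # Nothing in the vocabulary starts here; emit one character.
            tokens.append(word[i])
            i += 1
    return tokens


word = "Antidisestablishmentarianism"
tokens = tokenize(word, TOY_VOCAB)
print(len(word), "characters ->", len(tokens), "tokens:", tokens)
```

With this made-up vocabulary the word splits into anti / dis / establish / ment / arian / ism. A real tokenizer learns its pieces from a huge amount of text, which is why its split looks different.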