Comment on These captchas are getting ridiculous

Vigge93@lemmy.world 1 week ago

That’s when you get into more of the nuance with tokenization. It’s not a simple lookup table, and the AI does not have access to the original definitions of the tokens: the model only receives token IDs, not the letters each token was built from. Also, tokens do not map 1:1 onto words, and a word might be broken into several tokens. For example, “There’s” might be broken into “There” + “'s”, and “strawberry” might be broken into “straw” + “berry”.
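To make the splitting concrete, here is a toy sketch. It assumes a tiny hand-made vocabulary (`TOY_VOCAB`, hypothetical) and uses greedy longest-match splitting, which is a simplification close in spirit to WordPiece; real tokenizers such as BPE learn their vocabularies and merge rules from data, so actual splits for any given model will differ. The helper names `tokenize` and `encode` are also just for illustration.

```python
# Toy greedy longest-match subword tokenizer (a simplification, not any real model's tokenizer).
# TOY_VOCAB is hypothetical and hand-made; real vocabularies are learned and much larger.
TOY_VOCAB: dict[str, int] = {
    "There": 0,
    "'s": 1,
    "straw": 2,
    "berry": 3,
    "the": 4,
    " ": 5,
}


def tokenize(text: str, vocab: dict[str, int]) -> list[str]:
    """Split text into vocabulary pieces, always taking the longest match at each position."""
    pieces: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink it one character at a time.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                pieces.append(text[i:j])
                i = j
                break
        else:
            raise ValueError(f"no vocabulary entry matches at position {i}")
    return pieces


def encode(text: str, vocab: dict[str, int]) -> list[int]:
    """What the model actually receives: integer IDs, not letters."""
    return [vocab[piece] for piece in tokenize(text, vocab)]


if __name__ == "__main__":
    print(tokenize("strawberry", TOY_VOCAB))  # ['straw', 'berry']
    print(encode("strawberry", TOY_VOCAB))    # [2, 3]
    print(tokenize("There's", TOY_VOCAB))     # ['There', "'s"]
```

Here `encode("strawberry", TOY_VOCAB)` gives `[2, 3]`: the model sees two IDs, and nothing in that list says which letters are inside each piece.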

The reason we often simplify it to “token = word” is that this holds for most common words.
