Comment on "Google AI is great."
TheRealKuni@piefed.social 3 days ago
LLMs are really bad with letters, and in my limited understanding that's because they don't see words as strings of letters; they see them as tokens. It's all numbers by the time the LLM is processing it.
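(For anyone curious, here's a rough sketch of what that means in practice, using OpenAI's tiktoken library. The exact token splits and IDs depend on which tokenizer a given model uses, so treat the output as illustrative rather than exact.)

```python
# Rough sketch of what a tokenizer does, using the tiktoken library
# (pip install tiktoken). Exact splits and IDs vary by tokenizer.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

word = "raspberry"
token_ids = enc.encode(word)                    # a short list of integers
pieces = [enc.decode([t]) for t in token_ids]   # the text chunk each ID covers

print(token_ids)  # the model only ever "sees" these numbers
print(pieces)     # multi-letter chunks, not individual letters
```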
exasperation@lemmy.dbzer0.com 2 days ago
We think in terms of tokens, too, but we have the ability to look under the hood at some of how our knowledge is constructed.
For the typical literate English speaker, we seamlessly pronounce certain letter combinations as something different from their component parts (ch, sh, ph), or look ahead to see whether a syllable ends in an E before deciding how to pronounce the vowel in the middle. Then entire words or phrases carry a single meaning that doesn't get broken apart. Similarly, people who are fluent in multiple languages, including languages that share a script (e.g., Latin letters), can look at a whole string of text, quickly figure out which language they're reading, and consult that part of their knowledge base.
And usually our brains process things completely separately from how we read or write text. Even the question of how many r's are in "raspberry" requires us to go and count, because the answer isn't inherent in the knowledge we have at the tip of the tongue. Someone can memorize a speech but not know how many times the word "the" appears in it, even though their knowledge contains all the information needed to answer the question.
Even when we are actively thinking about how words are constructed, as in crosswords, these things tend to be more fun when mixed with other modes of thinking: Wordle's mix of logic and spelling, a classic crossword's clever style of hints, etc.
Manipulation of letters is simply one mode of thinking. We're really good at seamlessly switching between modes.