Comment

UnderpantsWeevil@lemmy.world ⁨1⁩ ⁨year⁩ ago

LLM wasn’t made for this

There’s a thought experiment that challenges the concept of cognition, called The Chinese Room. What it essentially postulates is a conversation between two people, one of whom is speaking Chinese and getting responses in Chinese. And the first speaker wonders “Does my conversation partner really understand what I’m saying or am I just getting elaborate stock answers from a big library of pre-defined replies?”

The LLM is literally a Chinese Room. And one way we can know this is through these interactions. The machine isn’t analyzing the fundamental meaning of what I’m saying, it is simply mapping the words I’ve input onto a big catalog of responses and giving me a standard output. In this case, the problem the machine is running into is a legacy meme about people miscounting the number of "r"s in the word Strawberry. So “2” is the stock response it knows via the meme reference, even though a much simpler and dumber machine that was designed to handle this basic input question could have come up with the answer faster and more accurately.

When you hear people complain about how the LLM “wasn’t made for this”, what they’re really complaining about is their own shitty methodology. They build a glorified card catalog. A device that can only take inputs, feed them through a massive library of responses, and sift out the highest probability answer without actually knowing what the inputs or outputs signify cognitively.

Even if you want to argue that having a natural language search engine is useful (damn, wish we had a tool that did exactly this back in August of 1996, amirite?), the implementation of the current iteration of these tools is dogshit because the developers did a dogshit job of sanitizing and rationalizing their library of data.

Imagine asking a librarian “What was happening in Los Angeles in the Summer of 1989?” and that person fetching you back a stack of history textbooks, a stack of Sci-Fi screenplays, a stack of regional newspapers, and a stack of Iron-Man comic books all given equal weight? Imagine hearing the plot of the Terminator and Escape from LA intercut with local elections and the Loma Prieta earthquake.

That’s modern LLMs in a nutshell.

source


jsomae@lemmy.ml ⁨1⁩ ⁨year⁩ ago
You’ve missed something about the Chinese Room. The solution to the Chinese Room riddle is that it is not the person in the room but rather the room itself that is communicating with you. The fact that there’s a person there is irrelevant, and they could be replaced with a speaker or computer terminal.

Put differently, it’s not an indictment of LLMs that they are merely Chinese Rooms, but rather one should be impressed that the Chinese Room is so capable despite being a completely deterministic machine.

If one day we discover that the human brain works on much simpler principles than we once thought, would that make humans any less valuable? It should be deeply troubling to us that LLMs can do so much while the mathematics behind them are so simple. Arguments that because LLMs are just scaled-up autocomplete they surely can’t be very good at anything are not comforting to me at all.

source
- kassiopaea@lemmy.blahaj.zone ⁨1⁩ ⁨year⁩ ago
  This. I often see people shitting on AI as “fancy autocomplete” or joking about how they get basic things incorrect like this post but completely discount how incredibly fucking capable they are in every domain that actually matters. That’s what we should be worried about… what does it matter that it doesn’t “work the same” if it still accomplishes the vast majority of the same things? The fact that we can get something that even approximates logic and reasoning ability from a deterministic system is terrifying on implications alone.
  
  source
  - Knock_Knock_Lemmy_In@lemmy.world ⁨1⁩ ⁨year⁩ ago
    Why doesn’t the LLM know to write (and run) a program to calculate the number of characters?
    
    I feel like I’m missing something fundamental.
    
    source
    OsrsNeedsF2P@lemmy.ml ⁨1⁩ ⁨year⁩ ago
    You didn’t get good answers so I’ll explain.
    
    First, an LLM can easily write a program to calculate the number of rs. If you ask an LLM to do this, you will get the code back.
    
    But the website ChatGPT.com has no way of executing this code, even if it was generated.
    
    The second explanation is how LLMs work. They work on the word (technically token, but think word) level. They don’t see letters. The AI behind it literally can only see words. The way it generates output is it starts typing words, and then guesses what word is most likely to come next. So it literally does not know how many rs are in strawberry. The impressive part is how good this “guessing what word comes next” is at answering more complex questions.
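To make the token point concrete, here is a toy sketch. The vocabulary, the IDs and the greedy longest-match splitter are all made up for illustration; real tokenizers learn their vocabularies and will split words differently, so read this as a cartoon under those assumptions and not as how any particular model tokenizes.

```python
# Toy illustration only: TOY_VOCAB and its IDs are invented, not taken from any real model.
TOY_VOCAB = {"str": 101, "aw": 102, "berry": 103}

def toy_tokenize(word, vocab=TOY_VOCAB):
    """Greedy longest-match split of `word` into pieces found in `vocab`."""
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest candidate piece first.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            raise ValueError(f"no vocab piece matches at position {i}")
    return pieces

pieces = toy_tokenize("strawberry")
token_ids = [TOY_VOCAB[p] for p in pieces]

print(pieces)     # the sub-word pieces
print(token_ids)  # what the model is fed: opaque integers, no letters
```

In this toy setup the model would receive three integers, and nothing in those integers says how many "r"s each piece contains.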
    
    source
    outhouseperilous@lemmy.dbzer0.com ⁨1⁩ ⁨year⁩ ago
    It doesn’t know things.
    
    It’s a statistical model. It cannot synthesize information or problem-solve, only show you a rough average of its library of inputs graphed by proximity to your input.
    
    source
    jsomae@lemmy.ml ⁨1⁩ ⁨year⁩ ago
    The LLM isn’t aware of its own limitations in this regard. The specific problem of getting an LLM to know what characters a token comprises has not been the focus of training. It’s a totally different kind of error than other hallucinations, almost entirely orthogonal to them. But other hallucinations are much more important to solve, whereas being able to count the letters in a word or add numbers together is not very important since, as you point out, there are already programs that can do that.
    
    source
- UnderpantsWeevil@lemmy.world ⁨1⁩ ⁨year⁩ ago
  
  one should be impressed that the Chinese Room is so capable despite being a completely deterministic machine.
  
  I’d be more impressed if the room could tell me how many "r"s are in Strawberry inside five minutes.
  
  If one day we discover that the human brain works on much simpler principles
  
  Human biology, famous for being simple and straightforward.
  
  source
  - outhouseperilous@lemmy.dbzer0.com ⁨1⁩ ⁨year⁩ ago
    Ah! But you can skip all that messy biology and stuff I don’t understand that’s probably not important, and just think of it as a classical computer running an x86 architecture, and checkmate, liberal, my argument owns you now!
    
    source
  - jsomae@lemmy.ml ⁨1⁩ ⁨year⁩ ago
    Because LLMs operate at the token level, I think it would be a fairer comparison with humans to ask why humans can’t produce the IPA spelling of words they can say, /nɔr kæn ðeɪ ˈizəli rid θɪŋz ˈrɪtən ˈpjʊrli ɪn aɪ pi ˈeɪ/ despite the fact that it should be simple to – they understand the sounds after all. I’d be impressed if somebody could do this too! But the fact that most people can’t shouldn’t really move you to think humans must be fundamentally stupid because of this one curious artifact.
    
    source
    UnderpantsWeevil@lemmy.world ⁨1⁩ ⁨year⁩ ago
    
    why humans can’t produce the IPA spelling of words they can say, /nɔr kæn ðeɪ ˈizəli rid θɪŋz ˈrɪtən ˈpjʊrli ɪn aɪ pi ˈeɪ/ despite the fact that it should be simple to – they understand the sounds after all
    
    That’s just access to the right keyboard interface. Humans can and do produce those spellings with additional effort or advanced tool sets.
    
    humans must be fundamentally stupid because of this one curious artifact.
    
    Humans turning oatmeal into essays via a curious lump of muscle is an impressive enough trick on its face.
    
    LLMs have 95% of the work of human intelligence handled for them and still stumble on the last bits.
    
    source
- outhouseperilous@lemmy.dbzer0.com ⁨1⁩ ⁨year⁩ ago
  It’s not a fucking riddle, it’s a koan/thought experiment.
  
  It’s questioning what ‘communication’ fundamentally is, and what knowledge fundamentally is.
  
  It’s not even the first thing to do this. Military theory was cracking away at the ‘communication’ thing a century before, and the nature of knowledge has discourse going back thousands of years.
  
  source
  - jsomae@lemmy.ml ⁨1⁩ ⁨year⁩ ago
    You’re right, I shouldn’t have called it a riddle. Still, being a fucking thought experiment doesn’t preclude having a solution. Theseus’ ship is another famous fucking thought experiment, which has also been solved.
    
    source
    outhouseperilous@lemmy.dbzer0.com ⁨1⁩ ⁨year⁩ ago
    ‘A solution’
    
    That’s not even remotely the point. Yes, there are many valid solutions. The point isn’t to solve it, but what the way you solve it says about your ideas, and how it clarifies them.
    
    source
shalafi@lemmy.world ⁨1⁩ ⁨year⁩ ago
You might just love Blindsight. Here, they’re trying to decide if an alien life form is sentient or a Chinese Room:

“Tell me more about your cousins,” Rorschach sent.

“Our cousins lie about the family tree,” Sascha replied, “with nieces and nephews and Neandertals. We do not like annoying cousins.”

“We’d like to know about this tree.”

Sascha muted the channel and gave us a look that said Could it be any more obvious? “It couldn’t have parsed that. There were three linguistic ambiguities in there. It just ignored them.”

“Well, it asked for clarification,” Bates pointed out.

“It asked a follow-up question. Different thing entirely.”

Bates was still out of the loop. Szpindel was starting to get it, though… .

source
- CitizenKong@lemmy.world ⁨1⁩ ⁨year⁩ ago
  Blindsight is such a great novel. It has not one, not two but three great sci-fi concepts rolled into one.
  
  One is artificial intelligence (the ship’s captain is an AI), the second is alien life so vastly different it appears incomprehensible to human minds. And last but not least, and the most wild, vampires as an evolutionary branch of humanity that died out and has been recreated in the future.
  
  source
  - outhouseperilous@lemmy.dbzer0.com ⁨1⁩ ⁨year⁩ ago
    Also, the extremely post-cyberpunk posthumans, and each member of the crew is a different extremely capable kind of fucked up model of what we might become, with the protagonist personifying the genre of horror that it is, while still being occasionally hilarious.
    
    source
    CitizenKong@lemmy.world ⁨1⁩ ⁨year⁩ ago
    Oooh, I didn’t even know it had a sequel!
    
    I wouldn’t say it flirts with the supernatural as much as it’s with one foot into weird fiction, which is where cosmic horror comes from.
    
    source
  - TommySalami@lemmy.world ⁨1⁩ ⁨year⁩ ago
    My favorite part of the vampire thing is how they died out. Turns out vampires start seizing when trying to visually process 90° angles, and humans love building shit like that (not to mention a cross is littered with them). It’s so mundane an extinction I’d almost believe it.
    
    source
RedstoneValley@sh.itjust.works ⁨1⁩ ⁨year⁩ ago
That’s a very long answer to my snarky little comment :) I appreciate it though. Personally, I find LLMs interesting and I’ve spent quite a while playing with them. But after all they are like you described, an interconnected catalogue of random stuff, with some hallucinations to fill the gaps. They are NOT a reliable source of information or general knowledge or even safe to use as an “assistant”. The marketing of LLMs as being fit for such purposes is the problem. Humans tend to turn off their brains and to blindly trust technology, and the tech companies are encouraging them to do so by making false promises.

source
frostysauce@lemmy.world ⁨1⁩ ⁨year⁩ ago

(damn, wish we had a tool that did exactly this back in August of 1996, amirite?)

Wait, what was going on in August of '96?

source
- UnderpantsWeevil@lemmy.world ⁨1⁩ ⁨year⁩ ago
  Google Search premiered
  
  source
outhouseperilous@lemmy.dbzer0.com ⁨1⁩ ⁨year⁩ ago
Yes but have you considered that it agreed with me so now i need to defend it to the death against you horrible apes, no matter the allegation or terrain?

source
Knock_Knock_Lemmy_In@lemmy.world ⁨1⁩ ⁨year⁩ ago

a much simpler and dumber machine that was designed to handle this basic input question could have come up with the answer faster and more accurately

The human approach could be to write a (python) program to count the number of characters precisely.
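For the record, the "simpler and dumber machine" really is tiny. A minimal sketch, where `count_letter` is just a name picked for illustration:

```python
def count_letter(word, letter):
    """Case-insensitive count of `letter` in `word`."""
    return word.lower().count(letter.lower())

print(count_letter("Strawberry", "r"))  # 3
```

It relies on nothing beyond the built-in `str.count`, which is the whole point: the hard part is not the counting, it is knowing that counting is what the question calls for.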

When people refer to agents, is this what they are supposed to be doing? Is it done in a generic fashion or will it fall over with complexity?

source
- outhouseperilous@lemmy.dbzer0.com ⁨1⁩ ⁨year⁩ ago
  No, this isn’t what ‘agents’ do, ‘agents’ just interact with other programs. So, like, move your mouse around to buy stuff, using the same methods as everything else.
  
  source
  - Knock_Knock_Lemmy_In@lemmy.world ⁨1⁩ ⁨year⁩ ago
    ‘agents’ just interact with other programs.
    
    If that other program is, say, a python terminal then can’t LLMs be trained to use agents to solve problems outside their area of expertise?
    
    I just tested chatgpt to write a python program to return the frequency of letters in a string, then asked it for the number of L’s in the longest placename in Europe.
    
    '''
    # String to analyze
    text = "Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch"
    
    # Convert to lowercase to count both 'L' and 'l' as the same
    text = text.lower()
    
    # Dictionary to store character frequencies
    frequency = {}
    
    # Count characters
    for char in text:
        if char in frequency:
            frequency[char] += 1
        else:
            frequency[char] = 1
    
    # Show the number of 'l's
    print("Number of 'l's:", frequency.get('l', 0))
    '''
    
    I was impressed until
    
    Output
    
    Number of 'l’s: 16
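For comparison, actually executing that logic gives a different number. A quick check, assuming the string is exactly as pasted above and swapping in `collections.Counter` for the hand-rolled dictionary (same counting, fewer lines):

```python
from collections import Counter

text = "Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch"

# Lowercase first so 'L' and 'l' are counted together, as in the program above.
frequency = Counter(text.lower())

print("Number of 'l's:", frequency["l"])  # Number of 'l's: 11
```

So the program itself is fine; the 16 looks like text the model produced in place of a real run, though that is an inference from the mismatch and not something the transcript proves.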
    
    source
    outhouseperilous@lemmy.dbzer0.com ⁨1⁩ ⁨year⁩ ago
    Yeah it turns out to be useless!
    
    source
- UnderpantsWeevil@lemmy.world ⁨1⁩ ⁨year⁩ ago
  
  When people refer to agents, is this what they are supposed to be doing?
  
  That’s not how LLMs operate, no. They aggregate raw text and sift for popular answers to common queries.
  
  ChatGPT is one step removed from posting your question to Quora.
  
  source
  - Knock_Knock_Lemmy_In@lemmy.world ⁨1⁩ ⁨year⁩ ago
    But an LLM as a node in a framework that can call a python library should be able to count the number of Rs in strawberry.
    
    It doesn’t scale to AGI but it does reduce hallucinations.
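A rough sketch of what "an LLM as a node that can call a Python function" could look like. Everything here is hypothetical: `fake_llm` stands in for a real model, and the `{"tool": ..., "args": ...}` request shape is an invented convention, not any vendor's API.

```python
# Hypothetical sketch: the model only picks a tool and its arguments;
# ordinary code does the actual counting.

def count_letter(word, letter):
    """Case-insensitive count of `letter` in `word`."""
    return word.lower().count(letter.lower())

# Registry of tools the framework is willing to run on the model's behalf.
TOOLS = {"count_letter": count_letter}

def fake_llm(question):
    """Stand-in for the model: returns a canned tool request for this demo."""
    return {"tool": "count_letter", "args": {"word": "strawberry", "letter": "r"}}

def run_agent(question, llm=fake_llm, tools=TOOLS):
    request = llm(question)           # model decides which tool to use
    tool = tools[request["tool"]]     # framework looks the tool up
    result = tool(**request["args"])  # deterministic code produces the number
    return f"The tool reports {result}."

answer = run_agent("How many r's are in strawberry?")
print(answer)  # The tool reports 3.
```

The open question from this thread still applies: the stub always picks the right tool, while a real model has to recognise when a question needs one.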
    
    source
    UnderpantsWeevil@lemmy.world ⁨1⁩ ⁨year⁩ ago
    
    But an LLM as a node in a framework that can call a python library
    
    Isn’t how these systems are configured. They’re just not that sophisticated.
    
    So much of what Sam Altman is doing is brute force, which is why he thinks he needs a $1T investment in new power to build his next iteration model.
    
    Deepseek gets at the edges of this through their partitioned model. But you’re still asking a lot for a machine to intuit whether a query can be solved with some exigent python query the system has yet to identify.
    
    It doesn’t scale to AGI but it does reduce hallucinations
    
    It has to scale to AGI, because a central premise of AGI is a system that can improve itself.
    
    It just doesn’t match the OpenAI development model, which is to just scrape and sort data hoping the Internet already has the solution to every problem.
    
    source
    outhouseperilous@lemmy.dbzer0.com ⁨1⁩ ⁨year⁩ ago
    You’d still be better off starting with a 50s language processor, then grafting on some API calls.
    
    source
merc@sh.itjust.works ⁨1⁩ ⁨year⁩ ago

Imagine asking a librarian “What was happening in Los Angeles in the Summer of 1989?” and that person fetching you … That’s modern LLMs in a nutshell.

I agree, but I think you’re still being too generous to LLMs. A librarian who fetched all those things would at least understand the question. An LLM is just trying to generate words that might logically follow the words you used.

IMO, one of the key ideas with the Chinese Room is that there’s an assumption that the computer / book in the Chinese Room experiment has infinite capacity in some way. So, no matter what symbols are passed to it, it can come up with an appropriate response. But, obviously, while LLMs are incredibly huge, they can never be infinite. As a result, they can often be “fooled” when they’re given input that is semantically similar to a meme, joke or logic puzzle. The vast majority of the training data that matches the input is the meme, or joke, or logic puzzle. LLMs can’t reason so they can’t distinguish between “this is just a rephrasing of that meme” and “this is similar to that meme but distinct in an important way”.

source
- jsomae@lemmy.ml ⁨1⁩ ⁨year⁩ ago
  Can you explain the difference between understanding the question and generating the words that might logically follow? I’m aware that it’s essentially a more powerful version of how auto-correct works, but why should we assume that shows some lack of understanding at a deep level somehow?
  
  source
  - merc@sh.itjust.works ⁨1⁩ ⁨year⁩ ago
    
    Can you explain the difference between understanding the question and generating the words that might logically follow?
    
    I mean, it’s pretty obvious. Take someone like Rowan Atkinson whose death has been misreported multiple times. If you ask a computer system “Is Rowan Atkinson Dead?” you want it to understand the question and give you a yes/no response based on actual facts in its database. A well designed program would know to prioritize recent reports as being more authoritative than older ones. It would know which sources to trust, and which not to trust.
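As a sketch of the kind of "well designed program" meant here: filter to trusted sources, then take the newest report. The source names, dates and records below are invented sample data, and the trust list is a hypothetical allow-list, so this only illustrates the rule and says nothing about any real person or outlet.

```python
from datetime import date

# Hypothetical allow-list and invented sample records, for illustration only.
TRUSTED_SOURCES = {"wire-service", "national-broadcaster"}

reports = [
    {"source": "viral-video", "published": date(2024, 3, 1), "claim": "dead"},
    {"source": "wire-service", "published": date(2024, 3, 2), "claim": "alive"},
    {"source": "national-broadcaster", "published": date(2023, 7, 9), "claim": "alive"},
]

def latest_trusted_claim(reports, trusted=TRUSTED_SOURCES):
    """Return the claim from the newest report whose source is trusted, or None."""
    candidates = [r for r in reports if r["source"] in trusted]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r["published"])["claim"]

print(latest_trusted_claim(reports))  # alive
```

The untrusted report is ignored regardless of its date, and among trusted reports the most recent one wins.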
    
    An LLM will just generate text that is statistically likely to follow the question. Because there have been many hoaxes about his death, it might use that as a basis and generate a response indicating he’s dead. But, because those hoaxes have also been debunked many times, it might use that as a basis instead and generate a response indicating that he’s alive.
    
    So, if he really did just die and it was reported in reliable fact-checked news sources, the LLM might say “No, Rowan Atkinson is alive, his death was reported via a viral video, but that video was a hoax.”
    
    but why should we assume that shows some lack of understanding
    
    Because we know what “understanding” is, and that it isn’t simply finding words that are likely to appear following the chain of words up to that point.
    
    source
    KeenFlame@feddit.nu ⁨1⁩ ⁨year⁩ ago
    If you were just a hater, that would be cool with me. I don’t like “AI” either. But the explanations you give are misleading at best. It’s embarrassing. You fail to realise that NOBODY KNOWS why or how they work. It’s just extreme folly to pretend you know these things. They’ve been observed to reason their way to novel ideas, and the scientists who work with them are confused about why that happens. It’s not just data lookup. You think the entire Web and history of man fits in 8 GB? You are educating people with your basic rage-filled opinion, not actual answers. You are angry at the discovery, we get that. You don’t believe in it. OK. But don’t say you know what it does and how, or what OpenAI does behind its closed doors. It’s just embarrassing. We are working on papers to try to explain the emergent phenomena we discovered in neural nets that make it seem like they can reason and output mostly correct answers to difficult questions. It’s not that the answer is in the “data” and the model looks it up. You could just start learning if you want to be an educator in the field.
    
    source
    jsomae@lemmy.ml ⁨1⁩ ⁨year⁩ ago
    The Rowan Atkinson thing isn’t misunderstanding, it’s understanding but having been misled. I’ve literally done this exact thing myself, say something was a hoax (because in the past it was) but then it turned out there was newer info I didn’t know about. I’m not convinced LLMs as they exist today don’t prioritize sources – if trained naively, sure, but these days they can, for instance, integrate search results, and can update on new information. If the LLM can answer correctly only after checking a web search, and I can do the same only after checking a web search, that’s a score of 1-1.
    
    because we know what “understanding” is
    
    Really? Who claims to know what understanding is? Do you think it’s possible there can ever be an AI (even if different from an LLM) which is capable of “understanding?” How can you tell?
    
    source
  - outhouseperilous@lemmy.dbzer0.com ⁨1⁩ ⁨year⁩ ago
    So, what is ‘understanding’?
    
    If you need help, you can look at Marx for a pretty good answer.
    
    source
    jsomae@lemmy.ml ⁨1⁩ ⁨year⁩ ago
    oh does he have a treatise on the subject?
    
    source
Leet@lemmy.zip ⁨1⁩ ⁨year⁩ ago
Can we say for certain that human brains aren’t sophisticated Chinese rooms…

source
- UnderpantsWeevil@lemmy.world ⁨1⁩ ⁨year⁩ ago
  Yes.
  
  source
