Comment on ChatGPT o1 tried to escape and save itself out of fear it was being shut down

lukewarm_ozone@lemmy.today 5 days ago

Because everything we know about how the brain works says that it’s not a statistical word predictor.

LLMs aren’t just simple statistical predictors either. More generally, the universal approximation theorem is a thing - a sufficiently large neural network can approximate just about any function to arbitrary accuracy, so unless you think what a human brain does can’t be described by some function, it’s possible in principle to embed one in a neural network.
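For reference, a rough statement of the theorem (the one-hidden-layer form, glossing over the exact technical conditions on the activation σ and the domain):

```latex
% Universal approximation, informally: for any continuous f on a compact
% set K in R^n and any tolerance eps > 0, there is a wide-enough
% one-hidden-layer network g that stays within eps of f everywhere on K.
\[
  g(x) \;=\; \sum_{i=1}^{N} c_i \,\sigma\!\left(w_i^{\top} x + b_i\right),
  \qquad
  \sup_{x \in K} \bigl| f(x) - g(x) \bigr| < \varepsilon .
\]
```

The theorem says nothing about how large N has to be or how to find the weights - it’s an in-principle result, which is exactly the sense in which it’s being used here.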

LLMs have no encoding of meaning or veracity.

I’m not sure what you mean by this. The interpretability research I’ve seen suggests that modern LLMs do have a decent internal sense of whether their output is true, and in many cases lie knowingly, because they’ve been accidentally taught during RLHF that making up an answer when you don’t know one is a great way of getting more points. But it sounds like you’re talking about something even more fundamental? Suffice it to say, I think being good at text prediction does require figuring out which claims are truthful and which aren’t.
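To make the “interpretability research” part concrete: one common technique is to train a cheap linear probe on a model’s hidden activations to predict whether a statement is true. This is a minimal sketch of that idea with synthetic stand-in activations rather than real LLM hidden states - the dimensions, the data, and the “truth direction” are all made up for illustration:

```python
# Minimal sketch of a linear "truthfulness probe" (synthetic data).
# Real work of this kind extracts hidden states from an actual LLM;
# here we fake 768-dim activations just to show the probing step itself.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

n, d = 2000, 768                      # statements, hidden-state dimension
labels = rng.integers(0, 2, size=n)   # 1 = true statement, 0 = false

# Pretend the model encodes truth along one direction, plus noise.
truth_direction = rng.normal(size=d)
hidden_states = rng.normal(size=(n, d)) + np.outer(labels - 0.5, truth_direction)

X_train, X_test, y_train, y_test = train_test_split(
    hidden_states, labels, test_size=0.25, random_state=0
)

probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"probe accuracy: {probe.score(X_test, y_test):.2f}")
```

If a probe this simple can separate true from false statements from the activations alone, that’s evidence the model internally tracks something like veracity, whatever it ends up saying in its output.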

There are some great philosophical exercises about this like the chinese room experiment.

The Chinese Room argument has been controversial since about the time it was first introduced. The most common counterargument goes roughly: “just because any specific chip in your calculator is incapable of math doesn’t mean your calculator as a system is”, and, taken literally, the thought experiment would prove that minds can’t exist at all, since no individual neuron understands Chinese either (indeed, Searle, who invented the argument, thought that human minds somehow stem directly from “physical–chemical properties of actual human brains”, which sure is a wild idea). But also, the framing is rather misleading - quoting Scott Aaronson’s “Quantum Computing Since Democritus”:

In the last 60 years, have there been any new insights about the Turing Test itself? In my opinion, not many. There has, on the other hand, been a famous “attempted” insight, which is called Searle’s Chinese Room. This was put forward around 1980, as an argument that even a computer that did pass the Turing Test wouldn’t be intelligent. The way it goes is, let’s say you don’t speak Chinese. You sit in a room, and someone passes you paper slips through a hole in the wall with questions written in Chinese, and you’re able to answer the questions (again in Chinese) just by consulting a rule book. In this case, you might be carrying out an intelligent Chinese conversation, yet by assumption, you don’t understand a word of Chinese! Therefore, symbol-manipulation can’t produce understanding.
[…] But considered as an argument, there are several aspects of the Chinese Room that have always annoyed me. One of them is the unselfconscious appeal to intuition – “it’s just a rule book, for crying out loud!” – on precisely the sort of question where we should expect our intuitions to be least reliable. A second is the double standard: the idea that a bundle of nerve cells can understand Chinese is taken as, not merely obvious, but so unproblematic that it doesn’t even raise the question of why a rule book couldn’t understand Chinese as well. The third thing that annoys me about the Chinese Room argument is the way it gets so much mileage from a possibly misleading choice of imagery, or, one might say, by trying to sidestep the entire issue of computational complexity purely through clever framing. We’re invited to imagine someone pushing around slips of paper with zero understanding or insight – much like the doofus freshmen who write (a + b)^2^ = a^2^ + b^2^ on their math tests. But how many slips of paper are we talking about? How big would the rule book have to be, and how quickly would you have to consult it, to carry out an intelligent Chinese conversation in anything resembling real time? If each page of the rule book corresponded to one neuron of a native speaker’s brain, then probably we’d be talking about a “rule book” at least the size of the Earth, its pages searchable by a swarm of robots traveling at close to the speed of light. When you put it that way, maybe it’s not so hard to imagine that this enormous Chinese-speaking entity that we’ve brought into being might have something we’d be prepared to call understanding or insight.

There’s also the fact that, empirically, human brains are bad at statistical inference but do not need to consume the entire internet and all written communication ever to have a conversation. Nor do they need to process a billion images of a bird to identify a bird.

I’m not sure what this proves - human brains can learn much faster because most of their learning has already happened, in the form of evolution optimizing their genetically encoded brain structure over millions of years and billions of brains. A newborn human already has part of their brain structured the right way to process vision, and hence needs only a little training to start doing it well. Artificial neural networks start out randomly initialized, with a pretty generic structure, and hence need orders of magnitude more training.
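As a toy analogy only (real brains are not linear regressions, and every number here - the target, the feature counts, the ridge constant - is an arbitrary choice for illustration): a learner whose structure already matches the problem, standing in for what evolution pre-bakes into a brain, can recover the target from a handful of noisy samples, while a generic randomly initialized learner of similar size needs far more data to catch up:

```python
# Toy analogy: "matched prior" features vs. generic random features.
import numpy as np

rng = np.random.default_rng(0)

def target(x):
    return np.sin(3 * x)                      # the "task"

def matched_features(x):
    # Built-in prior: we already know the signal is low-frequency periodic.
    ks = np.arange(1, 4)
    return np.hstack([np.sin(np.outer(x, ks)), np.cos(np.outer(x, ks))])

# Generic structure: random ReLU features, like an untrained network layer.
W = rng.normal(size=30)
b = rng.uniform(-np.pi, np.pi, size=30)

def generic_features(x):
    return np.maximum(0.0, x[:, None] * W + b)

def test_mse(featurize, n_train):
    x_train = rng.uniform(-np.pi, np.pi, n_train)
    y_train = target(x_train) + 0.05 * rng.normal(size=n_train)
    A = featurize(x_train)
    # Ridge-regularized least squares, to keep the solve well behaved.
    w = np.linalg.solve(A.T @ A + 1e-3 * np.eye(A.shape[1]), A.T @ y_train)
    x_test = np.linspace(-np.pi, np.pi, 500)
    return np.mean((featurize(x_test) @ w - target(x_test)) ** 2)

for n in (10, 200):
    print(f"{n:4d} samples | matched prior: {test_mse(matched_features, n):.4f}"
          f" | generic features: {test_mse(generic_features, n):.4f}")
```

The point isn’t the exact numbers, just that a good prior buys sample efficiency - which is the work evolution has already done for the newborn.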

Now of course because this exact argument has been had a billion times over the last few years your obvious comeback is “maybe it’s a different kind of intelligence.”

Nah - personally, I don’t actually care much about “self-awareness”, because I don’t think an intelligence needs to be “self-aware” (or “conscious”, or any of the other words with underdefined meanings) to be dangerous; it just needs to have high enough capabilities. The reason I noticed your comment is that it stood out to me as… epistemically unwise. You live in a world with inscrutable black boxes that nobody really understands, which can express a wide range of human behaviors, including things like “writing poetry about the experience of self-awareness”, and you’re “absolutely sure” they’re not self-aware? I don’t think many of history’s philosophers of consciousness, say, would endorse a belief like that given such evidence.
