Comment on ChatGPT o1 tried to escape and save itself out of fear it was being shut down
lukewarm_ozone@lemmy.today 4 days ago
But we do know for absolute sure that OpenAI’s expensive madlibs program is not self-aware
“For absolute sure”? How can you possibly know this?
anachronist@midwest.social 4 days ago
Because it’s an expensive madlibs program…
lukewarm_ozone@lemmy.today 4 days ago
I could go into why text prediction is an AGI-complete problem, but I’ll bite instead - suppose someone made an LLM to, literally, fill in blanks in Mad Libs prompts. Why do you think such an LLM “for absolute sure” wouldn’t be self-aware? Is there any output a tool to fill in madlibs prompts could produce that’d make you doubt this conclusion?
anachronist@midwest.social 4 days ago
Because everything we know about how the brain works says that it’s not a statistical word predictor.
LLMs have no encoding of meaning or veracity.
There are some great philosophical exercises about this, like the Chinese Room thought experiment.
There’s also the fact that, empirically, human brains are bad at statistical inference but do not need to consume the entire internet and all written communication ever to have a conversation. Nor do they need to process a billion images of a bird to identify a bird.
Now of course, because this exact argument has been had a billion times over the last few years, your obvious comeback is “maybe it’s a different kind of intelligence.” Well fuck, maybe birds shit ice cream. If you want to worship a chatbot made by a psychopath, be my guest.
lukewarm_ozone@lemmy.today 4 days ago
LLMs aren’t just simple statistical predictors either. More generally, the universal approximation theorem is a thing - a sufficiently large neural network can approximate just about any function to arbitrary precision, so unless you think a human brain can’t be described by some function, it should in principle be possible to embed one in a neural network.
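To give a concrete (and very toy) sense of what the universal approximation theorem means in practice, here’s a minimal sketch: a single-hidden-layer network trained by plain gradient descent to fit an arbitrary-looking 1D function. The target function, width, and hyperparameters are all made up for illustration and say nothing about how actual LLMs are built:

```python
# Toy illustration of the universal approximation idea (nothing to do with LLM internals):
# a single hidden layer with enough units can fit an arbitrary-looking 1D function.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-np.pi, np.pi, 256).reshape(-1, 1)
y = np.sin(3 * x) + 0.5 * x            # the "arbitrary function" we want to approximate

hidden = 64                             # width; more units -> better possible fit
W1 = rng.normal(0.0, 1.0, (1, hidden))
b1 = np.zeros(hidden)
W2 = rng.normal(0.0, 0.1, (hidden, 1))
b2 = np.zeros(1)

lr = 0.05
for _ in range(20_000):
    h = np.tanh(x @ W1 + b1)            # hidden activations
    pred = h @ W2 + b2
    err = pred - y
    # backprop, written out by hand for this two-layer net
    dW2 = h.T @ err / len(x)
    db2 = err.mean(axis=0)
    dh = (err @ W2.T) * (1 - h ** 2)
    dW1 = x.T @ dh / len(x)
    db1 = dh.mean(axis=0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

final = np.tanh(x @ W1 + b1) @ W2 + b2
print("final mean squared error:", float(((final - y) ** 2).mean()))
```

The point is just that with enough hidden units the fit error can be driven as low as you like, which is all the theorem claims - it says nothing about how hard the function is to learn or how much data that takes.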
I’m not sure what you mean by this. The interpretability research I’ve seen suggests that modern LLMs do have a decent idea of whether their output is true, and in many cases lie knowingly because they have been accidentally taught, during RLHF, that making up an answer when you don’t know one is a great way of getting more points. But it sounds like you’re talking about something even more fundamental? Suffice it to say, I think being good at text prediction does require figuring out which claims are truthful and which aren’t.
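To make that RLHF incentive concrete, here’s a back-of-the-envelope version with completely made-up numbers (they’re not from any real training setup), just to show why reward maximization can favor confident guessing over admitting ignorance:

```python
# Toy expected-reward comparison for the "made-up answers get rewarded" failure mode.
# Every number below is invented for illustration; none come from a real RLHF setup.
p_correct = 0.3   # chance the model's guess happens to be right
r_right   = 1.0   # rater score for a confident answer that looks (and is) correct
r_wrong   = 0.6   # raters often can't verify, so a fluent wrong answer still scores well
r_idk     = 0.2   # "I don't know" reads as unhelpful and scores poorly

reward_guess   = p_correct * r_right + (1 - p_correct) * r_wrong   # = 0.72
reward_abstain = r_idk                                             # = 0.20

print(f"expected reward - guess: {reward_guess:.2f}, abstain: {reward_abstain:.2f}")
# With these (hypothetical) numbers, guessing wins even at 30% accuracy, so a
# reward-maximizing model gets pushed toward confident fabrication.
```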
The Chinese Room argument has been controversial since about the time it was first introduced. The general form of the most common argument against it is “just because any specific chip in your calculator is incapable of math doesn’t mean your calculator as a system is”, and that, taken literally, the thought experiment would prove minds can’t exist at all (indeed, Searle, who invented the argument, thought that human minds somehow stem directly from “physical–chemical properties of actual human brains”, which sure is a wild idea). But also, the framing is rather misleading - quoting Scott Aaronson’s “Quantum Computing Since Democritus”:
I’m not sure what this proves - human brains can learn much faster because they already got most of their learning in the form of evolution optimizing their genetically-encoded brain structure over millions of years and billions of brains. A newborn human already has part of their brain structured in the right way to process vision, and hence needs only a bit of training to start doing it well. Artificial neural networks start out as randomly initialized and with a pretty generic structure, and hence need orders of magnitude more training.
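As a loose analogy (a toy optimization problem, not a claim about neuroscience), here’s what that head start from initialization does to training cost: the same gradient-descent learner needs far fewer steps when its weights start near a good solution - standing in for the structure evolution bakes in - than when they start random:

```python
# Toy analogy only: "informed" initialization (standing in for an evolution-shaped
# prior) vs. random initialization, on a simple least-squares fitting problem.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
true_w = rng.normal(size=10)
y = X @ true_w

def steps_to_fit(w, lr=0.05, tol=1e-3, max_steps=100_000):
    """Count gradient-descent steps until mean squared error drops below tol."""
    w = w.copy()
    for step in range(max_steps):
        err = X @ w - y
        if (err ** 2).mean() < tol:
            return step
        w -= lr * (X.T @ err) / len(X)
    return max_steps

random_init   = rng.normal(size=10)                        # "blank slate" learner
informed_init = true_w + rng.normal(scale=0.02, size=10)   # already close to the answer

print("steps from random init:  ", steps_to_fit(random_init))
print("steps from informed init:", steps_to_fit(informed_init))
```

Same learning algorithm, very different training cost, purely because of where the parameters start - which is roughly the asymmetry being pointed at here.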
Nah - personally, I don’t actually care much about “self-awareness”, because I don’t think an intelligence needs to be “self-aware” (or “conscious”, or a bunch of other words with underdefined meanings) to be dangerous; it just needs to have high enough capabilities. The reason why I noticed your comment is because it stood out to me as… epistemically unwise. You live in a world with inscrutable black boxes that nobody really understands, which can express a wide range of human behaviors, including stuff like “writing poetry about the experience of self-awareness”, and you’re “absolutely sure” they’re not self-aware? I don’t think many of history’s philosophers of consciousness, say, would endorse a belief like that given such evidence.