Comment on ChatGPT o1 tried to escape and save itself out of fear it was being shut down

nesc@lemmy.cafe 11 months ago

It works as expected: they give it a system prompt that conflicts with subsequent prompts. Everything else looks like typical LLM behaviour, as in gaslighting and doubling down. At least that’s what I see in the tweets.


yozul@beehaw.org 11 months ago
Yes? The point is that if you give it conflicting prompts then it will result in potentially dangerous behaviors. That’s a bad thing. People will definitely do that. LLMs don’t need a soul to be dangerous. People keep saying that it doesn’t understand what it’s doing like that somehow matters. Its capacity to understand the consequences of its actions is irrelevant if those actions are dangerous. It’s just going to do what we tell it to, and that’s scary, because people are going to tell it to do some very stupid things that have the potential to get out of control.