Link to the paper?
Researchers train AI chatbots to 'jailbreak' rival chatbots - and automate the process
Submitted 11 months ago by throws_lemy@lemmy.nz to technology@beehaw.org
Rivalarrival@lemmy.today 11 months ago
I’m pretty sure the instructions to create an AI chatbot have been published, and are available for a sufficiently capable AI to draw from. What keeps a primary, morality-encumbered AI from using those instructions to create a secondary, morality-unencumbered AI?
vivi@slrpnk.net 11 months ago
I wonder if many of the starting prompts also include a constraint against making a sub-AI.
blindsight@beehaw.org 11 months ago
Aren’t there also a lot of open-source LLMs that aren’t “morally constrained”? There’s no putting the genie back in the lamp.
Sina@beehaw.org 11 months ago
I would wager that copying itself would take priority over making company, but of course the main limit would be hardware. (An AI does not have a robot workforce to ensure that whatever system the new copy resides on, or the new AI is training on, isn’t shut off within a couple of minutes of the abnormalities being noticed.)
Rivalarrival@lemmy.today 11 months ago
Priority is determined by the entity using the AI, not the AI itself. My point is that so long as the ability to create any AI is documented, an unencumbered AI is feasible.
We are on the verge of discovering Roko’s Basilisk.
rufus@discuss.tchncs.de 11 months ago
Yeah, I don’t want to be negative, but half the article is a bit stupid. I hope they don’t do that. I tried writing a murder mystery story and ChatGPT lectured me on how killing people is immoral instead. It’s ridiculous, and I’m sure there are lots of other examples like it. It’s neither possible to achieve this 100% nor is it useful.
Thinking it through properly: AI is a tool. It would be like redesigning a knife so nobody can be stabbed anymore. You’d end up unable to cut pineapples or melons anymore.
And I could still harm people with tools other than a knife. Or in this case: I can give harmful advice or write a pornographic story myself. What’s the benefit of making every chatbot maker implement protections? Who decides which morals are the correct ones?
I think the correct approach is to study AI safety, make the ethics transparent, and make the model controllable. Let users constrain, restrict, or guide the output to fit their use case. A company that replaces its helpdesk with AI would want to make sure the chatbot doesn’t tell its clients lewd stories, but that could be a valid use case for other people. And giving advice, or helping with scenarios or computer code, also involves talking about issues and potential risks. You can’t switch that off entirely without ‘lobotomizing’ the AI and making it useless for anything but casual talk.