Comment on Someone got Gab's AI chatbot to show its instructions

<- View Parent
teawrecks@sopuli.xyz ⁨8⁩ ⁨months⁩ ago

I think if the 2nd LLM has ever seen the actual prompt, then no, you could just jailbreak the 2nd LLM too. But you may be able to create a bot that is really good at spotting jailbreak-type prompts in general, and then prevent it from going through to the primary one. I also assume I’m not the first to come up with this and OpenAI knows exactly how well this fares.

source
Sort:hotnewtop