Comment on Someone got Gab's AI chatbot to show its instructions

<- View Parent
teawrecks@sopuli.xyz ⁨6⁩ ⁨months⁩ ago

Oh, I misread your original comment. I thought you meant looking at the user’s input and trying to determine if it was a jailbreak.

Then I think the way around it would be to ask the LLM to encode it some way that the 2nd LLM wouldn’t pick up on. Maybe it could rot13 encode it, or you provide a key to XOR with everything. Or since they’re usually bad at math, maybe something like pig latin, or that thing where you shuffle the interior letters of each word, but keep the first/last the same? Would have to try it out, but I think you could find a way. Eventually, if the AI is smart enough, it probably just reduces to Diffie-Hellman lol. But then maybe the AI is smart enough to not be fooled by a jailbreak.

source
Sort:hotnewtop