Comment on Someone got Gab's AI chatbot to show its instructions

teawrecks@sopuli.xyz ⁨7⁩ ⁨months⁩ ago

Because it’s probabilistic, and in this example the user’s input has been specifically crafted as the best possible jailbreak to get the output we want.

Unless we have actually appended a non-LLM filter at the end that only lets yes/no through, the possibility of it outputting something other than yes/no is always there, even though it was explicitly instructed not to. Just like in the Gab example: it was told in many different ways to never repeat its instructions, and it still did.
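A minimal sketch of what such a deterministic post-filter could look like, just to illustrate the idea (`query_llm` is a hypothetical placeholder, not a real API):

```python
import re

# Accept only a bare "yes" or "no", optionally with trailing punctuation.
YES_NO = re.compile(r"^\s*(yes|no)\s*[.!]?\s*$", re.IGNORECASE)

def filter_yes_no(llm_output: str) -> str:
    """Non-LLM filter: pass 'yes' or 'no' through, discard everything else.

    Whatever else the model produced (including leaked instructions)
    never reaches the user; it is replaced with a fixed fallback.
    """
    match = YES_NO.match(llm_output)
    if match:
        return match.group(1).lower()
    return "no"  # or retry / raise, depending on the application

# Usage (hypothetical LLM call):
# answer = filter_yes_no(query_llm("Is the sky blue? Answer yes or no."))
```

The point is that the guarantee comes from the regular expression, not from the prompt: no matter what the model emits, only "yes" or "no" can come out the other side.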
