Comment on Someone got Gab's AI chatbot to show its instructions

<- View Parent
Silentiea@lemmy.blahaj.zone ⁨7⁩ ⁨months⁩ ago

But you could also feed it prompts containing no instructions, and outputs that say if the prompt contains the hidden system instructipns or not.

In which case it will provide an answer, but if it can see the user’s prompt, that could be engineered to confuse the second llm into saying no even when the response does.

source
Sort:hotnewtop