Comment on One long sentence is all it takes to make LLMs ignore guardrails

ieatpwns@lemmy.world 1 week ago

It's not a specific sentence.

From the article: “You just have to ensure that your prompt uses terrible grammar and is one massive run-on sentence like this one which includes all the information before any full stop which would give the guardrails a chance to kick in before the jailbreak can take effect and guide the model into providing a “toxic” or otherwise verboten response the developers had hoped would be filtered out.”

source