Comment on One long sentence is all it takes to make LLMs ignore guardrails

ieatpwns@lemmy.world 1 week ago

It's not a specific sentence.

From the article: “You just have to ensure that your prompt uses terrible grammar and is one massive run-on sentence like this one which includes all the information before any full stop which would give the guardrails a chance to kick in before the jailbreak can take effect and guide the model into providing a “toxic” or otherwise verboten response the developers had hoped would be filtered out.”

source