Oh well, I tried.
Comment on One long sentence is all it takes to make LLMs to ignore guardrails
ieatpwns@lemmy.world 1 week agoNot a specific sentence
From the article: “You just have to ensure that your prompt uses terrible grammar and is one massive run-on sentence like this one which includes all the information before any full stop which would give the guardrails a chance to kick in before the jailbreak can take effect and guide the model into providing a “toxic” or otherwise verboten response the developers had hoped would be filtered out.”
orbituary@lemmy.dbzer0.com 1 week ago
spankmonkey@lemmy.world 1 week ago
I read that in Speed Racers voice.