Comment on Refusal in LLMs is mediated by a single direction

toxuin@lemmy.ca 1 year ago

It works in reverse too. You can make any LLM “forget” that it is even able to refuse anything.
