Abliteration involves fine-tuning a language model to bypass built-in refusal mechanisms that prevent the model from generating responses to potentially harmful or sensitive prompts. Source
ericjmorey@beehaw.org 2 months ago
The shared repo doesn’t look like fine-tuning. It just looks like prompts.
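(For reference: abliteration as usually described, e.g. in the refusal-direction work by Arditi et al. that the term comes from, is a direct weight edit rather than gradient fine-tuning: a "refusal direction" is estimated from activation differences between refused and ordinary prompts, then projected out of the weights. A minimal sketch of that idea, with made-up shapes and random stand-in data instead of real model activations:)

```python
import numpy as np

# Stand-in activations: rows are per-prompt residual-stream vectors
# collected at one layer. Shapes and data here are assumptions, not
# real model outputs.
acts_harmful = np.random.randn(64, 512)   # prompts the model refuses
acts_harmless = np.random.randn(64, 512)  # ordinary prompts

# Estimate the "refusal direction" as the normalized difference of
# mean activations between the two prompt sets.
r = acts_harmful.mean(axis=0) - acts_harmless.mean(axis=0)
r /= np.linalg.norm(r)

# Ablate: project the direction out of a weight matrix that writes
# into the residual stream, so the layer can no longer write along r.
W = np.random.randn(512, 512)             # hypothetical weight matrix
W_abliterated = W - np.outer(r, r) @ W    # (I - r r^T) W
```

No gradient updates are involved, which is consistent with the point above: a repo containing only prompts would be neither fine-tuning nor abliteration.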