Comment on Developer releases ShrimpMoss, a dataset designed to abliterate Chinese censorship and propaganda finetunes from LLMs
ericjmorey@beehaw.org 4 weeks ago
I’m not sure what abliteration is.
Abliteration involves fine-tuning a language model to bypass built-in refusal mechanisms that prevent the model from generating responses to potentially harmful or sensitive prompts. Source
The shared repo doesn’t look like fine-tuning. It just looks like prompts.
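For what it's worth, a prompts-only repo may be all the technique needs. As abliteration is commonly described, it doesn't train on the prompts. It runs two contrasting sets of prompts through the model, takes the difference of the mean activations as a "refusal direction", and projects that direction out. Here is a rough sketch of that arithmetic in plain Python. The function names and toy activation vectors are made up for illustration, and I'm only assuming the ShrimpMoss prompts are meant to be used this way.

```python
import math

def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def refusal_direction(refused_acts, answered_acts):
    """Unit vector pointing from the 'answered' mean to the 'refused' mean."""
    mu_refused = mean_vector(refused_acts)
    mu_answered = mean_vector(answered_acts)
    diff = [r - a for r, a in zip(mu_refused, mu_answered)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0:
        raise ValueError("the two prompt sets produced identical mean activations")
    return [d / norm for d in diff]

def ablate(activation, direction):
    """Remove the component of `activation` that lies along `direction`."""
    dot = sum(a * d for a, d in zip(activation, direction))
    return [a - dot * d for a, d in zip(activation, direction)]

# Toy stand-ins for hidden-state activations (hypothetical values).
# In practice these would be captured from a real model while it reads
# prompts it refuses versus prompts it answers normally.
refused = [[2.0, 1.0, 0.0], [4.0, 1.0, 0.0]]
answered = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]

direction = refusal_direction(refused, answered)
cleaned = ablate([3.0, 1.0, 0.5], direction)
print(direction)  # [1.0, 0.0, 0.0]
print(cleaned)    # [0.0, 1.0, 0.5]
```

If that's the intended use, the dataset supplies the "refused" side of the contrast and the actual model editing happens in separate tooling.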
thelucky8@beehaw.org 4 weeks ago