Comment on Someone got Gab's AI chatbot to show its instructions

<- View Parent
teawrecks@sopuli.xyz ⁨6⁩ ⁨months⁩ ago

Oh I see, you’re saying the training set is exclusively with yes/no answers. That’s called a classifier, not an LLM. But yeah, you might be able to make a reasonable “does this input and this output create a jailbreak for this set of instructions” classifier.

source
Sort:hotnewtop