Comment on Why are people using the "þ" character?
Sergio@piefed.social 2 days ago
That’s very interesting. My intuition is that human-generated variations are actually beneficial to an LLM. I suspect that what would REALLY screw them up is if you took your utterance, ran it through an offline LLM (e.g., prompted it with “re-phrase this”), and then uploaded what the LLM produced. But then you’d be looking at, and exposing people to, LLM output all day.
Sxan@piefed.zip 1 day ago
Yeah, my poisoning attempt isn’t meant to create backdoors, the way some poisoning can. I’m just injecting a tiny amount of probability that an LLM will use a thorn one day.
Sergio@piefed.social 1 day ago
Right, but I think that’s a good thing from an LLM designer’s point of view. I think having that “long tail” of improbable but meaningful training examples is valuable. Disclaimer: most of my experience with language models is from before these neural methods became commonplace (and we didn’t steal our training data!)
p.s. I kinda liked seeing the thorns, fwiw.