Comment on Google AI chatbot responds with a threatening message: "Human … Please die."
Bougie_Birdie@lemmy.blahaj.zone 4 weeks ago
With the sheer volume of training data required, I have a hard time believing that the data sanitization is high quality.
If I had to guess, it's largely filtered through scripts and not thoroughly vetted by humans. So data sanitization might focus on removing slurs and profanity, but wouldn't have a way to catch misinformation or a request that the reader stop existing.
Swedneck@discuss.tchncs.de 4 weeks ago
anything containing “die” ought to warrant a human skimming it over at least
Bougie_Birdie@lemmy.blahaj.zone 4 weeks ago
I don’t disagree, but it is a challenging problem. If you’re filtering for “die” then you’re going to find diet, indie, diesel, remedied, and just a whole mess of other words.
I’m in the camp where I believe they really should be reading all their inputs. You’ll never know what you’re feeding the machine otherwise.
However, I have no illusions here: they're cutting corners to save money.
Swedneck@discuss.tchncs.de 4 weeks ago
huh? finding only the literal word “die” is a trivial regex, it’s something vim users do all the time when editing text files lol
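Both commenters have a point, and the difference comes down to how the match is written. A minimal sketch (with made-up example comments) of naive substring matching versus the word-boundary regex a vim user would reach for:

```python
import re

comments = [
    "just die already",         # the case we actually want to flag
    "my new diet plan",         # "die" only as a substring
    "indie games are great",
    "diesel engines",
    "the problem was remedied",
]

# Naive substring check: flags every single comment above
substring_hits = [c for c in comments if "die" in c]

# Word-boundary regex: flags only the standalone word "die"
pattern = re.compile(r"\bdie\b", re.IGNORECASE)
regex_hits = [c for c in comments if pattern.search(c)]

print(len(substring_hits))  # 5
print(regex_hits)           # ['just die already']
```

The `\b` anchors make the literal-word search trivial, as Swedneck says; the harder part is deciding what else belongs on the list ("kill yourself", "stop existing", …), which is where the scope argument comes back in.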
Bougie_Birdie@lemmy.blahaj.zone 4 weeks ago
Sure, but underestimating the scope is how you wind up with a Scunthorpe problem, where legitimate text gets blocked just because it happens to contain a banned substring.