If so, are these programs that claim to 'poison' training datasets effective?
No. The tools are completely ineffective. There was a paper a while back showing that feeding an AI its own output makes it deteriorate, but that's not the whole story. Many, perhaps most, modern large language models are actually trained or fine-tuned on synthetic text.
BlameThePeacock@lemmy.ca 2 weeks ago
To some extent, yes. However, the companies building these systems use heavily curated data for most of the cases where that would matter. They aren't just turning models loose on the whole internet at this point; that would be absolutely useless.