Does training AI/ML models on AI-generated content cause a collapse in output quality?

Submitted 2 months ago by ryujin470@fedia.io to technology@beehaw.org

If so, are the programs that claim to 'poison' training datasets effective?

Comments

BlameThePeacock@lemmy.ca 2 months ago
To some extent, yes. However, the companies building these systems use heavily curated data for most of the things where that would matter. They aren’t just letting it loose on the whole internet at this point; that would be absolutely useless.

hendrik@palaver.p3x.de 2 months ago
No. The tools are completely ineffective. And there was a paper once about how feeding an AI its own output makes it deteriorate. But that’s not the entire story: many, if not most, modern large language models are actually trained or fine-tuned on synthetic text.
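To make the "deteriorates on its own output" idea concrete, here is a toy sketch. It is not the method from that paper and has nothing to do with how real LLMs are trained: a simple Gaussian stands in for the "model", and each generation is refit only on samples drawn from the previous generation. The function name `simulate_self_training` and all of its parameters are made up for illustration.

```python
import random
import statistics


def simulate_self_training(generations=200, samples_per_generation=10, seed=0):
    """Toy stand-in for recursive training (an assumption, not a real pipeline).

    The "model" is a Gaussian described by a mean and a standard deviation.
    Each generation draws a small sample from the current model and refits
    the model to that sample alone, with no fresh real data mixed in.
    Returns the standard deviation recorded at every generation.
    """
    rng = random.Random(seed)
    mean, std = 0.0, 1.0  # generation 0: the "real" data distribution
    history = [std]
    for _ in range(generations):
        # Generate synthetic data from the current model.
        data = [rng.gauss(mean, std) for _ in range(samples_per_generation)]
        # Refit the model on that synthetic data only.
        mean = statistics.fmean(data)
        std = statistics.pstdev(data)
        history.append(std)
    return history


if __name__ == "__main__":
    history = simulate_self_training()
    print(f"std at generation 0: {history[0]:.3f}")
    print(f"std at generation {len(history) - 1}: {history[-1]:.3g}")
```

With small samples and no fresh data, the spread of the fitted distribution tends to shrink generation after generation, which is the toy version of the model "forgetting" the tails of the original data. It says nothing about curated or filtered synthetic data, which is the case that matters for the models mentioned above.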
