Comment on Valve dev counters calls to scrap Steam AI disclosures, says it's a "technology relying on cultural laundering, IP infringement, and slopification"

Devial@discuss.online 17 hours ago

If the model collapse theory weren’t true, then why do LLMs need to scrape so much data from the internet for training?

According to you, they should be able to just generate synthetic training data purely from the previous model, and then use that to train the next generation.

So why is there even a need for human input at all, then? Why are all the LLM companies fighting tooth and nail against restrictions on their data scraping, if real human data is in fact so unnecessary for model training?

You can stop models from deteriorating without new data, and you can even train them on synthetic data, but that still requires the synthetic data to be curated or filtered by humans to ensure its quality. If you just take a million random ChatGPT outputs, with no human filtering whatsoever, use them to retrain ChatGPT, and then repeat that over and over again, the model will eventually turn to shit. Each iteration, some of the random variations the model introduces into its output will be bad, and those bad outputs are then presented to the next generation as targets to learn, so the new model treats them as less bad than the previous one did.
