Comment on If AI spits out stuff it's been trained on

hendrik@palaver.p3x.de ⁨10⁩ ⁨months⁩ ago

Well, it can draw an astronaut on a horse, and I doubt it had seen lots of astronauts on horses...

ExtremeDullard@lemmy.sdf.org ⁨10⁩ ⁨months⁩ ago
Yeah but the article suggests that pedos train their local AI on existing CSAM, which would indicate that it’s somehow needed to generate AI-generated CSAM. Otherwise why would they bother? They’d just feed images of children in innocent settings and images of ordinary porn to get their local AI to generate CSAM.

- rikudou@lemmings.world ⁨10⁩ ⁨months⁩ ago
  How do they know that? Did the pedos text them to let them know? Sounds very made up.
  
  - ExtremeDullard@lemmy.sdf.org ⁨10⁩ ⁨months⁩ ago
    The article says “remixed” images of old victims have cropped up.
    
    rikudou@lemmings.world ⁨10⁩ ⁨months⁩ ago
    And again, what’s the source? The great thing with articles about CSAM is that you don’t need sources, everyone just assumes you have them, but obviously cannot share.
    
    Did at least one pedo try that? Most likely yes. Is it the best way to get good quality fake CSAM? Not at all.
    
- hendrik@palaver.p3x.de ⁨10⁩ ⁨months⁩ ago
  It's certainly technically possible. I suspect these AI models just aren't good at it. So the pedophiles need to train them on actual images.
  
  I can imagine for example AI doesn't know what puberty is since it has in fact not seen a lot of naked children. It would try to infer from all the internet porn it's seen, and draw any female with big breasts, disregarding age. And that's not how children actually look.
  
  I haven't tried, since it's illegal where I live. But that's my suspicion why pedophiles bother with training models.
  
- GBU_28@lemm.ee ⁨10⁩ ⁨months⁩ ago
  Training an existing model on a specific set of new data is known as “fine tuning”.
  
  A base model has broad world knowledge and the ability to generate outputs of things it hasn’t specifically seen, but a tuned model will provide “better” (fucking yuck to even write it) results.
  
  The closer your training data is to your desired result, the better.
  
- Deceptichum@quokk.au ⁨10⁩ ⁨months⁩ ago
  That’s not exactly how it works.
  
  It can “understand” different concepts and mix them, without having to see the combination beforehand.
  
  As for the training thing, that would more likely be LoRAs. They’re like add-ons you can attach to your AI to draw certain things better, like a character or a pose. They’re not needed for the base model.
  
- MolochAlter@lemmy.world ⁨9⁩ ⁨months⁩ ago
  Why wouldn’t they? They have it on hand and it would obviously yield “better” results for their intended use case.
  
  If you’re going as far as trying to generate AI CSAM, you’re probably quite deep in that hole already. Why settle for less?
  
- AnAmericanPotato@programming.dev ⁨10⁩ ⁨months⁩ ago
  
  > which would indicate that it’s somehow needed to generate AI-generated CSAM
  
  This is not strictly true in general. Generative AI is able to produce output that is not in the training data, by learning a broad range of concepts and applying them in novel ways. I can generate an image of a rollerskating astronaut even if there are no rollerskating astronauts in the training data.
  
  It is true that some training sets have included CSAM, at least in the past. Back in 2023, researchers found a few thousand such images in the LAION-5B dataset (roughly one per million images). 404 Media has an excellent article with details: www.404media.co/laion-datasets-removed-stanford-c…
  
  On learning of this, LAION took down their database until it could be properly cleaned. Source: laion.ai/notes/laion-maintenance/
  
  Those images were collected from the public web. LAION took steps to avoid linking to illicit content (details in the link above), but clearly it’s an imperfect system. God only knows what closed companies (OpenAI, Google, etc.) are doing. With open data sets, at least any interested parties can review, verify, and report this stuff. With closed data sets, who knows?