Comment

Comment on lemm.ee plans for mitigating image upload abuse

For step 6 - are you aware of the tooling the admin at dbzer0 has built to automate the scanning of images in Lemmy instances? It looks pretty promising.



sunaurus@lemm.ee ⁨2⁩ ⁨years⁩ ago
Yep, I’ve already tested it and it’s one of the options I am considering implementing for lemm.ee as well.

- PriorProject@lemmy.world ⁨2⁩ ⁨years⁩ ago
  It’s worth considering some commercially developed options as well: prostasia.org/…/csam-filtering-options-compared/
  
  The Cloudflare tool in particular is freely and widely available: blog.cloudflare.com/the-csam-scanning-tool/
  
  I am no expert, but I’m quite skeptical of db0’s tool:
  
  It repurposes a library designed for preventing synthetic CSAM using Stable Diffusion. This library is typically used in conjunction with prompt scanning and other inputs into the generation process. When run outside its normal context on non-AI images, it will lack all this input context, which I speculate reduces its effectiveness relative to the conditions under which it’s tested and developed.
  
  AI techniques live and die by the quality of the dataset used to train them. There is not and cannot be an open-source test dataset of CSAM upon which to train such a tool. One can attempt workarounds, like training separate classifiers and then trying to detect coexisting features related to youth (trained from dataset A using non-sexualized images including children) and sexuality (trained separately from dataset B using images containing only adult performers)… but the efficacy of open-source solutions is going to be hamstrung by the lack of a real dataset with which to train, test, and assess effectiveness. Developers of major commercial CSAM scanners are better able to partner with NCMEC and other groups fighting CSAM to assess the effectiveness of their tools.
  
  I’m no expert, but my belief is that open tools are likely to be hamstrung permanently compared to the tools developed by big companies and the most effective solutions for Lemmy must integrate big company tools (or gov/nonprofit tools if they exist).
  
  PS: Really impressed by your response plan. I hope the Lemmy world admins are watching this post; I know you all communicate and collaborate. Disabling image uploads is, I think, a very effective temporary response until detection and response tooling can be improved.
  
  - barsoap@lemm.ee ⁨2⁩ ⁨years⁩ ago
    The neat thing is that it’s all much easier as lemm.ee doesn’t allow porn: The filter can just nuke nudity with extreme prejudice, adult or not.
    
  - iquanyin@lemmy.world ⁨2⁩ ⁨years⁩ ago
    you make some good points. this gave rise to a thought: seems like law enforcement would have such a data set and seems they should of course allow tools to be trained on it. seems but who knows? might be worth finding out.
    
    barsoap@lemm.ee ⁨2⁩ ⁨years⁩ ago
    If detection tools are publicly available, you can train generative models based on how well the stuff they generate triggers those tools, i.e. train an AI to generate CSAM. There’s no way to isolate knowledge and understanding, so none of it is public, and if you see public APIs they’re behind appropriate rate-limiting etc. so that you can’t use them for that purpose.
    
    PriorProject@lemmy.world ⁨2⁩ ⁨years⁩ ago
    I’m not sure I follow the suggestion.
    
    NCMEC, the US-based organization tasked with fighting CSAM, has already partnered with a list of groups to develop CSAM detection tools. I’ve already linked to an overview of the resulting toolsets in my original comment.
    
    The datasets used to develop these tools are private, but that’s not an oversight. The datasets are… well… full of CSAM. Distributing them openly and without restriction would be contrary to NCMEC’s mission and to US law, so they limit the downside by partnering only with serious/capable partners who are able to commit significant resources to developing detection tools and maintaining them long-term, and who can sign onerous legal paperwork promising to appropriately handle the access they must be given to otherwise illegal material to do so.
    
    CSAM detection is necessarily a cat-and-mouse game of CSAM producers attempting to evade detection vs detection experts trying to improve detection. In such a race, secrecy is a useful… if costly… tool. As a result, NCMEC requires a certain amount of secrecy from their partners about how the detection tools work and who can run them in what circumstances. The goal of this secrecy is to prevent CSAM producers from developing test suites that allow them to repeatedly test image manipulation strategies that retain visual fidelity but thwart detection techniques.
    
    All of which is to say…
    
    … seems like law enforcement would have such a data set and seems they should of course allow tools to be trained on it. seems but who knows? might be worth finding out.
    
    Law enforcement DOES have datasets, and DOES allow tools to be trained on them… I’ve linked the resulting tools. They do NOT allow randos direct access to the data or tools, which is a necessary precaution to prevent attackers from winning the circumvention race. A Red Hat or Mozilla scale organization might be able to partner with NCMEC or another organization to become a detection tooling partner, but db0, sunaurus, or the Lemmy devs likely cannot without the support of a large technology org with a proven track record of delivering and maintaining successful/impactful technology products. This has the big downside of making a true open-source detection tool more or less impossible… but that’s a well-understood tradeoff that CSAM-fighting orgs are not likely to change, as the same access that would empower OSS devs would empower CSAM producers. I’m not sure there’s anything more to find out in this regard.
    
    Cubes@lemm.ee ⁨2⁩ ⁨years⁩ ago
    Tbh I’m kind of surprised no government has set up a service themselves to deal with situations like this since law enforcement is always dealing with CSAM, and it seems like it’d make their job easier.
    
    Plus with the flurry of hugely privacy-invading or anti-encryption legislation that shows up every few months under the guise of “protecting the children online”, it seems like that should be a top priority for them, right?! Right…?
    
Franzia@lemmy.blahaj.zone ⁨2⁩ ⁨years⁩ ago
It seems promising but also incomplete for US hosts, as our laws do not allow deletion of CSAM; rather, it must be saved, preserved, and sent to a central authority, and not deleted until they give the okay. Rofl.

I also wonder if this solution will use pHash or other hashing to filter out known and unaltered CSAM images (without actually comparing the images themselves, rather their hashes).
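
For anyone unfamiliar with how hash-based filtering works in general, here is a minimal sketch. It is not db0’s tool or any real scanner: it uses a simple average hash (a cruder relative of pHash), assumes the image has already been downscaled to a small grayscale grid, and every name in it (`average_hash`, `matches_known`, the `max_distance` threshold) is made up for illustration.

```python
from typing import List, Set

Pixels = List[List[int]]  # rows of grayscale values, 0-255 (assumed already downscaled)


def average_hash(pixels: Pixels) -> int:
    """Build a hash with one bit per pixel: 1 if the pixel is brighter than the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def matches_known(candidate: int, known_hashes: Set[int], max_distance: int = 2) -> bool:
    """True if the candidate is within max_distance bits of any known hash.

    max_distance is a hypothetical tuning knob, not a recommended value.
    """
    return any(hamming_distance(candidate, known) <= max_distance for known in known_hashes)


# Toy 4x4 "images": a checkerboard, a uniformly brightened copy, and an unrelated pattern.
original = [[10, 200, 10, 200], [200, 10, 200, 10], [10, 200, 10, 200], [200, 10, 200, 10]]
brightened = [[value + 20 for value in row] for row in original]
different = [[200] * 4, [200] * 4, [10] * 4, [10] * 4]

known = {average_hash(original)}
print(matches_known(average_hash(brightened), known))
print(matches_known(average_hash(different), known))
```

The point is that only the hashes get compared, so the host never needs a copy of the known images, just a list of their hashes. Small changes like a brightness shift leave the hash close to the original, but heavier alterations will not match, which is why this only covers known and mostly unaltered images.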

- iquanyin@lemmy.world ⁨2⁩ ⁨years⁩ ago
  i didn’t quite know that and yet, it doesn’t surprise me either.
  
ApeNo1@lemm.ee ⁨2⁩ ⁨years⁩ ago
I blocked botart from their instance as some pretty disturbing stuff was added in the last few days.
