It seems promising but also incomplete for US hosts, as our laws do not allow simply deleting CSAM; rather, it must be preserved and sent to a central authority, and not deleted until they give the okay. Rofl.
I also wonder if this solution will use PHash or other hashing to filter out known and unaltered CSAM images (without actually comparing the images themselves, only their hashes).
sunaurus@lemm.ee 1 year ago
Yep, I’ve already tested it and it’s one of the options I am considering implementing for lemm.ee as well.
PriorProject@lemmy.world 1 year ago
It’s worth considering some commercially developed options as well: prostasia.org/…/csam-filtering-options-compared/
The Cloudflare tool in particular is freely and widely available: blog.cloudflare.com/the-csam-scanning-tool/
I am no expert, but I’m quite skeptical of db0’s tool:
I’m no expert, but my belief is that open tools are likely to be hamstrung permanently compared to the tools developed by big companies, and that the most effective solutions for Lemmy must integrate big company tools (or gov/nonprofit tools if they exist).
PS: Really impressed by your response plan. I hope the Lemmy world admins are watching this post; I know you all communicate and collaborate. Disabling image uploads is, I think, a very effective temporary response until detection and response tooling can be improved.
barsoap@lemm.ee 1 year ago
The neat thing is that it’s all much easier as lemm.ee doesn’t allow porn: The filter can just nuke nudity with extreme prejudice, adult or not.
iquanyin@lemmy.world 1 year ago
you make some good points. this gave rise to a thought: seems like law enforcement would have such a data set, and seems they should of course allow tools to be trained on it. seems so, but who knows? might be worth finding out.
barsoap@lemm.ee 1 year ago
If you have publicly available detection tools, you can train models based on how well the stuff they generate triggers those tools, i.e. train an AI to generate CSAM. There’s no way to isolate knowledge and understanding, so none of it is public, and if you see public APIs they’re behind appropriate rate-limiting etc. so that you can’t use them for that purpose.
PriorProject@lemmy.world 1 year ago
I’m not sure I follow the suggestion.
All of which is to say…
Law enforcement DOES have datasets, and DOES allow tools to be trained on them… I’ve linked the resulting tools. They do NOT allow randos direct access to the data or tools, which is a necessary precaution to prevent attackers from winning the circumvention race. A Red Hat or Mozilla scale organization might be able to partner with NCMEC or another organization to become a detection tooling partner, but db0, sunaurus, or the Lemmy devs likely cannot without the support of a large technology org with a proven track record of delivering and maintaining successful/impactful technology products. This has the big downside of making a true open-source detection tool more or less impossible… but that’s a well-understood tradeoff that CSAM-fighting orgs are not likely to change, as the same access that would empower OSS devs would empower CSAM producers. I’m not sure there’s anything more to find out in this regard.
Cubes@lemm.ee 1 year ago
Tbh I’m kind of surprised no government has set up a service themselves to deal with situations like this since law enforcement is always dealing with CSAM, and it seems like it’d make their job easier.
Plus with the flurry of hugely privacy-invading or anti-encryption legislation that shows up every few months under the guise of “protecting the children online”, it seems like that should be a top priority for them, right?! Right…?