it’s completely infeasible
Then it shouldn’t be done. That’s the unethical part. Trying to just avoid the problem by continuing to scrape large data sets for images that you shouldn’t be using is the entire problem. Either get permission for each image or don’t build your image model. Doing otherwise is unethical.
elliot_crane@lemmy.world 9 months ago
Again, in many instances, folks training models are using repositories of images that have been publicly shared. In many cases the people who assembled the image repositories are not the same people using them. I agree that reckless scraping is not responsible, but if you’re using a repository of images that’s presented as ok to use for AI training, I’d argue it’s even more ethical to strip out the Nightshaded images, because clearly the presence of Nightshade means you shouldn’t use those images. I guess we’re just going to have to agree to disagree here, because I see this as a helpful tool to specifically avoid training on images you shouldn’t be using.