Comment on LEAKED: A New List Reveals Top Websites Meta Is Scraping of Copyrighted Content to Train Its AI

mindbleach@sh.itjust.works 3 days ago

Lemmy really hates piracy… in this specific context.

And a lot of the extreme and extremist content going into these things is just Twitter. People post all kinds of shit from all kinds of places. At what point is this like clutching pearls over what the Internet Archive has saved? They’re trying to grab anything you could see.

It’s not some hacking and exfiltration campaign. Meta’s just bad at spidering. How do you go breadth-first across the entire internet and still DDoS any particular site? You don’t decide to check every DeviantArt account at the same time, you dolts.
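To make the spidering point concrete, here is a minimal sketch of a frontier that takes turns across hosts instead of draining one site at a time. It is not how Meta's crawler works; the function name `interleave_by_host` and the example URLs are made up for illustration.

```python
from collections import OrderedDict, deque
from urllib.parse import urlsplit


def interleave_by_host(urls):
    """Yield URLs round-robin across hosts (hypothetical helper).

    Each pass takes at most one URL per host, so no single site
    gets a burst while other hosts still have work queued.
    """
    queues = OrderedDict()  # host -> pending URLs, in first-seen order
    for url in urls:
        host = urlsplit(url).netloc
        queues.setdefault(host, deque()).append(url)

    while queues:
        # Iterate over a snapshot so emptied hosts can be dropped mid-pass.
        for host in list(queues):
            yield queues[host].popleft()
            if not queues[host]:
                del queues[host]
```

Once only one host is left, its URLs still come out back to back, so a real crawler would also want a per-host delay and to respect robots.txt on top of this.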
