Comment on Your Bluesky Posts Are Probably In A Bunch of AI Datasets Now [404 Media]
jarfil@beehaw.org 2 weeks ago
From a copyright point of view… the rights to each piece of content belong to its owner… but each owner is sharing that content with an instance, with the intent of it getting re-shared to further instances.
In a strict sense, most instances are in breach of copyright law: they don’t require users to agree to an EULA specifying how the content will be used, they don’t require federated instances to agree to the same terms, they don’t make end users agree to the terms of other instances, and they generally allow users to submit someone else’s content (see: memes) without the owner’s authorization, then share and re-share it across the federated network. A fully “copyright compliant” protocol would need to have these things baked in from the beginning… which would make joining the federated network a royal PITA.
With the current approach of “like, chill bro”… anyone can set up an instance, federate with any target instance (or with one that federates with the target), and save all the data without any consequences. Receiving federated data carries an implicit consent to process that data, and the protocol definitely does nothing to prevent arbitrary processing.
Scraping the web endpoint of an instance carries the rules set by the EULA of that endpoint… which tend to be none, or in the best case, are those of the least restrictive instance offering that federated data.
And all of that is before counting the scrapers that simply ignore any requirements.