Comment on LEAKED: A New List Reveals Top Websites Meta Is Scraping of Copyrighted Content to Train Its AI

mindbleach@sh.itjust.works 2 days ago

Outright piracy? It’s not allowed, but it’s supposed to be a civil matter.

Videos posted without permission? I don’t think the audience is liable for that.

Scraping despite robots.txt? If that’s illegal for its own sake, then it’s overreaching on ‘unauthorized access.’

Training on any of this? … nah, it’s probably fine.

A pile of linear algebra that knows what pornography looks like does not serve the same function as any particular example. No more than one video infringes on another for the general idea of cameras pointed at naked people. Producing the same kind of thing is not infringement. (Though if it involves Shrek, the trademark people will have angry and confusing questions.)

Reproducing any particular input is a failure of training. Even the Bible should be paraphrased past about Genesis 1:9. The whole idea is getting the vibe of everything we’ve ever published: CliffsNotes, a passable imitation of the writing style, and a couple of passages everyone’s quoted verbatim.

An encyclopedia article about a book doesn’t become illegal if we learn the author shoplifted it.
