Comment on Technologist: 'Fining Big Tech isn't working, make them give away illegally trained LLMs as public domain'

GenderNeutralBro@lemmy.sdf.org 5 weeks ago

I guess the idea is that the models themselves aren't infringing copyright, but the training process DID infringe. Some of the big players have admitted to using pirated material in their training data. The rest obviously did too, even if they haven't admitted it.

While language models have the capacity to produce infringing output, I don’t think the models themselves are infringing (though there are probably exceptions). I mean, gzip can reproduce infringing material too with the correct input. If producing infringing work requires both the algorithm AND specific, intentional user input, then I don’t think you should put the blame solely on the algorithm.
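To make the gzip point concrete: the decompressor will emit whatever was encoded, copyrighted or not; the "infringing" part lives entirely in the input you hand it. A minimal Python sketch (the sample text is just a placeholder):

```python
import gzip

# Placeholder standing in for some copyrighted passage; the content doesn't
# matter, only that gzip reproduces exactly the bytes it was given.
original = b"Imagine a copyrighted paragraph here."

compressed = gzip.compress(original)    # the "specific, intentional user input"
restored = gzip.decompress(compressed)  # the algorithm just decodes what it's fed

assert restored == original
print(restored.decode())
```

Nobody would say gzip itself infringes here; the blame sits with whoever supplied and distributed the encoded material.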

Either way, I don’t think existing legal frameworks are suitable to answer these questions, so I think it’s more important to think about what the law should be rather than what it currently is.

I remember stories about the RIAA suing individuals for many thousands of dollars per mp3 they downloaded. If you applied that logic to OpenAI — maximum fine for every individual work used — it’d instantly bankrupt them. Honestly, I’d love to see it. But I don’t think any copyright holder has the balls to try that against someone who can afford lawyers. They’re just bullies.
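To put rough numbers on "instantly bankrupt them": US statutory damages run from $750 up to $150,000 per infringed work, and the RIAA-era verdicts landed well within that range per song. The count of pirated works in any given training set isn't public, so the sketch below uses a made-up figure purely for illustration:

```python
# Back-of-envelope only. The per-work figures are the US statutory damages
# range (17 U.S.C. § 504(c)); the number of pirated works is a hypothetical
# placeholder, since nobody outside the labs knows the real count.
STATUTORY_MIN_PER_WORK = 750        # ordinary statutory floor, in dollars
STATUTORY_MAX_PER_WORK = 150_000    # ceiling for willful infringement

assumed_pirated_works = 1_000_000   # purely illustrative

print(f"At the floor:   ${STATUTORY_MIN_PER_WORK * assumed_pirated_works:,}")
print(f"At the ceiling: ${STATUTORY_MAX_PER_WORK * assumed_pirated_works:,}")
```

Even the floor works out to $750 million on that assumed count; the willful-infringement ceiling is $150 billion.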
