Comment on Technologist: 'Fining Big Tech isn't working, make them give away illegally trained LLMs as public domain'

p03locke@lemmy.dbzer0.com 4 weeks ago

I guess the idea is that the models themselves are not infringing copyright, but the training process DID.

I’m still not understanding the logic. Here is a copyrighted picture. I can search for it, download it, view it, see it with my own eyeballs. My browser already downloaded the image for me so that I could see it in the first place. I can take that image and edit it in a photo editor. I can do whatever I want with the image on my own computer, as long as I don’t publish the image elsewhere on the internet. All of that is legal. None of it infringes on copyright.

Hell, it could be argued that if I transform the image to a significant degree, I can still publish it under Fair Use. But that still gets into a gray area for each use case.

What is not a gray area is what AI training does. They download the image and use it in training, which is like me looking at a picture in a browser. The image isn’t republished, or stored in the published model, or represented in any way that could be reconstructed back to the source image in any reasonable form. It just changes a bunch of weights in a model. It’s mathematically impossible for a 4GB model to somehow store the many, many terabytes of images on the internet.
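To put rough numbers on that, here's a back-of-the-envelope sketch. The 4GB model size is the figure from above; the image count and dataset size are made-up round numbers purely for illustration, not measurements of any real training set, and the function names are just mine.

```python
# Back-of-the-envelope: how much model capacity is there per training image?

def bytes_per_image(model_bytes: int, num_images: int) -> float:
    """Average number of model bytes available per training image."""
    return model_bytes / num_images

def compression_ratio(dataset_bytes: int, model_bytes: int) -> float:
    """How many times larger the dataset is than the model."""
    return dataset_bytes / model_bytes

MODEL_BYTES = 4 * 10**9        # the 4GB model mentioned above
NUM_IMAGES = 2 * 10**9         # ASSUMPTION: 2 billion training images
DATASET_BYTES = 100 * 10**12   # ASSUMPTION: 100 TB of source images

print(bytes_per_image(MODEL_BYTES, NUM_IMAGES))       # 2.0
print(compression_ratio(DATASET_BYTES, MODEL_BYTES))  # 25000.0
```

Under those assumed numbers, the model has about 2 bytes of room per image on average, and the dataset is 25,000 times bigger than the model. Swap in whatever figures you like; the gap stays enormous.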

Where is the copyright infringement?
