
Comment on Technologist: 'Fining Big Tech isn't working, make them give away illegally trained LLMs as public domain'

That’s stupid. The damage is still done to the owner of that data used illegally. Make them destroy it.

But when you levy such minuscule fines that are less than they stand to make from it, it’s just a cost of doing business. Fines can work if they are proportionate to the value derived.



GenderNeutralBro@lemmy.sdf.org 9 months ago
I guess the idea is that the models themselves are not infringing copyright, but the training process DID. Some of the big players have admitted to using pirated material in training data. The rest obviously did even if they haven’t admitted it.

While language models have the capacity to produce infringing output, I don’t think the models themselves are infringing (though there are probably exceptions). I mean, gzip can reproduce infringing material too with the correct input. If producing infringing work requires both the algorithm AND specific, intentional user input, then I don’t think you should put the blame solely on the algorithm.

Either way, I don’t think existing legal frameworks are suitable to answer these questions, so I think it’s more important to think about what the law should be rather than what it currently is.

I remember stories about the RIAA suing individuals for many thousands of dollars per mp3 they downloaded. If you applied that logic to OpenAI — maximum fine for every individual work used — it’d instantly bankrupt them. Honestly, I’d love to see it. But I don’t think any copyright holder has the balls to try that against someone who can afford lawyers. They’re just bullies.

- p03locke@lemmy.dbzer0.com 9 months ago
  
  > I guess the idea is that the models themselves are not infringing copyright, but the training process DID.
  
  I’m still not understanding the logic. Here is a copyrighted picture. I can search for it, download it, view it, see it with my own eyeballs. My browser already downloaded the image for me, in order for me to see it in the browser. I can take that image and edit it in a photo editor. I can do whatever I want with the image on my own computer, as long as I don’t publish the image elsewhere on the internet. All of that is legal. None of it infringes on copyright.
  
  Hell, it could be argued that if I transform the image to a significant degree, I can still publish it under Fair Use. But, that still gets into a gray area for each use case.
  
  What is not a gray area is what AI training does. They download the image and use it in training, which is like me looking at a picture in a browser. The image isn’t republished, or stored in the published model, or represented in any way that could be reconstructed back to the source image in any reasonable form. It just changes a bunch of weights in an LLM. It’s mathematically impossible for a 4GB model to somehow store the many, many terabytes of images on the internet.
  
  Where is the copyright infringement?
  
  - Semjaza@lemmynsfw.com 9 months ago
    If you take that image, copy it and then try to resell it for profit you’ll find you’re quickly in breach of copyright.
    
    The LLM is, in most cases, being licensed out to users for a profit off of the input data without which it could not exist in its current form.
    
    You could see it as akin to plagiarism if you think ctrl+c, ctrl+v is too extreme.
    
    p03locke@lemmy.dbzer0.com 9 months ago
    
    > If you take that image, copy it and then try to resell it for profit you’ll find you’re quickly in breach of copyright.
    
    That’s not what’s happening. Did you even read my comment?
    
  - GenderNeutralBro@lemmy.sdf.org 9 months ago
    I agree that the models themselves are clearly transformative. That doesn’t mean it’s legal for Meta to pirate everything on earth to use for training. THAT’S where the infringement is. And they admitted they used pirated material: techspot.com/…/101507-meta-admits-using-pirated-b…
    
    You want to use the same bullshit tactics and unreasonable math that the RIAA used in their court cases?
    
    I would enjoy seeing megacorps held to at least the same standards as individuals. I would prefer for those standards to be reasonable across the board, but that’s not really on the table here.
    
unautrenom@jlai.lu 9 months ago
I’d argue it’s not useless; rather, it would remove any financial incentive for these companies to sink who knows how much into training AI. By putting the models in the public domain, they would lose their competitive advantage over other cloud providers, who could exploit them all the same, all the while not disturbing the current usage of AI.

Now, I do agree that destroying it would be even better, but I fear something like that would face too much pushback from the parts of civil society that do use AI.

teawrecks@sopuli.xyz 9 months ago
Destroying it is both not an option, and an objectively regressive suggestion to even make.

Destruction isn’t possible because even if you deleted every bit of information from every hard drive in the world, now that we know it’s possible, someone would recreate it all in a matter of months.

Regressive because you’re literally suggesting that we destroy a new technology because we’re afraid of what it will do to the technology it replaces. Meanwhile, there’s a very decent chance that AI is our best chance at solving the energy/climate crises through advancing nuclear tech, as well as surviving the next pandemic via groundbreaking protein folding tech.

I realize AI tech makes people uncomfortable (for…so many reasons), but becoming old-fashioned conservatives in response is not a solution.

- Bronzebeard@lemm.ee 9 months ago
  I never suggested destroying the technology that is “AI”. I’m not uncomfortable about AI, I’ve even considered pivoting my career in that direction.
  
  I suggested destroying the particular implementation that was trained on the illegitimate data. If someone can recreate it using legitimate data, GREAT. That’s what we want to happen. The tool isn’t the problem. It’s the method they’re using to train them.
  
  Please don’t make up random ass narratives I never even hinted at, and then argue against them.
  
  - teawrecks@sopuli.xyz 9 months ago
    I didn’t misinterpret what you were saying; everything I said applies to the specific case you lay out. If illegal networks were somehow entirely destroyed, someone would just make them again. That’s my point: there’s no way around that, there’s only holding people accountable when they do it. IMO that takes the form of restitution to the people, proportional to profits.
    
    Bronzebeard@lemm.ee 9 months ago
    This is the dumb “best do nothing, because nothing is perfect” approach: it makes sure no disincentives are ever applied, just because someone somewhere else might also try to do the illegal thing, which they’ll lose access to the moment they’re caught…
    
- Sas@beehaw.org 9 months ago
  Mate, LLMs are literally gobbling up energy as if they’re working at a power plant gloryhole. They’re furthering the climate crisis, not solving it. They’re also incapable of the logic needed to make something new, so they’re not gonna invent anything. AI in general has its uses, but LLMs are not the golden goose you should bet on. And profits from them are, afaik, nonexistent. The money only comes from investors thinking it’ll be profitable some day, but it’s way too energy-intensive a process to be profitable.
  
  - teawrecks@sopuli.xyz 9 months ago
    I understand that you are familiar with the buzzword “LLM”, but let me introduce you to a different one: transformers.
    
    Virtually all modern successful AIs are based on transformers, LLMs included. I agree that LLMs currently amount to a Chinese-room-inspired parlor trick, but the money involved has no doubt advanced all transformer-based AI research, both directly (what works for LLMs may generalize) and indirectly (the market demand for LLMs in consumer products has created a demand for power and compute hardware).
    
    We have transformer-based AI to thank for our understanding of the COVID-19 protein, and for the development of a safe and effective vaccine in a timely manner.
    
    The massive demand for energy has convinced Microsoft, Meta, and others to invest in their own modern nuclear power plants, representing a monumental step forward in sustainable energy generation that we have been trying to convince the US government to take for decades.
    
    Modern AI is being used to solve the hardest problems of nuclear fusion. If we can finally crack that nut, there’s no telling what’s possible.
    
    But specifically when it comes to LLMs, profitable or not, people obviously find them useful. People wouldn’t be using them in place of search engines, or doing all their homework with them, if they didn’t find them useful. My only argument is that any AI trained on public content without consent should be required to effectively buy a license from, or pay royalties to, the public. If McDonald’s is going to replace their front counters with AI trained on public content, then they should have to pay taxes proportional to how much use they get from that AI.
    
    In the theoretical extreme, if someone trains an AI on the general public’s data, and is able to create an AI that somehow replaces every job on earth, then congrats, we now live in a post-work society, we just need to reach out and take it rather than letting one person capitalize infinitely.
    
    And at the end of the day, if you honestly believe the profits from AI are non-existent, then what are you worried about? All those companies putting all their eggs in the LLM basket are going to disappear overnight when the AI bubble finally pops, right?
    
    Sas@beehaw.org 9 months ago
    There’s a reason why, in my comment, I talked about LLMs as bad while saying AI in general has its uses: this post is about LLMs.
    
    I know very well that specialized AI has a lot of uses in medical science and other fields, but that’s not really what got hit with all the hype, is it? The hype is that managers saw a language model give seemingly better answers to questions than John Rando from 2 blocks down the road, so they’re now looking to cut out all the already low-paid workers. And spoiler alert: we will not land in a society where the general public profits from not having work. It will be the same owners of capital profiting, as per usual.
    
    fracture@beehaw.org 9 months ago
    would love to see a source for AI helping with the COVID-19 vaccine
    