Apparently, stealing other people’s work to create a product for money is now “fair use” according to OpenAI, because they are “innovating” (stealing). Yeah. Move fast and break things, huh?
“Because copyright today covers virtually every sort of human expression—including blogposts, photographs, forum posts, scraps of software code, and government documents—it would be impossible to train today’s leading AI models without using copyrighted materials,” wrote OpenAI in the House of Lords submission.
OpenAI claimed that the authors in that lawsuit “misconceive[d] the scope of copyright, failing to take into account the limitations and exceptions (including fair use) that properly leave room for innovations like the large language models now at the forefront of artificial intelligence.”
Haus@kbin.social 11 months ago
Try to train a human comedian to make jokes without ever allowing him to hear another comedian's jokes, never watching a movie, never reading a book or magazine, never watching a TV show. I expect the jokes would be pretty weak.
Phanatik@kbin.social 11 months ago
A comedian isn't forming a sentence based on which word is most probable to appear after the previous one. This is such a bullshit argument that reduces human competency to "monkey see thing to draw thing" and completely overlooks the craft and intent behind creative works. Do you know why ChatGPT uses certain words over others? Probability. It decided, as a result of its training, that one word would appear after the previous in certain contexts. It absolutely doesn't take into account things like "maybe this word would be better here because the sound and syllables maintain the flow of the sentence".
Baffling takes from people who don't know what they're talking about.
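To make the "probability" point concrete, here is a minimal toy sketch of next-word prediction using bigram counts. The corpus and function names are invented for illustration; real LLMs use neural networks conditioned on long contexts, not raw word-pair counts, but the basic idea of "pick the most probable continuation" is the same:

```python
from collections import Counter, defaultdict

# Toy corpus, invented for illustration.
corpus = "the cat sat on the mat the cat ran on the grass".split()

# Count how often each word follows each other word (bigram counts).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def most_probable_next(word):
    """Return the word most often seen after `word` in the corpus."""
    return following[word].most_common(1)[0][0]

print(most_probable_next("the"))  # prints "cat" ("cat" follows "the" twice)
```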
frog@beehaw.org 11 months ago
I wish I could upvote this more than once.
What people always seem to miss is that a human doesn’t need billions of examples to be able to produce something that’s kind of “eh, close enough”. Artists don’t look at billions of paintings. They look at a few, but do so deeply, absorbing not just the most likely distribution of brushstrokes, but why the painting looks the way it does. For a basis of comparison, I did an art and design course last year and looked at about 300 artworks in total (course requirement was 50-100). The research component on my design-related degree course is one page a week per module (so basically one example from the field the module is about, plus some analysis). The real bulk of the work humans do isn’t looking at billions of examples: it’s looking at a few, and then practicing the skill and developing a process that allows them to convey the thing they’re trying to express.
If the AI models were really doing exactly the same thing humans do, the models could be trained without any copyright infringement at all, because all of the public domain and creative commons content, plus maybe licensing a little more, would be more than enough.
DaDragon@kbin.social 11 months ago
That’s what humans do, though. Maybe not probability directly, but we all know that some words should be put in a certain order. We still operate within standard norms that apply to a particular group of people. LLMs just go about it in a different way, but they achieve the same general result. If I’m drawing a human, that means there’s a ‘hand’ here, and a ‘head’ there. ‘Head’ is a weird combination of pixels that mostly looks like this, ‘hand’ looks kinda like that. All depends on how the model is structured, but tell me that’s not very similar to a simplified version of how humans operate.
hascat@programming.dev 11 months ago
That’s not the point though. The point is that the human comedian and the AI both benefit from consuming creative works covered by copyright.
teawrecks@sopuli.xyz 11 months ago
Neither is an LLM. What you’re describing is a primitive Markov chain.
You may not like it, but brains really are just glorified pattern recognition and generation machines. So yes, “monkey see thing to draw thing”, except a really complicated version of that.
Think of it this way: if your brain weren’t a reorganization and regurgitation of the things you have observed before, it would just generate random noise. There’s no such thing as “truly original” art; if there were, it would be random noise. Every single word either of us is typing is the direct result of everything you and I have observed before this moment.
Ironic, to say the least.
The point you should be making is that a corporation will make the above argument up to, but not including, the point where they have to treat AIs ethically. So that’s the way to beat them. If they’re going to argue that they have created something that learns and creates content like a human brain, then they should need to treat it like a human: ensure it is well compensated, ensure it isn’t being overworked or enslaved, ensure it is being treated “humanely”. If they don’t want to do that, if they want it to just be a well-built machine, then they need to license all the proprietary data they used to build it. Make them pick a lane.
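For readers unfamiliar with the "primitive Markov chain" mentioned above, here is a minimal sketch: a generator where the next word depends only on the current word, sampled from observed frequencies. The corpus and seed are invented for illustration; this is the simple model the earlier description of word-by-word probability actually matches, not a full LLM:

```python
import random
from collections import defaultdict

random.seed(0)  # fixed seed so the toy example is repeatable
corpus = "see the thing draw the thing see the monkey draw the monkey".split()

# Record every word observed after each word (a first-order Markov chain).
transitions = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    transitions[prev].append(nxt)

def generate(start, n_words):
    """Walk the chain: repeatedly sample a successor of the current word."""
    word, out = start, [start]
    for _ in range(n_words):
        word = random.choice(transitions[word])
        out.append(word)
    return " ".join(out)

print(generate("see", 4))
```

Sampling from `random.choice` (rather than always taking the most frequent successor) is what makes the output vary run to run when the seed isn't fixed.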
tryptaminev@feddit.de 11 months ago
You do know that comedians copy each other’s material all the time, though? Either making the same joke, or slightly adapting it?
So in the context of copyright vs. model training, I fail to see how the exact process of the model is relevant. In the end, copyrighted material goes in and material based on that copyrighted material comes out.
pupbiru@aussie.zone 11 months ago
you know how the neurons in our brain work, right?
because if not, well, it’s pretty similar… unless you say there’s a soul (in which case we can’t really have a conversation based on fact alone), we’re just big ol’ probability machines with tuned weights based on past experiences too
SuperSaiyanSwag@lemmy.zip 11 months ago
Am I a moron? How do you have more upvotes than the parent comment? Is it because you’re being more aggressive with your statement? I feel like you didn’t quite refute what the parent comment said. You’re just explaining how ChatGPT works, but you’re not really saying why it shouldn’t use our established media as a reference.
intensely_human@lemm.ee 11 months ago
Text prediction seems to be sufficient to explain all verbal communication to me. Until someone comes up with a use case that humans can handle but LLMs cannot (and I mean a specific use case, not general high-level concepts), I’m going to assume human verbal cognition works the same way as an LLM.
We are absolutely basing our responses on what words are likely to follow which other ones. It’s literally how a baby learns language from those around them.
luciole@beehaw.org 11 months ago
There’s this linguistic problem: when one word is used for two different things, it becomes difficult to tell them apart. “Training” or “learning” is a very poor choice of word to describe the calibration of a neural network. The actor and action are both fundamentally different from the accepted meaning. To start with, human learning is active whereas machine learning is strictly passive: it’s something done by someone, with the machine as a tool. Teachers know very well that’s not how it happens with humans.
When I compare training a neural network with how I trained to play the clarinet, I fail to see any parallel. The two are about as close as a horse and a seahorse.
intensely_human@lemm.ee 11 months ago
Not sure what you mean by passive. It takes a hell of a lot of electricity to train one of these LLMs so something is happening actively.
I often interact with ChatGPT 4 as if it were a child. I guide it through different kinds of mental problems, having it take notes and evaluate its own output, because I know our conversations become part of its training data.
It feels very much like teaching a kid to me.
sculd@beehaw.org 11 months ago
AIs are not humans. Humans cannot read millions of texts in seconds and cannot spit out millions of outputs at the same time.
Powderhorn@beehaw.org 11 months ago
A comedian walks on stage and says, “Why is there a mic here?”