Comment on OpenAI says it’s “impossible” to create useful AI models without copyrighted material

teawrecks@sopuli.xyz ⁨11⁩ ⁨months⁩ ago

Sure, if they want to compete with modern artists, they would need to look at modern artists

Which is the literal goal of Dall-E, SD, etc.

But a human could learn to draw, paint, sculpt, etc purely by only looking at public domain and creative commons works

They could definitely learn some amount of skill, I agree. I’d be very interested to see the best that an AI could achieve using only PD and CC content. It would be interesting. But you’d agree that it would look very different from modern art, just as an alien who has only been consuming earth media from 100+ years ago would be unable to relate to us.

the sky above them and the tree across the street aren’t copyrighted.

Yeah, I’d consider that PD/CC content that such an AI would easily have access to. But obviously the real sky is something entirely different from what is depicted in Starry Night, Star Wars, or H.P. Lovecraft’s description of the cosmos.

OpenAI’s argument is literally that their AI cannot learn without using copyrighted materials in vast quantities

Yeah, I’d consider that a strong claim on their part; what they really mean is, it’s the easiest way to make progress in AI, and we wouldn’t be anywhere close to where we are without it.

And you could argue “convenient that it both saves them money, and generates money for them to do it this way”, but I’d also point out that the alternative is they keep the trained models closed source, never using them publicly until they advance the tech far enough that they’ve literally figured out how to build/simulate a human brain that is able to learn as quickly and human-like as you’re describing. And then we find ourselves in a world where one or two corporations have this incredible proprietary ability that no one else has.

Personally, I’d rather live in the world where the information about how to do all of this isn’t kept for one or two corporations to profit from. I would rather live in the version where they publish their work publicly, early, and often, show that it works, and people are able to reproduce it, open source it, train their own models, and advance the technology in a space where anyone can use it.

You could hypothesize a middle ground where they do the research, but aren’t allowed to profit from it without licensing every bit of data they train on. But the reality of AI research is that it only happens to the extent that it generates revenue. It’s been that way for the entire history of AI. Douglas Hofstadter has been asking deep, important questions about AI as it relates to consciousness for like 60 years (e.g. GEB, I Am a Strange Loop), but there’s a reason he didn’t discover LLMs and tech companies did. That’s not to say his writings are meaningless, in fact I think they’re more important than ever before, but he just wasn’t ever going to get to this point with a small team of grad students, a research grant, and some public domain datasets.

So, it’s hard to disagree with OpenAI there, AI definitely wouldn’t be where it is without them doing what they’ve done. And I’m a firm believer that unless we figure our shit out with energy generation soon, the earth will be an uninhabitable wasteland. We’re playing a game of climb the Kardashev scale, we opted for the “burn all the fossil fuels as fast as possible” strategy, and now we’re at the point where either we spend enough energy fast enough to figure out the tech needed to survive this, or we suffocate on the fumes. The clock is ticking, and AI may be our best bet at saving the human race that doesn’t involve an inordinate number of people dying.
