Training an AI is orthogonal to copyright since the process of training doesn’t involve distribution.
You can train an AI with whatever TF you want without anyone’s consent. That’s perfectly legal fair use. It’s no different than if you copy a song from your PC to your phone.
Copyright really only comes into play when someone uses an AI to distribute a derivative of someone’s copyrighted work. Even then, it’s really only the end user who is capable of doing such a thing, by uploading the AI’s output somewhere.
Mika@sopuli.xyz 1 day ago
I can almost guarantee that hundred-billion-parameter LLMs are not trained on that; they’re trained on the whole web, scraped to the furthest extent.
The only sane and ethical solution going forward is to force all LLMs to be open-sourced. Use the datasets generated by humanity; give back to humanity.
Skullgrid@lemmy.world 1 day ago
Jesus fucking christ. There are SO GODDAMN MANY open source LLMs, even from fucking scumbags like Facebook. I get that there are subtleties to the argument on the ProAI vs AntiAI side, but you guys just screech and scream.
github.com/eugeneyan/open-llms
6nk06@sh.itjust.works 23 hours ago
Where are the sources? All I see is binary files.
Mika@sopuli.xyz 1 day ago
Lol, ofc Meta, they have the biggest big data out there, full of private data.
Most of the open-source ones are recompilations of existing open-source LLMs.
And the models on the page you’ve listed are mostly under 10B parameters, except for LLMs with huge financing, which generally have either corporate or Chinese backing.
Mika@sopuli.xyz 1 day ago
Besides, the article is about image gen AI, not LLMs.
onslaught545@lemmy.zip 12 hours ago
That’s an LLM, buddy.
unconfirmedsourcesDOTgov@lemmy.sdf.org 1 hour ago
What do you think the letters LLM stand for, pal?
null@lemmy.nullspace.lol 9 hours ago
Image Gen AI is an LLM?
Mika@sopuli.xyz 9 hours ago
The article directly complains about AI artwork. Do you even know what LLM means?