It’s even more complicated than that: “AI” is not even a well-defined term. Back when Quake 3 was still in beta (“the demo”), id Software held a competition to develop “bot AIs” that could be added to a server so players would have something to play against while they waited for more people to join (or so you could run players-vs-bots matches).
That was over 25 years ago. What kind of “AI” do you think was used back then? 🤣
The AI hater extremists seem to be in two camps:
- Data center haters
- AI-is-killing-jobs
The data center haters are the strangest to me, because there’s this default assumption that data centers can never be powered by renewable energy and that AI will never improve to the point where it can all run locally on people’s PCs (and other personal hardware).
Yet every day there’s news suggesting that local AI is performing better and better. It seems inevitable—to me—that “big AI” will go the same route as mainframes.
utopiah@lemmy.world 3 weeks ago
Can you please share examples and criteria?
dogslayeggs@lemmy.world 3 weeks ago
Sure. My company has a database of all technical papers written by employees in the last 30-ish years. Nearly all of these contain proprietary information from other companies (we deal with tons of other companies and have access to their data), so we can’t build a public LLM nor use a public LLM. So we created an internal-only LLM that is only trained on our data.
Fmstrat@lemmy.world 2 weeks ago
I’d bet my lunch this internal LLM is a fine-tuned open-weight model, which has lots of public data baked into it. Not complaining about what your company has done, as I think that makes sense, just providing a counterpoint.
utopiah@lemmy.world 2 weeks ago
Are you training solely on your own data, or are you fine-tuning an existing LLM, or rather doing RAG?
I’m not an expert, but AFAIK training an LLM from scratch requires, by definition, a vast amount of text, so I’m skeptical that ANY company publishes enough papers to do so. I understand if you can’t share more about the process. Maybe me saying “AI” was too broad.
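For what it’s worth, the RAG alternative mentioned above sidesteps the “not enough text to train on” problem entirely: you keep an existing LLM as-is and just retrieve your own documents as context at query time. Here is a toy sketch of that retrieval step, with crude word-overlap scoring standing in for real embeddings; the paper titles are entirely hypothetical placeholders.

```python
# Toy sketch of the retrieval half of RAG: rank internal documents by
# relevance to a query and prepend the best match to the LLM prompt.
# Real systems use embedding similarity; word overlap is used here only
# to keep the example self-contained.

def score(query: str, doc: str) -> int:
    """Crude relevance score: number of shared lowercase words."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents with the highest overlap score."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Assemble the context-plus-question prompt sent to the base LLM."""
    context = "\n".join(retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}"

# Hypothetical internal paper titles, standing in for a document store.
papers = [
    "Turbine blade fatigue analysis under cyclic thermal load",
    "Proprietary coating process for corrosion resistance",
    "Quarterly safety audit procedures for lab equipment",
]

prompt = build_prompt("How does thermal load affect blade fatigue?", papers)
print(prompt)
```

No training happens at all, which is why a company with a modest corpus can still get domain-specific answers out of a general-purpose model.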
tb_@lemmy.world 2 weeks ago
Completely from scratch?
oplkill@lemmy.world 3 weeks ago
It can use public-domain or openly licensed data
utopiah@lemmy.world 2 weeks ago
Right, and to be clear I’m not saying it’s not possible. This isn’t a trick question, it’s a genuine request to hopefully be able to rely on such tools.
Fmstrat@lemmy.world 2 weeks ago
www.swiss-ai.org/apertus
Fully open source, even the training data is provided for download. That being said, this is the only one I know of.
utopiah@lemmy.world 2 weeks ago
Thanks, a friend did recommend it a few days ago, but unfortunately AFAICT they don’t provide the CO2eq in their model card, nor an equivalent analogy that non-technical users could understand.
Hackworth@piefed.ca 3 weeks ago
Adobe’s image generator (Firefly) is trained only on images from Adobe Stock.
utopiah@lemmy.world 2 weeks ago
Does it only use that, or does it also use an LLM too?
Hackworth@piefed.ca 2 weeks ago
The Firefly image generator is a diffusion model, and the Firefly video generator is a diffusion transformer. LLMs aren’t involved in either process. I believe there are some ChatGPT integrations with Reader and Acrobat, but that’s unrelated to Firefly.