> We know that most of the closed source models are way more complicated, so let’s say they take 3 times the cost to generate a response.
This is pure supposition. Is it 3x a “regular” response? I have no idea. How do you even arrive at that guess? Is a more complex prompt exponentially more expensive? Linearly? Logarithmically? And how complex are we talking, when system prompts themselves can be 10k tokens?
> Generating an AI voice to speak the lines increases that energy cost exponentially. MIT found that generating a grainy, five-second video at 8 frames per second on an open source model took about 109,000 joules
Why did you go from voice gen to video gen? I mean, I don’t know whether video gen takes more joules or not, but there’s no actual connection here. You just decided that a line of audio gen is equivalent to 40 frames of video. What if they generate the text and then use conventional voice synthesizers? And what does that have to do with video gen?
> If these estimates are close
Who even knows, mate? You’ve been completely fucking arbitrary and, shocker, your analysis kinda supports your supposition. How many Vader lines are you going to get in 30 minutes? When it’s brand new, probably a lot, but after the luster wears off?
I’m not even telling you you’re wrong, just that your methodology here is complete fucking bullshit.
It could be as low as 6,500 joules (based on your statement), which changes the calculus to 60 lines per half hour. Is it that low? Probably not, but that guess is every bit as valid as your math, and I’m even using your numbers without double-checking you.
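To make the point concrete, here’s the same back-of-envelope as a few lines of Python. Every number in it is a placeholder pulled from this thread, not a measurement, and the half-hour “budget” is just back-derived from my own “60 lines at 6,500 J” figure:

```python
# Back-of-envelope: how many Vader lines fit in a fixed energy budget?
# All figures are placeholders from this thread, not measurements.

BUDGET_J = 60 * 6_500  # ~390 kJ per half hour, back-derived from "60 lines at 6,500 J"

scenarios = {
    "low-end guess (6,500 J/line)": 6_500,
    "the article's 3x multiplier (19,500 J/line)": 3 * 6_500,
    "the 5-second-video figure (109,000 J/line)": 109_000,
}

for label, joules_per_line in scenarios.items():
    print(f"{label}: {BUDGET_J / joules_per_line:.1f} lines per half hour")
```

Same budget, and the answer swings from 60 lines down to about 3.6 depending on which arbitrary per-line cost you pick. That’s the whole problem.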
At the end of the day, maybe I lose the bet. Fair. I’ve been wondering for a while how they actually stack up, and I’m willing to be shown. But I suspect using it for piddly shit day to day is a drop in the bucket compared to all the mass corporate spam. I’m aware that’s nothing but a hypothesis, and I’m willing to be proven wrong. Just not based on this.
Even_Adder@lemmy.dbzer0.com 2 days ago
TTS models are tiny in comparison to LLMs, so how does this track? The biggest I could find was Orpheus-TTS, which comes in 3B/1B/400M/150M parameter sizes. They are not using a 600-billion-parameter LLM to generate Vader’s responses; that is likely way too big. After generating the text, the speech isn’t even a drop in the bucket.
You need to include parameter counts in your calculations. A lot of these assumptions are so wrong it borders on misinformation.
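To put rough numbers on that, here’s a sketch, not a measurement. It uses the standard decoder-inference rule of thumb of roughly 2 × parameters FLOPs per generated token, assumes energy roughly tracks FLOPs (crude), and the token counts are guesses:

```python
# Rough scale comparison: decoder-only inference cost ~= 2 * params * tokens.
# Energy-tracks-FLOPs is a crude assumption; token counts are guesses.

def inference_flops(params: float, tokens: int) -> float:
    """Rule-of-thumb FLOPs to generate `tokens` tokens with a decoder model."""
    return 2 * params * tokens

llm_reply = inference_flops(params=600e9, tokens=200)  # worst-case giant LLM, one text reply
tts_line = inference_flops(params=3e9, tokens=500)     # 3B TTS, ~5 seconds of audio tokens

print(f"LLM reply: {llm_reply:.1e} FLOPs")
print(f"TTS line:  {tts_line:.1e} FLOPs ({tts_line / llm_reply:.1%} of the reply)")
```

Even against a 600B-class model (which, again, they’re almost certainly not using), the TTS pass comes out around 1% of the text generation. The speech step is a rounding error.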
theangriestbird@beehaw.org 2 days ago
I will repeat what I said in another reply below: if the cost of running these closed source AI models was as negligible as you are suggesting, then these companies would be screaming it from the rooftops to get the stink of this energy usage story off their backs. AI is all investors and hype right now, which means the industry is extra vulnerable to negative stories. By staying silent, the AI companies are allowing people like me to make wild guesses at the numbers and possibly fear-monger with misinformation. They could shut up all the naysayers by simply releasing their numbers. The fact that they are still staying silent despite all the negative press suggests that the energy usage numbers are far worse than anyone is estimating.
Even_Adder@lemmy.dbzer0.com 2 days ago
That doesn’t mean you can misrepresent facts like this, though. The line I quoted is misinformation, and you don’t know what you’re talking about. I’m sorry this sounds so aggressive, but it’s the only way I can phrase it.