
I agree that the hardware being used right now is not well suited. I don’t agree that the right hardware is strictly necessary: better hardware just means less tedious waiting for the computation to finish. Real-time interaction is the one boundary where the hardware has to be good enough. For everything else you just have to be patient, sometimes absurdly so, but you could in principle still perform the computation.

LLMs are as close as we have right now, and they have miles to go. But they need hundreds of times more power than the brain does. No, it won’t be soon, and it won’t be with this kind of silicon processor.

There are people already baking LLMs into custom hardware – e.g. chatjimmy.ai

The model on their demo page isn’t the best LLM I’ve seen (Qwen and Gemma are much cleverer and more likely to give decent results), but it’s a taste of what’s possible: it generates responses at ~17,000 tokens per second today.

If I could get answers back from the best Qwen model I’ve got at that speed, I could run every query three times, feed the results through another pass to self-assess them, and still reply before you can blink. That would get rid of a lot of the “confidently claims knowledge about a made-up subject” problem we currently see. We can do the same thing on CPUs/GPUs today, but you’re stuck waiting so long for the result that most people don’t bother.
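
To make the "run it three times, then self-assess" idea concrete, here is a rough Python sketch. It isn't tied to any particular model or API: `generate` is a hypothetical stand-in for whatever function sends a prompt to your model and returns text, and the review prompt wording, the fallback message, and the 500-tokens-per-call figure are all assumptions I picked for illustration.

```python
from typing import Callable

FALLBACK = "I'm not sure about that one."  # assumed wording for a declined answer


def answer_with_self_check(
    query: str,
    generate: Callable[[str], str],  # hypothetical: prompt in, model text out
    attempts: int = 3,
) -> str:
    """Sample several answers, then use one more pass to pick or reject them."""
    candidates = [generate(query) for _ in range(attempts)]

    # Number the candidates so the review pass can refer to them.
    numbered = "\n".join(f"{i + 1}. {c}" for i, c in enumerate(candidates))
    review_prompt = (
        f"Question: {query}\n"
        f"Candidate answers:\n{numbered}\n"
        "Reply with only the number of the best-supported answer, "
        "or NONE if the candidates contradict each other or look made up."
    )
    verdict = generate(review_prompt).strip()

    # Anything that is not a valid candidate number counts as a rejection.
    by_number = {str(i + 1): c for i, c in enumerate(candidates)}
    return by_number.get(verdict, FALLBACK)


def estimated_latency_seconds(
    tokens_per_call: int, calls: int, tokens_per_second: float
) -> float:
    """Back-of-envelope generation time, ignoring prompt processing and network."""
    return tokens_per_call * calls / tokens_per_second
```

With three attempts plus one review pass (four calls) at an assumed 500 tokens each, `estimated_latency_seconds(500, 4, 17_000)` comes to roughly 0.12 seconds. The same four calls at, say, 50 tokens per second would take 40 seconds, which is why hardly anyone bothers doing this today.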

