Tell me you don’t know how the technology works without telling me. The “good” models are huge. They make server farms larger than your local town centre run at full load for every query. There’s no throttle to hold them back, because throttling doesn’t help. It’ll always be full speed ahead. And the results are based on probability, with a factor of bullshit baked in by design. Yes, computers become more efficient over time, but that doesn’t help when you keep pushing them to go full speed all the time. The thirstiest CPU in the original Pentium family consumed 17.3 watts at full speed. Today, the Intel Xeon w9-3595X slurps 462 watts given the chance. The Nvidia B200 datacenter GPU has a TDP of 1000 watts. And data centers have thousands of these running, full speed, for LLMs to serve us hallucinated “facts”.