Comment on Has Generative AI Already Peaked? - Computerphile
eleitl@lemmy.ml 6 months ago
There are diminishing returns in semiconductor photolithography. Moore scaling is long over, and absolute real estate (see wafer-scale integration, WSI, with Cerebras), data center costs, and the power envelope are all sending a clear message. Quantization is already there, so you can go from digital multipliers to analog ones and on to spiking networks, but transformers and co. have little traction there.
Also, the kind of economy that can carry Gen AI as a business model is not a given, long term.
jarfil@beehaw.org 6 months ago
Neuromorphic hardware is going to jump many orders of magnitude over classic hardware. When we get RAM that can execute multiple layers in parallel, per clock tick, we’ll see whole AI ecosystems cooperating to get a solution in a fraction of the time a single modern NN would take.
eleitl@lemmy.ml 6 months ago
Yes, orders of magnitude, but not too many of them. The real estate of a 300 mm wafer is limited, the structure shrink is saturating, and you can’t stack too many layers. You still need a packet-switched network on the wafer even if the rest is mostly analog. Perhaps spintronics can limit the power requirements too.
jarfil@beehaw.org 6 months ago
The orders of magnitude will come from the RAM running a whole layer at once in “a single clock”, without the need for a processor to execute any of it. It’s conceivable that multiple layers could be written/“programmed” into neuromorphic RAM, then a processor could just write the inputs, send an execute, move data from outputs to the next inputs, and repeat for all layers.
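To make that loop concrete, here is a toy software model of it. This is only a sketch: `NeuromorphicRAM` and its methods are hypothetical names I made up, not any real device API, and the ReLU inside `execute` is an assumed activation just so the example computes something.

```python
# Toy model of the "program layers, then write / execute / move" loop.
# NeuromorphicRAM is a hypothetical stand-in, not a real device interface.

class NeuromorphicRAM:
    def __init__(self):
        self.layers = []    # weight matrices "programmed" into the array
        self.inputs = []
        self.outputs = []

    def program_layer(self, weights):
        """Store one layer's weights (a list of rows); return its index."""
        self.layers.append(weights)
        return len(self.layers) - 1

    def write_inputs(self, values):
        self.inputs = list(values)

    def execute(self, layer_index):
        """Stand-in for a whole layer firing at once:
        matrix-vector product followed by ReLU (assumed activation)."""
        weights = self.layers[layer_index]
        self.outputs = [
            max(0, sum(w * x for w, x in zip(row, self.inputs)))
            for row in weights
        ]

    def read_outputs(self):
        return list(self.outputs)


def run_network(ram, layer_indices, inputs):
    """The processor's only job: shuttle data between layers."""
    data = inputs
    for idx in layer_indices:
        ram.write_inputs(data)      # write the inputs
        ram.execute(idx)            # send an execute
        data = ram.read_outputs()   # outputs become the next inputs
    return data


ram = NeuromorphicRAM()
first = ram.program_layer([[1, -1], [2, 0]])
second = ram.program_layer([[1, 1]])
result = run_network(ram, [first, second], [3, 1])
print(result)
```

In this model the processor never touches a weight; it only moves activations between `execute` calls, which is where the claimed speedup would have to come from.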
For example, an NVIDIA A100 goes up to 1,200 INT8 TOPS with 80GB of RAM at 1500MHz… but if the RAM could execute a neural network directly, with every byte doing one INT8 operation per clock, that could raise it to 80G × 1.5G = 1.2×10^20 operations per second, i.e. 120,000,000 INT8 TOPS, or 5 orders of magnitude more.
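The back-of-the-envelope arithmetic can be checked with a few lines of Python. The big assumption, which is mine for the sake of the estimate and not a measured figure, is that every byte of RAM performs one INT8 operation per clock tick.

```python
import math

A100_INT8_TOPS = 1_200   # figure quoted above
RAM_BYTES = 80e9         # 80 GB
CLOCK_HZ = 1.5e9         # 1500 MHz

# Assumption: each byte of RAM does one INT8 operation per clock tick.
ops_per_second = RAM_BYTES * CLOCK_HZ
in_ram_tops = ops_per_second / 1e12          # tera-operations per second

speedup = in_ram_tops / A100_INT8_TOPS
orders_of_magnitude = math.log10(speedup)

print(f"in-RAM estimate: {in_ram_tops:,.0f} INT8 TOPS")
print(f"speedup over the quoted A100 figure: {speedup:,.0f}x")
print(f"orders of magnitude: {orders_of_magnitude:.1f}")
```

This is an upper bound under that assumption only; it says nothing about power, cooling, or routing data between layers.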
eleitl@lemmy.ml 6 months ago
A free-running cellular automaton (CA) approach in hardware would work, but each cell would be a much souped-up SRAM cell, and the interactions would be all local and 2D. Considering Cerebras has 40 GB of SRAM on the 300 mm WSI and is about at the cooling limit, I’m afraid you do not have 5 orders of magnitude. Perhaps reversible spintronics can help with the power draw, but you still have to map a higher-dimensional network, so not just local interactions, onto a 2D array.