To be fair, MoE is not new, and we already have a couple of good ~20Bs like Baidu Ernie and GPT-OSS (which they seem to have specifically excluded from comparisons).
You can fit much larger models onto 8GB of VRAM by keeping the experts on the CPU and the ‘dense’ parts like attention on the GPU. Even GLM 4.5 Air (~106B total parameters) will run fairly fast if your RAM is decent.
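For a rough feel of why that split works, here's a back-of-the-envelope sketch. The numbers are illustrative assumptions, not measurements of any real model: a hypothetical 100B-parameter MoE, about 95% of its weights sitting in the expert layers, and roughly 4.5 bits per weight for a 4-bit-ish quant. It only counts weights and ignores KV cache, activations, and runtime overhead.

```python
def split_memory_gib(total_params_b, expert_fraction, bits_per_weight):
    """Estimate weight memory (GiB) on GPU vs CPU when experts are offloaded.

    total_params_b  : total parameter count in billions (assumed value)
    expert_fraction : share of weights living in expert layers (assumed value)
    bits_per_weight : average bits per weight after quantization (assumed value)
    """
    bytes_per_param = bits_per_weight / 8
    total_bytes = total_params_b * 1e9 * bytes_per_param
    cpu_bytes = total_bytes * expert_fraction   # experts stay in system RAM
    gpu_bytes = total_bytes - cpu_bytes         # attention and other dense parts
    gib = 1024 ** 3
    return gpu_bytes / gib, cpu_bytes / gib


# Hypothetical example: 100B total, 95% in experts, ~4.5 bits per weight
gpu_gib, cpu_gib = split_memory_gib(100, 0.95, 4.5)
print(f"GPU (dense parts): {gpu_gib:.1f} GiB")
print(f"CPU (experts):     {cpu_gib:.1f} GiB")
```

Under those assumptions the dense part comes out to a few GiB, which leaves room on an 8GB card for context, while the bulk of the weights sits in system RAM. Since only a handful of experts fire per token, the CPU side has far less work per token than the total size suggests.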
LiveLM@lemmy.zip 10 hours ago
Awesome, thanks!