Comment on Megrez2: 21B latent, 7.5B on VRAM, 3B active—MoE on single 8GB card
brucethemoose@lemmy.world, 1 week ago

To be fair, MoE is not new, and we already have a couple of good ~20Bs like Baidu Ernie and GPT-OSS (which they seem to have specifically excluded from comparisons).
You can fit much larger models onto an 8GB card by keeping the experts in CPU RAM and the ‘dense’ parts like attention on the GPU. Even GLM 4.5 Air (~106B total) will run fairly fast if your system RAM is decent.
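(llama.cpp does this split with its tensor-override options that pin the expert tensors to CPU, last I checked.) Rough sketch of why it works, as a toy PyTorch MoE block with made-up layer sizes, not anything from the Megrez2 or GLM code: the dense attention and the router sit on the GPU, the expert FFNs (the bulk of the parameters) stay in CPU RAM, and only the few experts the router picks per token actually run, so the CPU does far less work than the parameter count suggests.

```python
# Conceptual sketch only: dense attention on GPU, sparse experts in CPU RAM.
# Layer sizes (d_model, n_experts, top_k) are arbitrary, not a real model's.
import torch
import torch.nn as nn

gpu = "cuda" if torch.cuda.is_available() else "cpu"
d_model, n_experts, top_k = 1024, 64, 2

attn = nn.MultiheadAttention(d_model, num_heads=16, batch_first=True).to(gpu)
router = nn.Linear(d_model, n_experts).to(gpu)
# The experts hold most of the parameters; they never leave CPU RAM.
experts = nn.ModuleList(
    [nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                   nn.Linear(4 * d_model, d_model)) for _ in range(n_experts)]
)

def moe_block(x):  # x: (batch, seq, d_model) on the GPU
    h, _ = attn(x, x, x)                    # dense attention runs on GPU
    scores = router(h)                      # routing also on GPU
    weights, idx = scores.topk(top_k, dim=-1)
    weights = weights.softmax(dim=-1)
    h_cpu = h.to("cpu")                     # ship small activations, not weights
    out = torch.zeros_like(h_cpu)
    for k in range(top_k):
        for e in range(n_experts):
            mask = idx[..., k].eq(e).to("cpu")   # tokens routed to expert e
            if mask.any():
                out[mask] += (weights[..., k].to("cpu")[mask].unsqueeze(-1)
                              * experts[e](h_cpu[mask]))
    return x + out.to(gpu)

tokens = torch.randn(1, 8, d_model, device=gpu)
print(moe_block(tokens).shape)  # torch.Size([1, 8, 1024])
```

The activations crossing the PCIe bus are tiny compared to the expert weights, which is why this stays usable even when most of the model lives in system RAM.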