Comment on Megrez2: 21B latent, 7.5B on VRAM, 3B active—MoE on single 8GB card

Trying to literally ELI5 so this might be oversimplified a bit:

This is a new AI model that uses a Mixture of Experts (MoE) approach, which combines different AIs (the "experts") that are each optimized for certain things into one AI that's good at more things. MoE models usually need a lot of graphics card memory and really high-end hardware, but this one fits into 8 GB of memory on a card, which is a very common amount on a modern graphics card, so many more people will be able to use it.
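For anyone who wants to see the "pick a few experts per input" idea in code, here is a rough sketch of top-k routing in plain Python. It is not Megrez2's actual code; `moe_forward`, `router_weights`, and the toy experts are made-up names for illustration, and real models do this per token with learned weights.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, router_weights, experts, top_k=2):
    """Send x to the top_k highest-scoring experts and blend their outputs.

    router_weights and experts are hypothetical stand-ins for learned parts.
    """
    # One router score per expert: dot product of x with that expert's row.
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in router_weights]
    # Keep only the top_k experts; the others are never evaluated,
    # which is why only a fraction of the parameters is "active".
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    gates = softmax([scores[i] for i in chosen])
    out = [0.0] * len(x)
    for gate, idx in zip(gates, chosen):
        y = experts[idx](x)
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, chosen

# Toy setup: four "experts" that just scale the input by 1, 2, 3, 4.
experts = [lambda v, s=s: [s * vi for vi in v] for s in (1.0, 2.0, 3.0, 4.0)]
router_weights = [[0.5, 0.0], [0.1, 0.0], [2.0, 0.0], [-1.0, 0.0]]
output, used = moe_forward([1.0, 0.0], router_weights, experts, top_k=2)
print(used, output)
```

Only two of the four toy experts run for this input, which is the whole trick: the model stores many experts but does the work of a few.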


brucethemoose@lemmy.world 5 weeks ago
To be fair, MoE is not new, and we already have a couple of good ~20Bs like Baidu Ernie and GPT-OSS (which they seem to have specifically excluded from comparisons).

You can fit much larger models onto 8 GB by keeping the experts on the CPU and the "dense" parts like attention on the GPU. Even GLM 4.5 Air (106B) will run fairly fast if your RAM is decent.
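A back-of-envelope way to check what fits where, as a sketch only: weight size is roughly parameter count times bits per parameter. The quantization levels and the 100B/10B split below are assumptions picked for illustration, not measured numbers for any specific model, and the estimate ignores KV cache and activations.

```python
def weight_gib(params_billion, bits_per_param):
    """Approximate weight storage in GiB (weights only, no KV cache)."""
    return params_billion * 1e9 * bits_per_param / 8 / 2**30

def split_footprint(total_b, gpu_resident_b, bits_per_param):
    """Return (GPU GiB, CPU RAM GiB) when only part of the model sits on the GPU."""
    gpu = weight_gib(gpu_resident_b, bits_per_param)
    cpu = weight_gib(total_b - gpu_resident_b, bits_per_param)
    return gpu, cpu

# The 7.5B "on VRAM" figure from the post title, at two assumed quantizations.
for bits in (8, 4):
    print(f"7.5B params at {bits}-bit: {weight_gib(7.5, bits):.2f} GiB")

# Hypothetical offload: a 100B MoE with 10B of dense/attention weights on GPU.
gpu_gib, cpu_gib = split_footprint(total_b=100, gpu_resident_b=10, bits_per_param=4)
print(f"GPU: {gpu_gib:.2f} GiB, CPU RAM: {cpu_gib:.2f} GiB")
```

The point of the split is that the GPU only has to hold the parts every token uses, while the much larger pile of expert weights sits in system RAM.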

LiveLM@lemmy.zip 5 weeks ago
Awesome, thanks!
