Megrez2: 21B latent, 7.5B on VRAM, 3B active—MoE on single 8GB card
Submitted 1 day ago by cm0002@lemmy.world to technology@lemmy.zip
https://huggingface.co/Infinigence/Megrez2-3x7B-A3B-GGUF
LiveLM@lemmy.zip 18 hours ago
ELI5, please?
felsiq@piefed.zip 17 hours ago
Trying to literally ELI5 so this might be oversimplified a bit:
It's a new AI model using a Mixture of Experts (MoE) approach, which combines several smaller "expert" models, each optimized for certain things, into one model that's good at more things. MoE models usually need a lot of space on a graphics card and really high-end hardware, but this one fits into 8GB of VRAM, which is a very common amount on a modern graphics card, so many more people will be able to run it.
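If you want a slightly less ELI5 picture, here's a toy PyTorch sketch (my own illustration, not Megrez2's actual code): a small "router" scores the experts and only the top couple actually run for each token, which is why the active parameter count (3B) is much smaller than the total (21B).

```python
# Toy Mixture-of-Experts layer: route each token to its top-k experts only.
import torch
import torch.nn as nn

class ToyMoE(nn.Module):
    def __init__(self, dim=64, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)  # scores each expert per token
        self.experts = nn.ModuleList([nn.Linear(dim, dim) for _ in range(n_experts)])
        self.top_k = top_k

    def forward(self, x):                               # x: (tokens, dim)
        scores = self.router(x)                         # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # keep only the top-k experts
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                   # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, k:k + 1] * expert(x[mask])
        return out

x = torch.randn(4, 64)       # 4 tokens
print(ToyMoE()(x).shape)     # torch.Size([4, 64]) -- only 2 of 8 experts ran per token
```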
LiveLM@lemmy.zip 10 hours ago
Awesome, thanks!
brucethemoose@lemmy.world 17 hours ago
To be fair, MoE is not new, and we already have a couple of good ~20Bs like Baidu Ernie and GPT-OSS (which they seem to have specifically excluded from comparisons).
You can fit much larger models onto 8GB with the experts on the CPU and the 'dense' parts, like attention, on the GPU. Even GLM 4.5 Air (120B) will run fairly fast if your RAM is decent.
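For a rough picture of that split, here's a toy PyTorch sketch (an assumed layout of my own, not llama.cpp's or Megrez2's actual implementation): the small attention weights that run on every token live in VRAM, the bulky expert weights stay in system RAM, and only the activations get shuttled across each layer.

```python
# Toy transformer block with the dense path on GPU and the experts offloaded to CPU RAM.
import torch
import torch.nn as nn

gpu = "cuda" if torch.cuda.is_available() else "cpu"

class OffloadBlock(nn.Module):
    def __init__(self, dim=64, n_experts=8):
        super().__init__()
        # Dense path: runs for every token, relatively small, lives in VRAM.
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True).to(gpu)
        # Expert path: the bulk of the parameters, parked in system RAM.
        self.experts = nn.ModuleList(
            [nn.Linear(dim, dim).to("cpu") for _ in range(n_experts)]
        )

    def forward(self, x):                 # x: (batch, seq, dim), already on the GPU
        h, _ = self.attn(x, x, x)         # attention stays on the GPU
        h_cpu = h.to("cpu")               # ship the (small) activations to RAM...
        e = self.experts[0](h_cpu)        # ...and run the chosen expert on the CPU
        return x + e.to(gpu)              # bring the result back to VRAM

block = OffloadBlock()
out = block(torch.randn(1, 16, 64, device=gpu))
print(out.shape)                          # torch.Size([1, 16, 64])
```

(The toy always picks expert 0 to keep it short; a real setup routes per token. The point is that only the activations cross the bus, not the expert weights, which is why this stays reasonably fast with decent RAM.)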