Megrez2: 21B latent, 7.5B on VRAM, 3B active—MoE on single 8GB card
Submitted 1 day ago by cm0002@lemmy.world to technology@lemmy.zip
https://huggingface.co/Infinigence/Megrez2-3x7B-A3B-GGUF
LiveLM@lemmy.zip 18 hours ago
ELI5, please?
felsiq@piefed.zip 17 hours ago
Trying to literally ELI5 so this might be oversimplified a bit:
It's a new AI model using a Mixture of Experts (MoE) approach, which combines several smaller "expert" models, each optimized for certain things, into one model that's good at more things. MoE models usually need a lot of space on a graphics card and really high-end hardware, but this one fits into 8GB of VRAM, which is a very common amount on a modern graphics card, so many more people will be able to run it.
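If you want a slightly less ELI5 picture, here's a toy PyTorch sketch (my own illustration, not Megrez2's actual code): a small "router" scores the experts and only the top couple actually run for each token, which is why the active parameter count (3B) is much smaller than the total (21B).

```python
# Toy Mixture-of-Experts layer: route each token to its top-k experts only.
import torch
import torch.nn as nn

class ToyMoE(nn.Module):
    def __init__(self, dim=64, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)  # scores each expert per token
        self.experts = nn.ModuleList([nn.Linear(dim, dim) for _ in range(n_experts)])
        self.top_k = top_k

    def forward(self, x):                               # x: (tokens, dim)
        scores = self.router(x)                         # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # keep only the top-k experts
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                   # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, k:k + 1] * expert(x[mask])
        return out

x = torch.randn(4, 64)       # 4 tokens
print(ToyMoE()(x).shape)     # torch.Size([4, 64]) -- only 2 of 8 experts ran per token
```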
LiveLM@lemmy.zip 10 hours ago
Awesome, thanks!
brucethemoose@lemmy.world 17 hours ago
To be fair, MoE is not new, and we already have a couple of good ~20Bs like Baidu Ernie and GPT-OSS (which they seem to have specifically excluded from comparisons).
You can fit much larger models onto 8GB with the experts on the CPU and the 'dense' parts, like attention, on the GPU. Even GLM 4.5 Air (120B) will run fairly fast if your RAM is decent.
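For a rough picture of that split, here's a toy PyTorch sketch (an assumed layout of my own, not llama.cpp's or Megrez2's actual implementation): the small attention weights that run on every token live in VRAM, the bulky expert weights stay in system RAM, and only the activations get shuttled across each layer.

```python
# Toy transformer block with the dense path on GPU and the experts offloaded to CPU RAM.
import torch
import torch.nn as nn

gpu = "cuda" if torch.cuda.is_available() else "cpu"

class OffloadBlock(nn.Module):
    def __init__(self, dim=64, n_experts=8):
        super().__init__()
        # Dense path: runs for every token, relatively small, lives in VRAM.
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True).to(gpu)
        # Expert path: the bulk of the parameters, parked in system RAM.
        self.experts = nn.ModuleList(
            [nn.Linear(dim, dim).to("cpu") for _ in range(n_experts)]
        )

    def forward(self, x):                 # x: (batch, seq, dim), already on the GPU
        h, _ = self.attn(x, x, x)         # attention stays on the GPU
        h_cpu = h.to("cpu")               # ship the (small) activations to RAM...
        e = self.experts[0](h_cpu)        # ...and run the chosen expert on the CPU
        return x + e.to(gpu)              # bring the result back to VRAM

block = OffloadBlock()
out = block(torch.randn(1, 16, 64, device=gpu))
print(out.shape)                          # torch.Size([1, 16, 64])
```

(The toy always picks expert 0 to keep it short; a real setup routes per token. The point is that only the activations cross the bus, not the expert weights, which is why this stays reasonably fast with decent RAM.)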