What is the appropriate model size for 10 GB of VRAM?
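As a rough back-of-the-envelope sketch (the ~4.5 bits/weight figure for a Q4_K_M-style quant and the overhead allowance are assumptions, not from the thread), you can estimate which quantized models fit in 10 GB:

```python
# Rough sketch: estimate how large a quantized model's weights are,
# to judge what fits in 10 GB of VRAM. Assumes ~4.5 bits/weight
# (Q4_K_M-style quant) and ignores KV-cache/context overhead.

def model_size_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Approximate in-VRAM size of the quantized weights in GB."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for params in (7, 8, 13, 14):
    print(f"{params}B @ ~4.5 bpw ≈ {model_size_gb(params):.1f} GB")

# 7-8B fits comfortably; 13-14B gets tight once context overhead is added.
```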
Zworf@beehaw.org 5 months ago
Hmmm, weird. I have a 4090 / Ryzen 5800X3D and 64GB and it runs really well. Admittedly it's the 8B model, because the intermediate sizes aren't out yet and 70B simply won't fly on a single GPU.
But it really screams. Much faster than I can read.
PS: Ollama is just llama.cpp under the hood.
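For anyone following along, here is a minimal sketch of talking to a locally running Ollama server (which wraps llama.cpp) over its HTTP API; it assumes Ollama is on its default port 11434 and that the model tag below, which is only illustrative, has already been pulled:

```python
# Minimal sketch: query a local Ollama server (a llama.cpp wrapper) via HTTP.
# Assumes Ollama is listening on its default port 11434 and the model named
# below has already been pulled; adjust the tag for your own setup.
import json
import urllib.request

payload = {
    "model": "llama3:8b",   # illustrative tag; use whatever you pulled
    "prompt": "Explain VRAM vs. system RAM in one sentence.",
    "stream": False,        # return the full response as one JSON object
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```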
BaroqueInMind@lemmy.one 5 months ago
Zworf@beehaw.org 5 months ago
It depends on your prompt/context size too. The more context you use, the more memory you need. Try checking your GPU's memory usage with GPU-Z across different models and scenarios.
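A rough sketch of why context length matters: the KV cache grows linearly with it. The layer/head numbers below are typical of a Llama-3-8B-style model with grouped-query attention and an fp16 cache; they are assumptions for illustration, not figures from the thread.

```python
# Sketch: KV-cache memory grows linearly with context length.
# Assumed model shape (not from the thread): 32 layers, 8 KV heads (GQA),
# head dim 128, fp16 cache (2 bytes per element).

def kv_cache_gb(context_len: int,
                n_layers: int = 32,
                n_kv_heads: int = 8,
                head_dim: int = 128,
                bytes_per_elem: int = 2) -> float:
    """Bytes for keys + values across all layers, converted to GB."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem / 1e9

for ctx in (2048, 8192, 32768):
    print(f"context {ctx:>6}: ~{kv_cache_gb(ctx):.2f} GB of KV cache")
```

At these assumed dimensions, 8k of context costs roughly 1 GB on top of the weights, which is why a model that "fits" can still run out of VRAM with long prompts.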
xcjs@programming.dev 5 months ago
It should be split between VRAM and regular RAM, at least if it's a GGUF model. Maybe it's not, and that's what's wrong?
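For reference, a sketch of how that split is controlled with the llama-cpp-python bindings: only the layers counted by n_gpu_layers are offloaded to VRAM and the rest stay in system RAM. The model path and layer count here are placeholders, not values from the thread.

```python
# Sketch: partial GPU offload of a GGUF model via llama-cpp-python.
# Layers up to n_gpu_layers go to VRAM; the remainder stays in system RAM.
# The model path and layer count are placeholders for illustration.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-3-8b-instruct.Q4_K_M.gguf",  # any GGUF file
    n_gpu_layers=20,   # raise until VRAM is nearly full, lower if you OOM
    n_ctx=4096,        # context window; larger values need more memory
)

out = llm("Q: What does partial offload do?\nA:", max_tokens=64)
print(out["choices"][0]["text"])
```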