What is the appropriate model size for 10 GB of VRAM?
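As a rough back-of-the-envelope sketch (the ~4.5 bits/weight figure for a Q4_K_M-style quant and the overhead allowance are assumptions, not from the thread), you can estimate which quantized models fit in 10 GB:

```python
# Rough sketch: estimate how large a quantized model's weights are,
# to judge what fits in 10 GB of VRAM. Assumes ~4.5 bits/weight
# (Q4_K_M-style quant) and ignores KV-cache/context overhead.

def model_size_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Approximate in-VRAM size of the quantized weights in GB."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for params in (7, 8, 13, 14):
    print(f"{params}B @ ~4.5 bpw ≈ {model_size_gb(params):.1f} GB")

# 7-8B fits comfortably; 13-14B gets tight once context overhead is added.
```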
Zworf@beehaw.org 5 months ago
Hmmm, weird. I have a 4090 / Ryzen 5800X3D and 64GB and it runs really well. Admittedly it's the 8B model, because the intermediate sizes aren't out yet and 70B simply won't fly on a single GPU.
But it really screams. Much faster than I can read.
PS: Ollama is just llama.cpp under the hood.
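For anyone following along, here is a minimal sketch of talking to a locally running Ollama server (which wraps llama.cpp) over its HTTP API; it assumes Ollama is on its default port 11434 and that the model tag below, which is only illustrative, has already been pulled:

```python
# Minimal sketch: query a local Ollama server (a llama.cpp wrapper) via HTTP.
# Assumes Ollama is listening on its default port 11434 and the model named
# below has already been pulled; adjust the tag for your own setup.
import json
import urllib.request

payload = {
    "model": "llama3:8b",   # illustrative tag; use whatever you pulled
    "prompt": "Explain VRAM vs. system RAM in one sentence.",
    "stream": False,        # return the full response as one JSON object
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```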
BaroqueInMind@lemmy.one 5 months ago
Zworf@beehaw.org 5 months ago
It depends on your prompt/context size too. The more context you use, the more memory you need. Try checking your GPU's memory usage with GPU-Z across different models and scenarios.
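A rough sketch of why context length matters: the KV cache grows linearly with it. The layer/head numbers below are typical of a Llama-3-8B-style model with grouped-query attention and an fp16 cache; they are assumptions for illustration, not figures from the thread.

```python
# Sketch: KV-cache memory grows linearly with context length.
# Assumed model shape (not from the thread): 32 layers, 8 KV heads (GQA),
# head dim 128, fp16 cache (2 bytes per element).

def kv_cache_gb(context_len: int,
                n_layers: int = 32,
                n_kv_heads: int = 8,
                head_dim: int = 128,
                bytes_per_elem: int = 2) -> float:
    """Bytes for keys + values across all layers, converted to GB."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem / 1e9

for ctx in (2048, 8192, 32768):
    print(f"context {ctx:>6}: ~{kv_cache_gb(ctx):.2f} GB of KV cache")
```

At these assumed dimensions, 8k of context costs roughly 1 GB on top of the weights, which is why a model that "fits" can still run out of VRAM with long prompts.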
xcjs@programming.dev 5 months ago
It should be split between VRAM and regular RAM, at least if it's a GGUF model. Maybe it's not, and that's what's wrong?
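For reference, a sketch of how that split is controlled with the llama-cpp-python bindings: only the layers counted by n_gpu_layers are offloaded to VRAM and the rest stay in system RAM. The model path and layer count here are placeholders, not values from the thread.

```python
# Sketch: partial GPU offload of a GGUF model via llama-cpp-python.
# Layers up to n_gpu_layers go to VRAM; the remainder stays in system RAM.
# The model path and layer count are placeholders for illustration.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-3-8b-instruct.Q4_K_M.gguf",  # any GGUF file
    n_gpu_layers=20,   # raise until VRAM is nearly full, lower if you OOM
    n_ctx=4096,        # context window; larger values need more memory
)

out = llm("Q: What does partial offload do?\nA:", max_tokens=64)
print(out["choices"][0]["text"])
```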