Comment on Advice - Getting started with LLMs
Zworf@beehaw.org 5 months ago
Training your own will be very difficult. You'd need to gather an enormous amount of data just to get a model with basic language understanding.
What I would do (and am doing) is just take something like llama3 or mistral and add your own content using RAG (retrieval-augmented generation) techniques; a rough sketch of the idea is below.
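Something like this (untested sketch, assumes a local Ollama instance on the default port, that you've pulled llama3 plus an embedding model such as nomic-embed-text, and the example "docs" are just placeholders for your own content):

```python
# Minimal RAG sketch against a local Ollama instance (default port 11434).
# Assumes `ollama pull llama3` and `ollama pull nomic-embed-text` were run first.
import requests

OLLAMA = "http://localhost:11434"

def embed(text: str) -> list[float]:
    # Ollama's embeddings endpoint returns a vector for the given text.
    r = requests.post(f"{OLLAMA}/api/embeddings",
                      json={"model": "nomic-embed-text", "prompt": text})
    r.raise_for_status()
    return r.json()["embedding"]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb)

# Your own content, chunked however you like (hypothetical example chunks).
docs = ["Our refund policy lasts 30 days.",
        "Support is available Mon-Fri, 9-5 CET."]
doc_vectors = [(d, embed(d)) for d in docs]

question = "How long do refunds last?"
q_vec = embed(question)

# Retrieve the most similar chunk and stuff it into the prompt.
best_doc = max(doc_vectors, key=lambda dv: cosine(q_vec, dv[1]))[0]
prompt = f"Answer using only this context:\n{best_doc}\n\nQuestion: {question}"

r = requests.post(f"{OLLAMA}/api/generate",
                  json={"model": "llama3", "prompt": prompt, "stream": False})
print(r.json()["response"])
```

In practice you'd store the embeddings in a vector database instead of recomputing them, but this is the whole idea: retrieve your own text, then let the base model answer from it.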
BaroqueInMind@lemmy.one 5 months ago
Ollama is so fucking slow. Even with a 16-core overclocked Intel CPU, 64 GB of RAM, and an Nvidia 3080 with 10 GB of VRAM, running a 22B parameter model, token generation for a simple haiku takes 20 minutes.
xcjs@programming.dev 5 months ago
No offense intended, but are you sure it’s using your GPU? Twenty minutes is about how long my CPU-locked instance takes to run some 70B parameter models.
On my RTX 3060, I generally get responses in seconds.
kiku123@feddit.de 5 months ago
I agree. My 3070 gets responses from the 8B Llama 3 model in about 250 ms, at least for short ones.
xcjs@programming.dev 5 months ago
Ok, so using my “older” 2070 Super, I was able to get a response from a 70B parameter model in 9-12 minutes. (Llama 3 in this case.)
I’m fairly certain that you’re running on your CPU or hitting some other issue. Would you like to try to debug your configuration together? There's a quick sanity check below you could start with.
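This hits the local Ollama API and computes tokens per second from the timing fields the generate endpoint returns (assumes the default port and that the "llama3" tag stands in for whatever model you actually have pulled). Single-digit tokens/s on a 3080 usually means the model is on the CPU or spilling out of VRAM; it's also worth watching nvidia-smi while it runs.

```python
# Rough tokens/second check against a local Ollama instance.
# Replace "llama3" with whatever model tag you actually have pulled.
import requests

r = requests.post("http://localhost:11434/api/generate",
                  json={"model": "llama3",
                        "prompt": "Write a haiku about GPUs.",
                        "stream": False})
r.raise_for_status()
data = r.json()

# eval_count = generated tokens, eval_duration = nanoseconds spent generating them.
tokens = data["eval_count"]
seconds = data["eval_duration"] / 1e9
print(f"{tokens} tokens in {seconds:.1f}s -> {tokens / seconds:.1f} tokens/s")
print(f"model load time: {data.get('load_duration', 0) / 1e9:.1f}s")
```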
BaroqueInMind@lemmy.one 5 months ago
I think I fucked up my docker setup and will wipe and start over.
xcjs@programming.dev 5 months ago
Good luck! I’m definitely willing to spend a few minutes offering advice/double checking some configuration settings if things go awry again. Let me know how things go. :-)
Zworf@beehaw.org 5 months ago
Hmmm weird. I have a 4090 / Ryzen 5800X3D with 64 GB of RAM and it runs really well. Admittedly it’s the 8B model, because the intermediate sizes aren’t out yet and 70B simply won’t fly on a single GPU.
But it really screams. Much faster than I can read.
PS: Ollama is just llama.cpp under the hood.
xcjs@programming.dev 5 months ago
The model should be split between VRAM and regular RAM, at least if it’s a GGUF model. Maybe it’s not being split properly, and that’s what’s wrong?
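One thing worth trying: as far as I know, Ollama exposes a num_gpu option that controls how many layers get offloaded to the GPU, so you can set it explicitly and watch whether VRAM usage actually changes (rough sketch, the model tag and layer count are just placeholders):

```python
# Ask Ollama to offload a specific number of layers to the GPU via per-request options.
# "num_gpu" is the layer-offload count; lower it if the model doesn't fit in VRAM.
import requests

r = requests.post("http://localhost:11434/api/generate",
                  json={"model": "llama3",   # substitute your 22B model tag
                        "prompt": "Write a haiku about VRAM.",
                        "stream": False,
                        "options": {"num_gpu": 20}})  # experiment with different layer counts
r.raise_for_status()
print(r.json()["response"])
```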
BaroqueInMind@lemmy.one 5 months ago
What model size is appropriate for 10 GB of VRAM?
Zworf@beehaw.org 5 months ago
It depends on your prompt/context size too; the more context you use, the more memory you need. Try checking your GPU’s memory usage with GPU-Z across different models and scenarios, or script the check as sketched below.
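If you'd rather script it than watch GPU-Z, something like this should work (assumes the nvidia-ml-py package, imported as pynvml, is installed; run it while a model is loaded):

```python
# Print current VRAM usage on the first GPU.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
print(f"used {mem.used / 2**30:.1f} GiB of {mem.total / 2**30:.1f} GiB VRAM")
pynvml.nvmlShutdown()
```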