xcjs
@xcjs@programming.dev
- Comment on Advice - Getting started with LLMs 6 months ago:
We all mess up! I hope that helps - let me know if you see improvements!
- Comment on Advice - Getting started with LLMs 6 months ago:
I think there was a special process to get Nvidia working in WSL. Let me check… (I’m running natively on Linux, so my experience doing it with WSL is limited.)
docs.nvidia.com/cuda/wsl-user-guide/index.html - I’m sure you’ve followed this already, but according to that guide, it looks like you don’t want to install the Nvidia drivers inside WSL, and only want to install the cuda-toolkit metapackage. I’d follow the instructions from that link closely.
You may also run into performance issues within WSL due to the virtual machine overhead.
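For reference, on an Ubuntu-based WSL distro the guide's approach boils down to something like the following. This is a sketch from memory, not the canonical steps — the keyring version and repo path may have changed, so double-check them against the linked guide before running anything:

```shell
# Inside WSL: do NOT install Linux GPU drivers here; the Windows driver
# is shared with WSL. Only add NVIDIA's CUDA repo and the toolkit.

# Add NVIDIA's repo signing key (version/path may differ; check the guide).
wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb

# Install the toolkit metapackage only — no driver packages.
sudo apt-get update
sudo apt-get install -y cuda-toolkit

# Sanity check: this should report your Windows-side driver and GPU.
nvidia-smi
```

If `nvidia-smi` works inside WSL but your inference tool still doesn't see the GPU, the problem is usually the tool's CUDA build rather than the driver setup.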
- Comment on Advice - Getting started with LLMs 6 months ago:
Good luck! I’m definitely willing to spend a few minutes offering advice/double checking some configuration settings if things go awry again. Let me know how things go. :-)
- Comment on Advice - Getting started with LLMs 6 months ago:
It should be split between VRAM and regular RAM, at least if it’s a GGUF model. Maybe it’s not, and that’s what’s wrong?
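To make the split concrete, here's a back-of-the-envelope sketch of how many layers of a GGUF model fit in VRAM when a llama.cpp-style runner offloads layer by layer. All the numbers (model size, layer count, overhead) are illustrative assumptions, not measured values:

```python
# Rough sketch: estimate how many transformer layers of a GGUF model fit
# in VRAM when the runner splits the model between GPU and CPU.
# Assumes equal-sized layers and a fixed VRAM reserve for the KV cache.

def layers_on_gpu(model_size_gb: float, n_layers: int, vram_gb: float,
                  overhead_gb: float = 1.5) -> int:
    """Approximate number of layers that fit on the GPU."""
    per_layer_gb = model_size_gb / n_layers
    usable_gb = max(vram_gb - overhead_gb, 0.0)
    return min(n_layers, int(usable_gb / per_layer_gb))

# Example: a ~40 GB Q4 quant of a 70B model (80 layers) on an 8 GB card —
# only a small fraction fits on the GPU; the rest spills into system RAM.
print(layers_on_gpu(40.0, 80, 8.0))
```

If the runner reports zero layers offloaded (or never allocates VRAM at all), that's the "maybe it's not split" failure mode — everything runs on the CPU.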
- Comment on Advice - Getting started with LLMs 6 months ago:
Ok, so using my “older” 2070 Super, I was able to get a response from a 70B parameter model in 9-12 minutes. (Llama 3 in this case.)
I’m fairly certain that you’re using your CPU or having another issue. Would you like to try and debug your configuration together?
- Comment on Advice - Getting started with LLMs 6 months ago:
Unfortunately, I don’t expect it to remain free forever.
- Comment on Advice - Getting started with LLMs 6 months ago:
No offense intended, but are you sure it’s using your GPU? Twenty minutes is about how long my CPU-locked instance takes to run some 70B parameter models.
On my RTX 3060, I generally get responses in seconds.
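An easy way to check is to watch the GPU while a response is generating. Something like this in a second terminal (standard `nvidia-smi` query flags):

```shell
# Poll GPU utilization and memory every second while the model generates.
# If utilization sits near 0% and memory barely moves during generation,
# the inference is almost certainly running on the CPU.
nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv -l 1
```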