Question about upcoming trends and hardware requirements

Submitted 1 year ago by comfy@lemmy.ml to stable_diffusion@lemmy.dbzer0.com

I want to buy a new GPU mainly for SD. The machine-learning space is moving quickly so I want to avoid buying a brand new card and then a fresh model or tool comes out and puts my card back behind the times. On the other hand, I also want to avoid needlessly spending extra thousands of dollars pretending I can get a ‘future-proof’ card.

I’m currently interested in SD and training LoRAs (etc.). From what I’ve heard, the general advice is just to go for maximum VRAM.

Is there any extra advice I should know about?
Is NVIDIA vs. AMD a critical decision for SD performance?

I’m a hobbyist, so a couple of seconds difference in generation or a few extra hours for training isn’t going to ruin my day.

Some example prices in my region, to give a sense of scale:

16GB AMD: $350
16GB NV: $450
24GB AMD: $900
24GB NV: $2000
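
To put the "maximum VRAM" advice against these prices, here is a minimal sketch that works out dollars per GB of VRAM for the four example cards. The prices are the ones listed above; `price_per_gb` is a hypothetical helper, and the metric ignores speed and software support entirely.

```python
# Example cards from the post: (label, VRAM in GB, price in USD).
CARDS = [
    ("16GB AMD", 16, 350),
    ("16GB NV", 16, 450),
    ("24GB AMD", 24, 900),
    ("24GB NV", 24, 2000),
]


def price_per_gb(vram_gb, price_usd):
    """Return the price in USD per GB of VRAM."""
    if vram_gb <= 0:
        raise ValueError("vram_gb must be positive")
    return price_usd / vram_gb


# Print the cards from cheapest to most expensive per GB.
for label, vram_gb, price_usd in sorted(CARDS, key=lambda c: price_per_gb(c[1], c[2])):
    print(f"{label}: ${price_per_gb(vram_gb, price_usd):.2f} per GB")
```

By this measure alone, the 24GB NVIDIA card costs a bit under four times as much per GB as the 16GB AMD card.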

Comments

altima_neo@lemmy.zip 1 year ago
Basically, avoid AMD if you’re serious about it. DirectML just can’t compete with CUDA. Performance with Stable Diffusion on Nvidia blows away AMD.

A 4090 is as fast as it gets for consumer hardware. I’ve got a 3090, and it has the same amount of VRAM as a 4090 (24GB) but is nowhere near as fast. So a 3090 or 3090 Ti would be a good budget option.

However, if you’re willing to wait, they’re saying Nvidia will announce the 5000 series in January. I’m not sure when the cards will actually release, though. Plus there are the usual stock problems with a new series launch. The 5090 is rumored to have 32GB of VRAM.

- WalnutLum@lemmy.ml 1 year ago
  ROCm is comparable but very few applications work out of the box with it.
  
- wewbull@feddit.uk 1 year ago
  I’ve tried to find comparison data on AMD vs. Nvidia performance, and I see lots of people saying what you’re saying, but I can never find numbers. Do you know of any?
  
  If a card is less than half the price, maybe I don’t mind its lower performance. It all depends on how much lower.
  
  Also, is the same true under Linux?
  
  - Cirk2@programming.dev 1 year ago
    It’s highly dependent on implementation.
    
    pugetsystems.com/…/stable-diffusion-performance-p…
    
    The experience on Linux is good (use Docker, otherwise Python is dependency hell), but the basic torch-based implementations (automatic, comfy) have bad performance. I have not managed to get shark to run on Linux; the project is very Windows-focused and has no setup documentation besides “run the installer”.
    
    Basically all of the VRAM trickery in torch depends on xformers, which is low-level CUDA code and therefore does not work on AMD. AMD has a running project to port it, but it’s currently too incomplete to work.
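
For anyone comparing setups, a minimal sketch of how to check which backend an installed PyTorch build actually targets. It assumes PyTorch may or may not be installed; `describe_torch_backend` is a hypothetical helper name. ROCm builds of PyTorch reuse the `torch.cuda` API and set `torch.version.hip`, so the same check covers both vendors.

```python
def describe_torch_backend():
    """Return a one-line description of the GPU backend PyTorch can use."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"

    # On ROCm builds this is also the call that reports AMD GPUs.
    if not torch.cuda.is_available():
        return "PyTorch found no usable GPU (CPU only)"

    # torch.version.hip is a version string on ROCm builds and None otherwise.
    hip_version = getattr(torch.version, "hip", None)
    if hip_version:
        backend = f"ROCm/HIP {hip_version}"
    else:
        backend = f"CUDA {torch.version.cuda}"

    name = torch.cuda.get_device_name(0)
    total_gib = torch.cuda.get_device_properties(0).total_memory / 1024**3
    return f"{name}: {backend}, {total_gib:.1f} GiB VRAM"


print(describe_torch_backend())
```

This only says whether torch can see the card. It does not tell you whether extras such as xformers are available for that backend.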
    
- comfy@lemmy.ml 1 year ago
  I found a couple of 2022 posts recommending 3090s, especially since cryptocoin miners were selling lots of them cheap. Thanks for the heads up about the 5000 release, I suspect it will be above my budget but it will net me better deals on a 4090 :P
  
  - WalnutLum@lemmy.ml 1 year ago
    DirectML sucks but ROCm is great; you just need to check whether the software you want to use works with ROCm. Also note that only about 4 cards work with ROCm.
    
  - altima_neo@lemmy.zip 1 year ago
    Yeah, I don’t think the 4090 is going down in price. As of now, they’re even more expensive than when they launched, and it seems production is ramping down.
    
wewbull@feddit.uk 1 year ago
So you could buy 2x 24GB AMD cards and a good power supply to power them for the price of a 4090.

- comfy@lemmy.ml 1 year ago
  I didn’t even think of dual cards, because I have an old & budget motherboard with one slot. But 2 x 16GB GPUs and a new motherboard (and if necessary, new CPU) and PSU and it might even still be cheaper than a 24GB NVIDIA for me. Of course I’d have to explore the trade-offs in detail because I’ve never looked into how dual cards work.
  
  (But truth be told, I could just as easily settle for 1 x 16GB if I’m confident it would be able to train, even if slowly, AuraFlow or FLUX LoRAs for the upcoming Pony v7 model. It’s just a hobby.)
  
  - wewbull@feddit.uk 1 year ago
    Yeah. I don’t think dual cards are a great solution, as I don’t think they can both be made to work on the same job at the same time, but maybe if you were generating many images it would make sense.
    
    I don’t know, but maybe somebody else has experience.
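
A minimal sketch of the "many images" case mentioned above, where each card works through its own share of a prompt queue independently. This is only the scheduling idea and touches no GPU: `split_jobs` is a hypothetical helper, and the device names follow PyTorch's `cuda:N` naming as an assumption.

```python
def split_jobs(prompts, devices):
    """Assign prompts to devices round-robin and return {device: [prompts]}."""
    if not devices:
        raise ValueError("need at least one device")
    queues = {device: [] for device in devices}
    for index, prompt in enumerate(prompts):
        # Job 0 goes to the first device, job 1 to the second, and so on.
        queues[devices[index % len(devices)]].append(prompt)
    return queues


queues = split_jobs(
    ["castle", "forest", "harbor", "desert", "glacier"],
    ["cuda:0", "cuda:1"],
)
for device, jobs in queues.items():
    print(device, jobs)
```

Each queue would then be handled by a separate process with its own copy of the model, so this roughly doubles throughput but does nothing for a single job that needs more VRAM than one card has.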
    
bubbalu@hexbear.net 1 year ago
quit you are killing the earth with this inane bullshit.

- comfy@lemmy.ml 1 year ago
  Do you know how much wattage those GPUs use, even if I disconnected my solar panels and ran the card 100% 24/7? Protip: it rounds down to zero.
  
  If you’re serious about the global environmental crisis, comrade, organize with others to fight industrial-scale culprits instead of wasting your valuable time blaming trivial people.
  
  - bubbalu@hexbear.net 1 year ago
    Enjoy your treats!
    
  - wewbull@feddit.uk 1 year ago
    It’s the training that’s the issue, more than inference.
    
j4k3@lemmy.world 1 year ago
My 16 GB 3080Ti is only annoying with Flux gens right now. Those take like 1.5-2 minutes each and need a lot of iterations. My laptop heat saturates from Flux. It could get better if the tools support splitting the workflow between GPU and CPU like with Textgen, but at the moment it is GPU only and kinda sucks. Stuff like Pony runs fast enough at around 20-30 seconds for most models.

- comfy@lemmy.ml 1 year ago
  I am using a lot of Pony* models at the moment and Pony v7 looks like it will switch to AuraFlow or FLUX so it’s useful to hear your experience with it on a 3080Ti.
  
Evanyost@r.nf 1 year ago
Your idea is very impressive and unique. Papa’s freezeria
