
The 6800 is RDNA2, not RDNA3. The latter is still waiting for ROCm support four months post-launch: https://github.com/RadeonOpenCompute/ROCm/issues/1813


I'm aware that a 6800 is not RDNA3. You stated broadly:

> Current AMD consumer cards have terrible software support and IMO aren't really an option. On Windows you might be able to use SHARK or DirectML ports, but nothing will run out of the box.

I was merely sharing that my own experience didn't match the claim that current consumer cards have terrible support.


Sure, and I was merely clarifying that only last-gen architectures work. I'm glad that SD works for you, but if we're getting into it: I think that having no support on current-gen flagship cards does equal broadly terrible software support, and it's the more important point to highlight, since otherwise someone might assume they could pick one of those up and get a 24GB GPU on the cheap, especially in the context of LLMs (which is what the OP was asking about).

For RDNA2, you apparently can get LLMs running, but it requires forking/patching both bitsandbytes and GPTQ: https://rentry.org/eq3hg - and the same will be true for any other library (e.g., can you use accelerate? deepspeed? fastgen? Who knows; certainly no one is testing it, and AMD doesn't care if you're not on CDNA). It's important to note again that anything that works at the moment will still only work with last-gen cards, on Linux only (ROCm does not work through WSL), with limited VRAM (no 30B q4 models), and, since RDNA2 tensor support is awful, if the SD benchmarks are anything to go by, performance will still end up worse than an RTX 3050: https://www.tomshardware.com/news/stable-diffusion-gpu-bench...
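
To give a flavor of the hoop-jumping involved, here is a minimal sketch assuming a ROCm build of PyTorch on a consumer RDNA2 card (HSA_OVERRIDE_GFX_VERSION is a community workaround for cards AMD doesn't officially support, not anything AMD documents):

    import os

    # Must be set before torch loads the ROCm runtime. Spoofs the card
    # as gfx1030 so kernels built for the officially supported RDNA2
    # parts will load on similar consumer cards. Community workaround.
    os.environ.setdefault("HSA_OVERRIDE_GFX_VERSION", "10.3.0")

    import torch

    # ROCm builds of PyTorch reuse the torch.cuda namespace over HIP.
    print(torch.version.hip)             # set on a ROCm build, None otherwise
    print(torch.cuda.is_available())     # True if the runtime sees the card
    print(torch.cuda.get_device_name(0))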


> I think that having no support on current-gen flagship cards does equal broadly terrible software support, and it's the more important point to highlight, since otherwise someone might assume they could pick one of those up and get a 24GB GPU on the cheap, especially in the context of LLMs (which is what the OP was asking about).

Absolutely fair, and I agree with this part. I started my reply with "FWIW" (for what it's worth) on purpose.

> For RDNA2, you apparently can get LLMs running, but it requires forking/patching both bitsandbytes and GPTQ: https://rentry.org/eq3hg - and the same will be true for any other library (e.g., can you use accelerate? deepspeed? fastgen? Who knows; certainly no one is testing it, and AMD doesn't care if you're not on CDNA).

I haven't tried any of the GPU-based LLMs yet. SD leveraging PyTorch (which seems to have solid ROCm support) worked for me. It will certainly not be as fast as NVIDIA, but if someone already has a 16GB+ AMD card, they can at least play with this stuff without needing to buy an NVIDIA card instead.
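
Concretely, the reason it can "just work" is that ROCm builds of PyTorch back the "cuda" device with HIP on the AMD GPU, so the standard diffusers flow runs unmodified. A minimal sketch (the model ID and fp16 choice here are illustrative, not my exact setup):

    import torch
    from diffusers import StableDiffusionPipeline

    # On a ROCm build of PyTorch, "cuda" is backed by HIP on the AMD GPU,
    # so nothing NVIDIA-specific is needed in the script itself.
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    )
    pipe = pipe.to("cuda")

    image = pipe("a photo of an astronaut riding a horse").images[0]
    image.save("out.png")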



