What are you talking about? Why would you multiply the bandwidth? 8 4090s is still 1000 GB/s. Meanwhile the M2 Ultra is 800 GB/s with up to 192 GB of unified memory. Metal can only access ~155 GB of that, so you'd need a bit of headroom, but your comparison makes absolutely zero sense.
There are different ways to run LLMs on multiple GPUs. One of them, tensor parallelism, effectively multiplies memory bandwidth across GPUs in low-batch scenarios: each GPU holds a shard of every weight matrix and only streams its own shard per token, so the reads happen concurrently. So no, 8 4090s is not 1000 GB/s.
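To make the disagreement concrete, here's a minimal back-of-the-envelope sketch. It assumes a hypothetical 70B-parameter model at fp16, uses nominal spec-sheet bandwidth figures (~1008 GB/s per 4090, 800 GB/s for the M2 Ultra), and ignores the interconnect/all-reduce overhead that tensor parallelism actually pays for its speedup:

```python
# Back-of-the-envelope decode throughput for a memory-bandwidth-bound LLM.
# A sketch under stated assumptions: nominal spec bandwidth, fp16 weights,
# and no accounting for the all-reduce cost of tensor parallelism.

WEIGHT_BYTES = 70e9 * 2  # hypothetical 70B-param model at fp16 (~140 GB)

def tokens_per_second(aggregate_bandwidth_gbps: float) -> float:
    # Each decoded token reads every weight once, so throughput is
    # roughly aggregate memory bandwidth divided by model size.
    return (aggregate_bandwidth_gbps * 1e9) / WEIGHT_BYTES

# Single M2 Ultra: one 800 GB/s unified-memory pool.
print(f"M2 Ultra:     {tokens_per_second(800):.1f} tok/s")

# 8x 4090 with tensor parallelism: each GPU holds 1/8 of every weight
# matrix and streams only its shard per token, in parallel with the
# others -- aggregate bandwidth is ~8x one card, not 1008 GB/s total.
print(f"8x 4090 (TP): {tokens_per_second(8 * 1008):.1f} tok/s")
```

Under these (optimistic) assumptions the 8-GPU setup reads weights roughly 10x faster per token; real numbers land lower once communication overhead is paid, but the point stands that the bandwidth aggregates rather than staying at one card's figure.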
I’m developing an inference engine, so I actually do understand how this works, as well as the other types of parallelism and exactly how their trade-offs differ.