I have some experience in this area: I've worked on machine learning frameworks, trained large models in datacenters, and keep a personal machine for tinkering.
This makes very little sense. Even if he were able to achieve his goals, consumer GPU hardware is bounded by network and memory bandwidth, so it's a poor target to optimize. Fast device-to-device communication is only available on datacenter GPUs, and it's essential for training models like LLaMA, Stable Diffusion, etc. Amdahl's law strikes again.
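To make the Amdahl's-law point concrete, here's a rough sketch. If some fraction of each training step is serialized on communication over a slow consumer interconnect, the achievable speedup is capped no matter how many GPUs you add. The 20% serial fraction below is a made-up illustrative number, not a measurement of any real workload:

```python
def amdahl_speedup(parallel_fraction, n_workers):
    """Amdahl's law: speedup when only parallel_fraction of the work
    scales across n_workers; the rest (e.g. gradient sync over a slow
    PCIe/network link) stays serial."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_workers)

# Hypothetical: if slow transfers serialize 20% of a training step,
# no number of consumer GPUs gets you past a 5x speedup.
for n in (2, 8, 64, 1_000_000):
    print(n, round(amdahl_speedup(0.8, n), 2))
# → 2 1.67
# → 8 3.33
# → 64 4.71
# → 1000000 5.0
```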