I have some experience in this area: I've worked on machine learning frameworks, trained large models in datacenters, and have my own personal machine to tinker with.

This makes very little sense. Even if he were able to achieve his goals, consumer GPU hardware is bounded by network and memory bandwidth, so it's a bad target to optimize. Fast device-to-device communication is only available on datacenter GPUs, and it's essential for training models like LLaMA, Stable Diffusion, etc. Amdahl's law strikes again.
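To make the Amdahl's law point concrete, here's a rough sketch with made-up numbers (the 90% parallel fraction is purely illustrative, not measured): if the rest of each training step is serialized behind slow PCIe/Ethernet communication, adding consumer GPUs stops paying off quickly.

    # Amdahl's law: speedup on n devices when a fraction p of the work parallelizes
    def amdahl_speedup(p: float, n: int) -> float:
        return 1.0 / ((1.0 - p) + p / n)

    for n in (2, 4, 8):
        print(n, round(amdahl_speedup(0.90, n), 2))
    # 2 -> 1.82
    # 4 -> 3.08
    # 8 -> 4.71   (even with 90% parallel work, 8 GPUs give <5x)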



Eh... no? Stable Diffusion works fine on a single device. Ditto for the smaller LLaMA models.


That's for inference; I'm referring to training.
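Back-of-the-envelope numbers for why the distinction matters (hypothetical 7B-parameter model, fp16 weights, standard mixed-precision Adam; activations and batch size ignored): inference only needs the weights, while naive training also needs gradients, optimizer state, and an fp32 master copy.

    params = 7e9
    weights   = params * 2        # fp16 weights           ~14 GB
    grads     = params * 2        # fp16 gradients         ~14 GB
    adam_m_v  = params * 4 * 2    # fp32 Adam moments      ~56 GB
    fp32_copy = params * 4        # fp32 master weights    ~28 GB

    inference_gb = weights / 1e9
    training_gb  = (weights + grads + adam_m_v + fp32_copy) / 1e9
    print(f"inference: ~{inference_gb:.0f} GB, training: ~{training_gb:.0f} GB")
    # inference: ~14 GB, training: ~112 GB (before activations)

So a 24 GB consumer card can serve the model but can't naively train it without sharding across devices, which is where the interconnect becomes the bottleneck.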



