
Because ML infra is bloatware beyond belief.

If it were engineered right, inference would take only three steps:

- transfer model weights from NVMe drive/RAM to GPU via PCIe

- upload tiny precompiled code to GPU

- run it with tiny CPU host code

But what you get instead is gigabytes of PyTorch plus Nvidia Docker-container bloatware (hi, Nvidia NeMo) that takes forever to start.
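The three steps above can be sketched with the CUDA driver API alone, with no framework on top. This is a minimal illustration, not a full inference engine: the file names `weights.bin` and `model.cubin`, the kernel entry point `forward`, and the launch geometry are all hypothetical placeholders, and it assumes a CUDA-capable GPU with the driver installed.

```c
/* Minimal sketch of the loading path described above, using only the
   CUDA driver API. Hypothetical inputs: raw weights in "weights.bin"
   and a precompiled kernel "forward" in "model.cubin". */
#include <cuda.h>
#include <stdio.h>
#include <stdlib.h>

#define CHECK(x) do { CUresult r = (x); if (r != CUDA_SUCCESS) { \
    fprintf(stderr, "CUDA error %d at %s\n", (int)r, #x); exit(1); } } while (0)

int main(void) {
    CUdevice dev; CUcontext ctx; CUmodule mod; CUfunction fn;
    CHECK(cuInit(0));
    CHECK(cuDeviceGet(&dev, 0));
    CHECK(cuCtxCreate(&ctx, 0, dev));

    /* 1. Transfer model weights from disk/RAM to GPU via PCIe. */
    FILE *f = fopen("weights.bin", "rb");
    if (!f) { perror("weights.bin"); return 1; }
    fseek(f, 0, SEEK_END); long n = ftell(f); fseek(f, 0, SEEK_SET);
    void *host = malloc((size_t)n);
    if (fread(host, 1, (size_t)n, f) != (size_t)n) { fclose(f); return 1; }
    fclose(f);
    CUdeviceptr weights;
    CHECK(cuMemAlloc(&weights, (size_t)n));
    CHECK(cuMemcpyHtoD(weights, host, (size_t)n));
    free(host);

    /* 2. Upload tiny precompiled code (a cubin) to the GPU. */
    CHECK(cuModuleLoad(&mod, "model.cubin"));
    CHECK(cuModuleGetFunction(&fn, mod, "forward"));

    /* 3. Run it with tiny CPU host code (placeholder launch geometry). */
    void *args[] = { &weights };
    CHECK(cuLaunchKernel(fn, 1, 1, 1, 256, 1, 1, 0, NULL, args, NULL));
    CHECK(cuCtxSynchronize());

    cuMemFree(weights);
    cuModuleUnload(mod);
    cuCtxDestroy(ctx);
    return 0;
}
```

The whole host binary here links against `libcuda` only; the point is that nothing in the loading path itself requires gigabytes of framework code.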
