
(Not the author, but I work in real-time voice.) WebSocket connections don't map one-to-one onto GPU load, since a connection spends most of its time idle. So strictly speaking, you don't need a GPU per WebSocket, assuming your GPU infra is sufficiently decoupled from your user-facing API code.
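
To make "decoupled" concrete, here's a rough sketch (Python/asyncio; handle_client, gpu_worker, and model.infer are made-up names, not anything from the article): the WebSocket handler only enqueues work and awaits a result, while a small fixed pool of GPU workers drains a shared queue, so the number of open sockets is independent of the number of GPUs.

    import asyncio

    # One shared queue: many (mostly idle) sockets, a small pool of GPU workers.
    requests: asyncio.Queue = asyncio.Queue()

    async def handle_client(websocket):
        # Runs once per WebSocket connection; holds no GPU resources itself.
        async for chunk in websocket:
            done = asyncio.get_running_loop().create_future()
            await requests.put((chunk, done))
            await websocket.send(await done)  # reply once a worker has finished

    async def gpu_worker(model):
        # One task per GPU (or per GPU partition), not per connection.
        while True:
            chunk, done = await requests.get()
            done.set_result(model.infer(chunk))  # stand-in for real inference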

That said, a GPU per generation (for some operational definition of "generation") isn't uncommon, but there's a standard bag of tricks, like GPU partitioning and batching, that you can use to maximize throughput.
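
On the batching side, the usual trick is dynamic batching: a worker grabs whatever has accumulated in the queue, up to a max batch size or a max wait, and runs it through the model in one forward pass. Another hedged sketch with made-up names and knobs:

    import asyncio

    MAX_BATCH = 8      # assumed knobs, tune for your model/hardware
    MAX_WAIT_S = 0.01  # small wait trades a little latency for throughput

    async def batching_worker(queue: asyncio.Queue, model):
        while True:
            # Block for the first item, then opportunistically fill the batch.
            batch = [await queue.get()]
            loop = asyncio.get_running_loop()
            deadline = loop.time() + MAX_WAIT_S
            while len(batch) < MAX_BATCH:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            inputs = [chunk for chunk, _ in batch]
            outputs = model.infer_batch(inputs)  # one GPU pass for the whole batch
            for (_, done), out in zip(batch, outputs):
                done.set_result(out)

The MAX_WAIT_S knob is where the latency/throughput trade-off lives: push it up and the GPU stays busier, but individual requests wait longer.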



> that you can use to maximize throughput

While sometimes degrading the experience, by a little or by a lot, thanks to possible "noisy neighbors". Worth keeping in mind that most things are trade-offs somehow :) Mostly relevant for "real-time" rather than batched/async workloads, of course.



