
(Not the author, but I work in real-time voice.) WebSocket connections don't map one-to-one onto GPU load, since a connection spends most of its time idle. So strictly speaking, you don't need a GPU per WebSocket, assuming your GPU infra is sufficiently decoupled from your user-facing API code.
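
To make "decoupled" concrete, here's a rough sketch (Python/asyncio; handle_client, gpu_worker, and model.infer are made-up names, not anything from the article): the WebSocket handler only enqueues work and awaits a result, while a small fixed pool of GPU workers drains a shared queue, so the number of open sockets is independent of the number of GPUs.

    import asyncio

    # One shared queue: many (mostly idle) sockets, a small pool of GPU workers.
    requests: asyncio.Queue = asyncio.Queue()

    async def handle_client(websocket):
        # Runs once per WebSocket connection; holds no GPU resources itself.
        async for chunk in websocket:
            done = asyncio.get_running_loop().create_future()
            await requests.put((chunk, done))
            await websocket.send(await done)  # reply once a worker has finished

    async def gpu_worker(model):
        # One task per GPU (or per GPU partition), not per connection.
        while True:
            chunk, done = await requests.get()
            done.set_result(model.infer(chunk))  # stand-in for real inference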

That said, a GPU per generation (for some operational definition of "generation") isn't uncommon, but there's a standard bag of tricks, like GPU partitioning and batching, that you can use to maximize throughput.
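
On the batching side, the usual trick is dynamic batching: a worker grabs whatever has accumulated in the queue, up to a max batch size or a max wait, and runs it through the model in one forward pass. Another hedged sketch with made-up names and knobs:

    import asyncio

    MAX_BATCH = 8      # assumed knobs, tune for your model/hardware
    MAX_WAIT_S = 0.01  # small wait trades a little latency for throughput

    async def batching_worker(queue: asyncio.Queue, model):
        while True:
            # Block for the first item, then opportunistically fill the batch.
            batch = [await queue.get()]
            loop = asyncio.get_running_loop()
            deadline = loop.time() + MAX_WAIT_S
            while len(batch) < MAX_BATCH:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            inputs = [chunk for chunk, _ in batch]
            outputs = model.infer_batch(inputs)  # one GPU pass for the whole batch
            for (_, done), out in zip(batch, outputs):
                done.set_result(out)

The MAX_WAIT_S knob is where the latency/throughput trade-off lives: push it up and the GPU stays busier, but individual requests wait longer.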



> that you can use to maximize throughput

While sometimes degrading the experience, by a little or by a lot, thanks to possible "noisy neighbors". Worth keeping in mind that most things are trade-offs somehow :) Mostly relevant for "real-time" rather than batched/async workloads, of course.



