Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

If I had to handle this problem, I'd do some kind of "split on existing loaded GPUs" for new sessions, and then when some cap is hit, spool an additional GPU in the background and the transfer the new session to that GPU as soon as the model is loaded.

I'd have to play with the configuration and load calcs, but I'm sure there's a low param, neat solution to the request/service problem.



Consider applying for YC's Winter 2026 batch! Applications are open till Nov 10

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: