
You are missing something. This is a single stream of inference. You can load up the Nvidia card with at least 16 inference streams and get much higher throughput in tokens/sec.

This is just a single-user chat experience benchmark.
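A rough sketch of the comparison, assuming a local OpenAI-compatible completions endpoint, a placeholder model name, and a response that reports completion token counts (all assumptions, not from the benchmark in question): fire 1 request vs. 16 concurrent requests and compare aggregate tokens/sec.

  # Hypothetical sketch: single-stream vs. 16-stream aggregate throughput.
  # ENDPOINT and MODEL are assumed; any OpenAI-compatible server would do.
  import asyncio
  import time

  import aiohttp

  ENDPOINT = "http://localhost:8000/v1/completions"  # assumed local server
  MODEL = "llama-3-8b"                               # assumed model name
  PROMPT = "Explain batching in one paragraph."
  MAX_TOKENS = 128

  async def one_stream(session: aiohttp.ClientSession) -> int:
      """Issue one completion request, return its completion token count."""
      payload = {"model": MODEL, "prompt": PROMPT, "max_tokens": MAX_TOKENS}
      async with session.post(ENDPOINT, json=payload) as resp:
          data = await resp.json()
          return data["usage"]["completion_tokens"]

  async def measure(n_streams: int) -> float:
      """Run n_streams concurrent requests, return aggregate tokens/sec."""
      async with aiohttp.ClientSession() as session:
          start = time.perf_counter()
          tokens = await asyncio.gather(
              *(one_stream(session) for _ in range(n_streams))
          )
          elapsed = time.perf_counter() - start
      return sum(tokens) / elapsed

  async def main() -> None:
      for n in (1, 16):
          tps = await measure(n)
          print(f"{n:>2} stream(s): {tps:8.1f} tokens/sec aggregate")

  if __name__ == "__main__":
      asyncio.run(main())

The single-stream number measures latency for one user; the 16-stream number is what the card can actually deliver when its batch dimension is kept full.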



