Tostino
11 months ago
on: Llama.cpp AI Performance with the GeForce RTX 5090...
You are missing something. This is a single stream of inference. You can load up the Nvidia card with at least 16 inference streams and get much higher throughput in tokens/sec.
This is just a single-user chat experience benchmark.
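A minimal sketch of what serving multiple concurrent streams might look like with llama.cpp's bundled server; the model path, slot count, and context size here are illustrative placeholders, not values from the benchmark:

```shell
# Hypothetical invocation of llama.cpp's HTTP server with parallel slots.
# Each slot handles an independent inference stream; the total context
# window is divided among the slots.
llama-server \
  -m ./model.gguf \
  --parallel 16 \
  --ctx-size 65536
```

With a setup along these lines, 16 clients can submit requests concurrently and the server batches them on the GPU, which is where the aggregate tokens/sec gain over a single chat session comes from.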