
You are missing something. This is a single stream of inference. You can load up the Nvidia card with at least 16 inference streams and get much higher throughput in tokens/sec.

This is just a single-user chat experience benchmark.
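A rough sketch of the comparison, assuming a local OpenAI-compatible completions endpoint, a placeholder model name, and a response that reports completion token counts (all assumptions, not from the benchmark in question): fire 1 request vs. 16 concurrent requests and compare aggregate tokens/sec.

  # Hypothetical sketch: single-stream vs. 16-stream aggregate throughput.
  # ENDPOINT and MODEL are assumed; any OpenAI-compatible server would do.
  import asyncio
  import time

  import aiohttp

  ENDPOINT = "http://localhost:8000/v1/completions"  # assumed local server
  MODEL = "llama-3-8b"                               # assumed model name
  PROMPT = "Explain batching in one paragraph."
  MAX_TOKENS = 128

  async def one_stream(session: aiohttp.ClientSession) -> int:
      """Issue one completion request, return its completion token count."""
      payload = {"model": MODEL, "prompt": PROMPT, "max_tokens": MAX_TOKENS}
      async with session.post(ENDPOINT, json=payload) as resp:
          data = await resp.json()
          return data["usage"]["completion_tokens"]

  async def measure(n_streams: int) -> float:
      """Run n_streams concurrent requests, return aggregate tokens/sec."""
      async with aiohttp.ClientSession() as session:
          start = time.perf_counter()
          tokens = await asyncio.gather(
              *(one_stream(session) for _ in range(n_streams))
          )
          elapsed = time.perf_counter() - start
      return sum(tokens) / elapsed

  async def main() -> None:
      for n in (1, 16):
          tps = await measure(n)
          print(f"{n:>2} stream(s): {tps:8.1f} tokens/sec aggregate")

  if __name__ == "__main__":
      asyncio.run(main())

The single-stream number measures latency for one user; the 16-stream number is what the card can actually deliver when its batch dimension is kept full.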



