I don’t really see how this works though — Isn’t it the case that longer “comput...

ninkendo · 2025-08-08T03:04:14 1754622254

Nah, it’d take all night because it would only be using the GPU for a fraction of the time, sharing it with other customers’ tokens and letting higher-priority workloads preempt it.

If you buy enough GPUs to do 1000 customers’ requests in a minute, you could run 60 requests for each of these customers in an hour, or you could run a single request each for 60,000 customers in that same hour. The latter can be much cheaper per customer if people are willing to wait. (In reality it’s a big N x M scheduling problem, and there are tons of ways to offer tiered pricing where cost and time are the main tradeoffs.)
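
To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The 1000 requests/minute figure is the one from the example above; the function names and the flat fleet cost of 1.0 per hour are made up for illustration, and it ignores the real scheduling problem (preemption, batching, priorities) entirely.

```python
# Assumption: a fixed fleet that completes 1000 requests per minute,
# as in the example above. Everything else is illustrative.
REQUESTS_PER_MINUTE = 1000
MINUTES_PER_HOUR = 60


def hourly_capacity(requests_per_minute: int = REQUESTS_PER_MINUTE) -> int:
    """Total requests the fleet can finish in one hour."""
    return requests_per_minute * MINUTES_PER_HOUR


def customers_served(requests_per_customer_per_hour: int) -> int:
    """How many customers fit if each one gets this many requests per hour."""
    return hourly_capacity() // requests_per_customer_per_hour


def cost_per_customer(requests_per_customer_per_hour: int,
                      fleet_cost_per_hour: float = 1.0) -> float:
    """Fleet cost split evenly across the customers served (arbitrary units)."""
    return fleet_cost_per_hour / customers_served(requests_per_customer_per_hour)


# Compare the "fast" tier (a result every minute) with the "slow" tier
# (one result at some point within the hour).
for label, per_hour in (("fast", 60), ("slow", 1)):
    print(label, customers_served(per_hour), cost_per_customer(per_hour))
```

Same hardware, same hour: the slow tier spreads the fleet cost over 60x as many customers, which is where the cheaper per-customer price comes from.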