
The state of the art for local models is even further along.

For example, look into https://github.com/kvcache-ai/ktransformers, which achieves >11 tokens/s on a relatively old two-socket Xeon server plus a retail RTX 4090 GPU. Even more interesting is the prefill speed of more than 250 tokens/s. This is very useful in use cases like coding, where large prompts are common.
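To put those two speeds in perspective, here is a rough back-of-envelope calculation; the prompt and output sizes are my own illustrative assumptions, not figures from the ktransformers repo:

    # Latency sketch for a coding-assistant request, using the speeds quoted above.
    # Prompt/output sizes are illustrative assumptions.
    PREFILL_TOK_PER_S = 250   # prompt processing speed quoted above
    DECODE_TOK_PER_S  = 11    # generation speed quoted above

    prompt_tokens = 8_000     # assumed: a few source files plus instructions
    output_tokens = 500       # assumed: a moderately sized answer/diff

    prefill_s = prompt_tokens / PREFILL_TOK_PER_S
    decode_s  = output_tokens / DECODE_TOK_PER_S
    print(f"prefill {prefill_s:.0f}s + decode {decode_s:.0f}s = {prefill_s + decode_s:.0f}s")
    # prefill 32s + decode 45s = 77s -- without fast prefill, the same 8k-token
    # prompt at 11 tokens/s would take over 12 minutes before the first output token.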

The above is achievable today. In the meantime, the Intel folks are working on something even more impressive. In https://github.com/sgl-project/sglang/pull/5150 they claim >15 tokens/s generation and >350 tokens/s prefill. They don't share exactly what hardware they run this on, but from bits and pieces across various PRs I reverse-engineered that they use 2x Xeon 6980P with MRDIMM-8800 RAM, without a GPU. The total cost of such a setup will be around $10k once cheap engineering samples hit eBay.
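A crude memory-bandwidth roofline makes that generation number at least plausible. Everything below (channel count, DIMM speed, active-parameter count, quantization) is my assumption for illustration, not something stated in the PR:

    # Decode at batch size 1 is mostly memory-bandwidth bound: every active weight
    # must be streamed from RAM once per generated token.
    # All figures are assumptions, not from the sglang PR.
    channels_per_socket = 12        # Xeon 6980P memory channels (assumed)
    transfers_per_s     = 8.8e9     # MRDIMM-8800
    bytes_per_transfer  = 8         # 64-bit channel
    sockets             = 2

    peak_bw = channels_per_socket * transfers_per_s * bytes_per_transfer * sockets  # ~1.7 TB/s

    active_params   = 37e9          # assumed: DeepSeek-style MoE, ~37B active params per token
    bytes_per_param = 0.5           # assumed: ~4-bit quantization

    bound = peak_bw / (active_params * bytes_per_param)
    print(f"theoretical upper bound: ~{bound:.0f} tokens/s")   # ~91 tokens/s
    # Real decode lands well under the theoretical peak (NUMA traffic, KV-cache
    # reads, achievable vs. theoretical bandwidth), so >15 tokens/s is credible.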



It's neither impressive nor efficient once you consider batch sizes > 1.


All of this is for batch size 1.


I know. That was my point.

Throughput doesn't scale on CPU as well as it does on GPU.
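A toy decode roofline illustrates why: per decoding step you read the (active) weights once regardless of batch size, while compute grows linearly with the batch, and the compute ceiling is far lower on a CPU. The hardware and model numbers below are rough assumptions for illustration only:

    # Toy decode-throughput roofline vs. batch size.
    # time_per_step = max(weights / bandwidth, batch * flops_per_token / compute)
    # All numbers are illustrative assumptions.
    def throughput(batch, bw_gb_s, tflops, weight_gb=18.5, flops_per_token=74e9):
        # weight_gb, flops_per_token: assumed ~37B active params at 4-bit, 2 FLOPs/param
        mem_time     = weight_gb * 1e9 / (bw_gb_s * 1e9)
        compute_time = batch * flops_per_token / (tflops * 1e12)
        return batch / max(mem_time, compute_time)

    for b in (1, 16, 128):
        cpu = throughput(b, bw_gb_s=800,  tflops=50)    # assumed dual-socket Xeon, sustained
        gpu = throughput(b, bw_gb_s=3000, tflops=1000)  # assumed data-center GPU, model in VRAM
        print(f"batch {b:>3}: CPU ~{cpu:6.0f} tok/s, GPU ~{gpu:6.0f} tok/s")
    # batch   1: CPU ~    43 tok/s, GPU ~   162 tok/s
    # batch  16: CPU ~   676 tok/s, GPU ~  2595 tok/s
    # batch 128: CPU ~   676 tok/s, GPU ~ 13514 tok/s   (CPU hits its compute ceiling early)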


We both agree. Batch size 1 is only relevant to people who want to run models on their own private machines, which is the OP's case.





