When GPT-4o mini launched, I noticed that it didn't actually respond any faster than GPT-4o. I assumed this might change over time, but it still hasn't, so I finally sat down to do some more comprehensive benchmarking of different LLM APIs to see how they compare. The vast gulf in pricing between GPT-4o and GPT-4o mini would usually imply a comparable gulf in speed, but that gap is oddly missing; in fact, the data indicates the smaller model is slower.
I didn't search for any similar benchmarks today, but I have searched in the past, and I've never been able to find any good reference for the tok/s that people are getting out of different hosted models. I hope other people will find this data valuable.
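For clarity on what I mean by tok/s: the figure I care about is throughput over the streamed completion, with timing started at the first received token so that connection and queueing latency don't skew the number. A minimal sketch of that calculation (this is a hypothetical helper for illustration, not the exact benchmark code I ran):

```python
import time

def tokens_per_second(stream):
    """Consume an iterable of streamed tokens and return (count, tok/s).

    The clock starts at the first token, so time-to-first-token
    (network setup, server queueing) is excluded from the rate.
    """
    start = None
    count = 0
    for _ in stream:
        if start is None:
            start = time.monotonic()
        count += 1
    if start is None or count < 2:
        # Zero or one token: no interval to measure a rate over.
        return count, float("nan")
    elapsed = time.monotonic() - start
    return count, count / elapsed
```

In practice the `stream` argument would be the chunk iterator returned by a provider's streaming API; wrapping it this way keeps the measurement logic identical across providers.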