It would be really cool if there was an "are we there yet" website for reasonable offline AI.
It could track different hardware configurations and reasonably standardized benchmark performance per model. I know there are benchmarks buried in the Llama GitHub repository.
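As a rough illustration of what such a site might store per run, here's a minimal sketch of a record schema. All the field names are my own assumptions, not any existing format:

    from dataclasses import dataclass

    @dataclass
    class BenchmarkRecord:
        """One benchmark run of one model on one hardware setup (hypothetical schema)."""
        model: str             # e.g. "llama-3-8b-q4_k_m"
        gpu: str               # e.g. "RTX 4090" or "Apple M3 Max"
        ram_gb: int            # system memory available to the run
        vram_gb: int           # GPU memory; 0 for CPU-only runs
        prompt_tps: float      # prompt-processing speed, tokens/second
        generation_tps: float  # generation speed, tokens/second
        benchmark: str         # which standardized test produced the score
        score: float           # the test's headline score

Keyed on (model, hardware, benchmark), that would be enough to answer "what can my machine run, and how well?"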
There seems to be a LOT of interest in such a site in the comments here. There also seem to be multiple IP issues with sharing your code repo with an online service, so I feel a lot of folks are waiting for the hardware to make running this stuff locally possible.
We need a SWE-bench for open-source LLMs, and for each model to have 3DMark-like benchmarks on various hardware setups.
I get why he calls it a simulator, as it can simulate token output. That's an important aspect of evaluating a use case: you need a sense of how a given output rate actually feels, beyond a bare tokens-per-second number.
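A minimal sketch of that idea (my own illustration, not the tool's actual code): stream placeholder tokens at a target rate so you can feel the latency instead of just reading the number.

    import time

    def simulate_stream(text: str, tokens_per_second: float) -> None:
        """Print whitespace-split 'tokens' at a fixed rate to preview
        how a given tokens/second figure feels during generation."""
        delay = 1.0 / tokens_per_second
        for token in text.split():
            print(token, end=" ", flush=True)
            time.sleep(delay)
        print()

    # e.g. preview what 8 tokens/second feels like for a short reply
    simulate_stream("Here is what an eight token per second reply feels like in practice.", 8)

Even this toy version makes it obvious that 8 tok/s is fine for a chat answer but painful for generating a whole file.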