
It would be really cool if there were an "are we there yet" website for reasonable offline AI.

It could track different hardware configurations and reasonably standardized benchmark performance per model. I know there are benchmarks buried in the llama.cpp GitHub repository.




There seems to be a LOT of interest in such a site in the comments here. There also seem to be IP concerns around sharing your code repo with an online service, so I suspect a lot of folks are waiting for hardware that makes local inference viable.

We need a SWE-bench for open-source LLMs, and for each model to have 3DMark-like benchmarks on various hardware setups.
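
As a rough illustration of the kind of number such a site would collect, here's a minimal tokens-per-second sketch using the llama-cpp-python bindings (the model path and prompt are placeholders, and this naive timing lumps prompt processing in with generation; llama.cpp's own llama-bench tool does this properly):

    import time
    from llama_cpp import Llama  # pip install llama-cpp-python

    # Placeholder path: point this at any local GGUF model.
    llm = Llama(model_path="model.gguf", n_ctx=2048, verbose=False)

    start = time.perf_counter()
    out = llm("Explain what a B-tree is.", max_tokens=256)
    elapsed = time.perf_counter() - start

    generated = out["usage"]["completion_tokens"]
    print(f"{generated} tokens in {elapsed:.2f}s ({generated / elapsed:.1f} tok/s)")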

I did find this, which seems very helpful but is missing the latest models and hardware options. https://kamilstanuch.github.io/LLM-token-generation-simulato...


Looks like he bases the benchmarks on https://github.com/ggml-org/llama.cpp/discussions/4167

I get why he calls it a simulator: it can simulate token output at a chosen rate. That matters when evaluating a use case, since watching output stream at a given speed tells you more than a raw tokens-per-second number.
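
If you want a feel for that without downloading anything, here's a minimal sketch of the same idea: stream placeholder text at a chosen tokens-per-second rate and eyeball whether it's tolerable for your use case (the rate and sample text are made up, and splitting on whitespace is only a crude stand-in for real tokenization):

    import sys
    import time

    def simulate_stream(text: str, tokens_per_second: float) -> None:
        """Emit whitespace-split 'tokens' at a fixed rate to mimic LLM output."""
        delay = 1.0 / tokens_per_second
        for token in text.split():
            sys.stdout.write(token + " ")
            sys.stdout.flush()
            time.sleep(delay)
        print()

    # 15 tok/s is an arbitrary illustrative rate, not a measured figure.
    simulate_stream("The quick brown fox jumps over the lazy dog. " * 10, 15.0)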



