
I think this misses the mark. We already know LLMs can learn facts; there are plenty of benchmarks full of them, and I don't expect that saturating this one will mean we have AGI.

The capabilities LLMs are actually missing lie more in the direction of long-running tasks, consistency, and ironing out a lot of tokenization and attention weirdness.
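
The tokenization point is easy to see for yourself. Here's a minimal sketch using OpenAI's tiktoken library (the word choices are just illustrative, not from any benchmark): the model never sees individual letters, only subword token IDs, which is part of why character-level questions go sideways.

    import tiktoken

    # cl100k_base is the encoding used by GPT-4-era OpenAI models
    enc = tiktoken.get_encoding("cl100k_base")

    for word in ["strawberry", " strawberry", "Strawberry"]:
        ids = enc.encode(word)
        pieces = [enc.decode_single_token_bytes(i).decode("utf-8") for i in ids]
        print(repr(word), "->", pieces)

    # The same word typically splits into different subword chunks
    # depending on leading whitespace and capitalization, so a question
    # like "how many r's?" is being asked of opaque IDs, not letters.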

I started a company that makes evals, though, so I may be biased.



