I like this bit:

> Personally, when I want to get a sense of capability improvements in the future, I'm going to be looking almost exclusively at benchmarks like Claude Plays Pokemon.

Definitely interested to see how the best models from Anthropic's competitors do at this.



