> Personally, when I want to get a sense of capability improvements in the future, I'm going to be looking almost exclusively at benchmarks like Claude Plays Pokemon.
Definitely interested to see how the best models from Anthropic's competitors do at this.