Point remains, though: they crushed the benchmark using a specialized model that ...

throwaway314155 · 2025-06-11T13:42:27 1749649347

They revealed the price points for running those evaluations. IIRC, the "high" level of reasoning cost tens of thousands of dollars, if not more. I don't think they really inflated expectations. In fact, a lot of what we learned is that ARC-AGI probably isn't a very good AGI evaluation (it claims not to be one, but the name suggests otherwise).