Benchmark scores aren't a good yardstick because the benchmarks were built for previous generations of LLMs. That 2.23% uptick can actually represent a world of difference in subjective tests and can definitely be worth the investment.
Progress is not slowing down, but it is getting harder to quantify.