> The creation of a model which is "co-state-of-the-art" (assuming it wasn't trained on the benchmarks directly) is not a win for scaling laws.
Just based on the comparisons linked in the article, it's not "co-state-of-the-art"; it's the clear leader. You might argue those numbers are wrong or unrepresentative, but you can't accept them and then claim it isn't outperforming existing models.
The leader, perhaps, but not by a large margin, and only on this sample of benchmarks. "Co-state-of-the-art" is the term the article itself uses, and I'm taking that at face value.
The deltas between the other models are mostly not significant either; they're all roughly equally good. There's no categorical difference between GPT-4 and Claude 3.5.