> A year after GPT-4 set the bar, it's still the best model
Debatable. Many people find Claude Opus superior, and I know I've found it consistently better for challenging coding questions. More importantly, the delta between GPT-4 and everything else is getting smaller and smaller. Llama 3 is basically interchangeable with GPT-4 for a huge number of tasks, despite its smaller size.
Many more do not, according to the LMSYS leaderboard.
> Llama 3 is basically interchangeable with GPT-4 for a huge number of tasks
Sure. I am sure the number approaches infinity, if you are willing to let the model inform the task. That's usually not what most people are looking for in a tool.
While they still call it GPT-4, the models topping the rankings are newer iterations that retain the name. The latest is from 2024-04-09. Sure, that one probably finished training a few months ago, but that is by no means a two-year head start.
More importantly, if a problem stumps the competition, GPT-4 will probably also not crack it, which is why the SOTA is not terribly interesting. The delta between GPT-4 and the competition has been closing, but why anyone would assume this is a trend that will continue from GPT-4.5 or GPT-5 to the competition, rather than the other way around, is a mystery to me.
I am not saying it could not be true. But extrapolating from differences between current bad models to a future with better models is weird, especially when everyone seems to pretty much agree that scale is the difference between the two, and scale is hard and exclusive.
There’s a scatterplot that’s been circulating on Twitter. The trend lines show that since the time of GPT-2, open-weight models have improved at a steeper rate than proprietary models, with the two on a path to intersect.
I would argue that's to be expected after the first generally accepted POC (GPT-3.5) was released, with it an entire industry created, and other companies actually started copying/competing in a big way.
It seems a stretch to read this as a continuing trend, when (from what I gather everyone agrees on) the way to better models seems to be ever more efficient handling of ever larger amounts of money, compute and data, with no reasonable limits in sight on any of the three.
Scaling up LLMs is only going to go so far, and it will yield diminishing marginal returns on all of that money, compute, and data. It's a regime of exponential increases in inputs for linear gains in outputs - barring some technological breakthrough, which could come from anywhere, not just from OpenAI.