
> A year after GPT-4 set the bar, it's still the best model

Debatable. Many people find Claude Opus superior, and I've certainly found it consistently better for challenging coding questions. More importantly, the delta between GPT-4 and everything else is getting smaller and smaller. Llama 3 is basically interchangeable with GPT-4 for a huge number of tasks, despite its smaller size.




> Many people find Claude Opus superior

Many more do not, according to the LMSYS leaderboard.

> Llama 3 is basically interchangeable with GPT-4 for a huge number of tasks

Sure, and I'm sure that number approaches infinity if you're willing to let the model pick the task. That's usually not what most people are looking for in a tool.


GPT-4 was released in March 2023, which means the research that went into it would have been finalised quite some time before that. That gives OpenAI close to a two-year head start.


While it is still called GPT-4, the models topping the rankings are newer iterations of it. The latest is from 2024-04-09. Sure, that one probably finished training a few months ago, but it is by no means a two-year head start.


Agree, the delta is getting smaller. And for the majority of tasks you can use Claude Sonnet, which is better than GPT-3.5 and also fast.

But at the same time, when you actually want to solve a complicated problem, deep down you know that only GPT-4 can crack it.


More to the point, you know that GPT-4 will probably not crack it either, which is why the SOTA is not terribly interesting. The delta between GPT-4 and the competition has been closing, but why anyone would assume this is a trend that will continue with GPT-4.5 or GPT-5 versus the competition, rather than the other way around, is a mystery to me.

I am not saying it could not be true. But extrapolating from the differences between today's bad models to a future with better models is weird, especially when everyone seems to pretty much agree that scale is the difference between the two, and scale is hard and exclusive.


There's a scatterplot that's been circulating on Twitter. The trend lines show that, since GPT-2, open-weights models have improved at a steeper rate than proprietary models, with the two on a path to intersect.


I would argue that's to be expected: once the first generally accepted proof of concept (GPT-3.5) was released, an entire industry sprang up around it, and other companies actually started copying and competing in a big way.

It seems a stretch to read this as a continuing trend when, from what I gather everyone agrees on, the path to better models is ever more efficient handling of ever larger amounts of money, compute, and data, with no reasonable limit in sight on any of the three.


Scaling up LLMs will only go so far, and it will yield diminishing marginal returns on all of that money, compute, and data. It's a regime of exponential increases in inputs for linear gains in outputs, barring some technological breakthrough, which could come from anywhere, not just OpenAI.
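
To make the arithmetic concrete, here is a minimal sketch assuming a Kaplan/Chinchilla-style power law, L(C) = a * C^(-alpha); the constants a and alpha are illustrative placeholders, not values fitted to any real model:

    # Sketch of diminishing returns under an assumed power-law scaling
    # relation L(C) = a * C**(-alpha). Constants are illustrative only,
    # not fitted values from any paper.

    def loss(compute: float, a: float = 10.0, alpha: float = 0.05) -> float:
        """Hypothetical test loss as a power law in training compute."""
        return a * compute ** -alpha

    base = 1e21  # arbitrary baseline compute budget, in FLOPs
    for factor in (1, 10, 100, 1000):
        print(f"{factor:>5}x compute -> loss {loss(base * factor):.3f}")

    # Prints 0.891, 0.794, 0.708, 0.631: every 10x (exponential) jump
    # in compute multiplies the loss by the same constant factor, ~0.89,
    # so loss only improves linearly in log-compute.

On a log-log plot this is a straight line, which is exactly why each fixed improvement in loss gets exponentially more expensive as you move down the curve.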



