
I think a lot of it comes down to how well the user understands the problem, because that determines the quality of instructions and feedback given to the LLM.

For instance, I know some people have had success with getting Claude to do game development. I have never bothered to learn much of anything about game development, but have been trying to get Claude to do the work for me. Unsuccessful. It works for people who understand the problem domain, but not for those who don't. That's my theory.





It works for hard problems when the person has already solved it and just needs the grunt work done.

It also works for problems that have been solved a thousand times before, which impresses people and makes them think it is actually solving those problems.


Which matches what they are. They're first and foremost pattern recognition engines extraordinaire. If they can identify some pattern that's out of whack in your code compared to something in the training data, or a bug that is similar to others that have been fixed in their training set, they can usually thwack the matching pattern from their latent space into your code and clean up the residuals. On pattern matching alone, they are significantly superhuman.

"Reasoning", however, is a feature that has been bolted on with a hacksaw and duct tape. Their ability to pattern match makes reasoning seem more powerful than it actually is. If your bug is within some reasonable distance of a pattern it has seen in training, reasoning can get it over the final hump. But if your problem is too far removed from what it has seen in its latent space, it's not likely to figure it out by reasoning alone.


Exactly. I go back to a recent ancestor of LLMs, seq2seq. Its purpose was to translate things. That's all. That required representation learning and an attention mechanism, and it led to some really freaky emergent capabilities, but it's trained to translate language.

And that's exactly what it's good for. It works great if you have already solved a tough problem and provide it the solution in natural language, because the program is already there; it just needs to translate it into Python.

Anything beyond that which might emerge from this is going to be unreliable sleight of next-token prediction, at best.

We need a new architectural leap to have these things reason, maybe something that involves reinforcement learning at the token representation level, idk. But scaling the context window and training data aren't going to cut it.
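
(To make the "trained to translate" point concrete, here is a minimal sketch of a classic seq2seq encoder-decoder doing the one thing it was trained for. This assumes the Hugging Face transformers library and the public Helsinki-NLP/opus-mt-en-de Marian checkpoint; both are illustrative choices, not anything from this thread.)

    # Minimal seq2seq sketch: encode a source sentence, let the attention-equipped
    # decoder generate the translation. Checkpoint choice is an assumption for
    # illustration only.
    from transformers import MarianMTModel, MarianTokenizer

    model_name = "Helsinki-NLP/opus-mt-en-de"
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)

    inputs = tokenizer(
        ["The attention mechanism learns which source words matter."],
        return_tensors="pt", padding=True,
    )
    outputs = model.generate(**inputs, max_new_tokens=60)
    print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])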


>"Reasoning", however, is a feature that has been bolted on with a hacksaw and duct tape.

What do you mean by this? Especially for tasks like coding, where there is a deterministic correct-or-incorrect signal, it should be possible to train on that.


It's meant in the literal sense, but with metaphorical hacksaws and duct tape.

Early on, some advanced LLM users noticed they could get better results by force-inserting a word like "Wait," or "Hang on," or "Actually," and then running the model for a few more paragraphs. This would increase the chance of the model noticing a mistake it had made.

Reasoning is basically this.
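
(For anyone curious what that trick looks like mechanically, here is a rough sketch with a small open model via Hugging Face transformers. The model, prompt, and generation settings are placeholders; production "reasoning" training is far more elaborate, but the core move is the same: append a reconsideration word and let the model keep generating.)

    # Sketch of the "force-insert a word and keep going" trick described above.
    # gpt2 and the prompt are placeholders for illustration only.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    prompt = "Q: What is 17 * 24?\nA:"
    ids = tokenizer(prompt, return_tensors="pt").input_ids

    # First pass: let the model answer on its own.
    first = model.generate(ids, max_new_tokens=40, do_sample=False,
                           pad_token_id=tokenizer.eos_token_id)

    # Force-insert " Wait," after the answer and continue generating, giving the
    # model a chance to notice and correct its own mistake.
    nudge = tokenizer(" Wait,", return_tensors="pt").input_ids
    second = model.generate(torch.cat([first, nudge], dim=-1), max_new_tokens=40,
                            do_sample=False, pad_token_id=tokenizer.eos_token_id)

    print(tokenizer.decode(second[0], skip_special_tokens=True))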


It's not just force-inserting a word. Reasoning is integrated into the training process of the model.

Not into the core foundation model. The foundation model still only predicts the next token in a static way. The reasoning is tacked onto the InstructGPT-style finetuning step, and it's done through prompt engineering. Which is the shittiest way a capability like this could have been built, and it shows.
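
(To make the "tacked onto the finetuning step" framing concrete, here is roughly what a reasoning-style supervised finetuning record looks like as data, sketched in Python. The format is purely illustrative; every lab uses its own template, and none of this is an actual vendor schema.)

    # Illustrative only: one supervised finetuning record in a reasoning-style
    # chat format. The <think> markup is a made-up placeholder, not a real
    # vendor template.
    sft_example = {
        "messages": [
            {"role": "user", "content": "Is 391 prime?"},
            {
                "role": "assistant",
                # The "reasoning" is ordinary text the model is trained to emit
                # before its final answer -- still next-token prediction.
                "content": (
                    "<think>391 = 17 * 23, so it has a nontrivial factor.</think>\n"
                    "No, 391 is not prime; it factors as 17 * 23."
                ),
            },
        ]
    }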

> It also works for problems that have been solved a thousand times before

So you mean it works on almost all problems?


I mean problems not worth solving, because they've already been solved. If you just need the grunt work of retrieving the solution to a trite and worn-out problem from the model's training data, then they work great.

But if you want to do interesting things, like all the shills keep trying to claim they do, then this won't do it for you. You have to do it for it.


> But if you want to do interesting things, like all the shills keep trying to claim they do

I don't know where this is coming from. I've seen some over-enthusiastic hype, for sure, but most of the day-to-day conversations I see aren't people saying they're curing cancer with Claude; they're people saying they're automating their bread-and-butter tasks with great success.



