
I find that the code quality LLMs output is pretty bad. I end up going through so many iterations that it ends up being faster to do it myself. What I find agents actually useful for is doing large-scale mechanical refactors. Instead of trying to figure out the perfect vim macro or AST rewrite script, I'll throw an agent at it.
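For example, one of those one-off AST rewrite scripts might look roughly like this minimal sketch (my own assumed example, Python standard library only; fetch_user/get_user are made-up names):

    import ast

    class RenameCall(ast.NodeTransformer):
        """Rewrite bare calls to fetch_user(...) as get_user(...)."""
        def visit_Call(self, node):
            self.generic_visit(node)  # rewrite nested calls first
            if isinstance(node.func, ast.Name) and node.func.id == "fetch_user":
                node.func.id = "get_user"
            return node

    source = open("app.py").read()                # file to rewrite
    tree = RenameCall().visit(ast.parse(source))
    print(ast.unparse(tree))                      # ast.unparse needs Python 3.9+

The agent spares me from writing and debugging this kind of throwaway tooling for every slightly different refactor.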





I disagree strongly. The code is generally good if the prompt was reasonable at this point, but also every possible test is now being written, every UI element has all the required traits, every function has the correct documentation attached, the million little refactors that improve the codebase are being done, etc.

Someone told me ‘AI makes all the little things trivial to do’ and I agree strongly with that. Those many little things together make a strong statement about quality. Our codebase has gone up significantly in quality with AI, whereas before we’d let the little things slide due to understaffing.


> The code is generally good if the prompt was reasonable

The point is writing that prompt takes longer than writing the code.

> Someone told me ‘AI makes all the little things trivial to do’ and I agree strongly with that

Yeah, it's great for doing all of those little things. It's bad at doing the big things.


Have to disagree with this too: ask an LLM to architect a project or to propose a cleaner solution, and it usually does a good job.

Where it still sucks is doing both at once. Hence the shift to integrating to-do lists in Cursor. My flow has shifted to "design this feature", then "continue to implement" ten times in a row, with code review between each step.


> The point is writing that prompt takes longer than writing the code.

Luckily we can reuse system prompts :) Mine usually contains something like https://gist.github.com/victorb/1fe62fe7b80a64fc5b446f82d313... plus project-specific instructions, and it's reused across sessions.

Currently, writing the prompt does not take me as long as writing the code would.


> The code is generally good if the prompt was reasonable at this point

Which, again, is 100% unverifiable and cannot be generalized, as described in the article.

How do I know this? Because, as I said in the article, I use these tools daily.

And "prompt was reasonable" is a yet another magical incantation that may or may not work. Here's my experience: https://news.ycombinator.com/item?id=44470144


> I find that the code quality LLMs output is pretty bad.

That was my experience with Cursor, but Claude Code is a different world. What specific products/models brought you to this generalization?


Claude Code, depending on the weather, the phase of the moon, and compute availability at a specific point in time: https://news.ycombinator.com/item?id=44470144

What sort of mechanical refactors?

"Find all places this API is used and rewrite it using these other APIs."


