
I find that the code quality LLMs output is pretty bad. I end up going through so many iterations that it ends up being faster to do it myself. What I find agents actually useful for is doing large-scale mechanical refactors. Instead of trying to figure out the perfect vim macro or AST rewrite script, I'll throw an agent at it.
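For example, one of those one-off AST rewrite scripts might look roughly like this minimal sketch (my own assumed example, Python standard library only; fetch_user/get_user are made-up names):

    import ast

    class RenameCall(ast.NodeTransformer):
        """Rewrite bare calls to fetch_user(...) as get_user(...)."""
        def visit_Call(self, node):
            self.generic_visit(node)  # rewrite nested calls first
            if isinstance(node.func, ast.Name) and node.func.id == "fetch_user":
                node.func.id = "get_user"
            return node

    source = open("app.py").read()                # file to rewrite
    tree = RenameCall().visit(ast.parse(source))
    print(ast.unparse(tree))                      # ast.unparse needs Python 3.9+

The agent spares me from writing and debugging this kind of throwaway tooling for every slightly different refactor.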





I disagree strongly. The code is generally good if the prompt was reasonable at this point, but also every possible test is now being written, every UI element has all the required traits, every function has the correct documentation attached, the million little refactors that improve the codebase are being done, etc.

Someone told me ‘AI makes all the little things trivial to do’ and I agree strongly with that. Those many little things together make a strong statement about quality. Our codebase has gone up significantly in quality with AI, whereas before we’d let the little things slide due to understaffing.


> The code is generally good if the prompt was reasonable

The point is writing that prompt takes longer than writing the code.

> Someone told me ‘AI makes all the little things trivial to do’ and I agree strongly with that

Yeah, it's great for doing all of those little things. It's bad at doing the big things.


Have to disagree with this too: ask an LLM to architect a project or to propose a cleaner solution, and it usually does a good job.

Where it still sucks is doing both at once. Hence the shift to integrating to-do lists in Cursor. My flow has shifted to "design this feature", then "continue to implement" ten times in a row, with code review between each step.


> The point is writing that prompt takes longer than writing the code.

Luckily we can reuse system prompts :) Mine usually contains something like https://gist.github.com/victorb/1fe62fe7b80a64fc5b446f82d313... plus project-specific instructions, and it's reused across sessions.

Currently, writing the prompt does not take me as long as writing the code would.


> The code is generally good if the prompt was reasonable at this point

Which, again, is 100% unverifiable and cannot be generalized, as described in the article.

How do I know this? Because, as I said in the article, I use these tools daily.

And "prompt was reasonable" is a yet another magical incantation that may or may not work. Here's my experience: https://news.ycombinator.com/item?id=44470144


> I find that the code quality LLMs output is pretty bad.

That was my experience with Cursor, but Claude Code is a different world. What specific products/models brought you to this generalization?


Claude Code, depending on the weather, the phase of the moon, and compute availability at a specific point in time: https://news.ycombinator.com/item?id=44470144

What sort of mechanical refactors?

"Find all places this API is used and rewrite it using these other APIs."


