Hacker News | 0x696C6961's comments

I find that the code quality LLMs output is pretty bad. I end up going through so many iterations that it's faster to do it myself. What I find agents actually useful for is large-scale mechanical refactors. Instead of trying to figure out the perfect vim macro or AST rewrite script, I'll throw an agent at it.

I disagree strongly at this point. The code is generally good if the prompt was reasonable, but on top of that every possible test is now being written, every UI element has all the required traits, every function has the correct documentation attached, the million little refactors that improve the codebase are being done, etc.

Someone told me ‘AI makes all the little things trivial to do’ and I agree strongly with that. Those many little things together make a strong statement about quality. Our codebase has gone up in quality significantly with AI, whereas before we'd let the little things slide due to understaffing.


> The code is generally good if the prompt was reasonable

The point is writing that prompt takes longer than writing the code.

> Someone told me ‘AI makes all the little things trivial to do’ and I agree strongly with that

Yeah, it's great for doing all of those little things. It's bad at doing the big things.


Have to disagree with this too - ask an LLM to architect a project or propose a cleaner solution, and it usually does a good job.

Where it still sucks is doing both at once. Thus the shift to integrating "to do" lists in Cursor. My flow has shifted to "design this feature" then "continue to implement" 10 times in a row with code review between each step.


> The point is writing that prompt takes longer than writing the code.

Luckily we can reuse system prompts :) Mine usually contains something like https://gist.github.com/victorb/1fe62fe7b80a64fc5b446f82d313... + project-specific instructions, which is reused across sessions.

Currently, prompting does not take me as long as writing the code myself would.


> The code is generally good if the prompt was reasonable

Which, again, is 100% unverifiable and cannot be generalized. As described in the article.

How do I know this? Because, as I said in the article, I use these tools daily.

And "prompt was reasonable" is a yet another magical incantation that may or may not work. Here's my experience: https://news.ycombinator.com/item?id=44470144


> I find that the code quality LLMs output is pretty bad.

That was my experience with Cursor, but Claude Code is a different world. What specific product/models brought you to this generalization?


Claude Code, depending on the weather, the phase of the moon, and compute availability at a specific point in time: https://news.ycombinator.com/item?id=44470144

What sort of mechanical refactors?

"Find all places this API is used and rewrite it using these other APIs."

What is the point?

The point is LLMs are fundamentally unreliable algorithms for generating plausible text, and as such entirely unsuitable for this task. "But the recipe is probably delicious anyway" is beside the point, when it completely corrupted the meaning of the original. Which is annoying when it's a recipe but potentially very damaging when it's something else.

Techies seem to pretend this doesn't happen, and the general public who doesn't understand will trust the aforementioned techies. So what we see is these tools being used en masse and uncritically for purposes to which they are unsuited. I don't think this is good.


I’m someone else, but for me the point is that a serious bug resulted in _incorrect data_, making it impossible to trust the output.

Assuming you are responding in good faith - the author politely acknowledged the bug (despite the snark in the comment they responded to), explained what happened and fixed it. I'm not sure what more I could expect here? Bugs are inevitable, I think it's how they are handled that drives trust for me.

The description includes an input and an output JSON schema.


You're not looking at the latest version. They added output schemas.

Thank you!

Discussion can keep happening after the commit is created.

In my experience this is the same group who is actually fixing things.


It's called humor.


I always tell agents to use ripgrep instead of find.
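For concreteness, here's roughly what that instruction and the searches it produces look like (a sketch; the instruction wording and paths are hypothetical):

    # Instruction dropped into the agent's system prompt / project notes (hypothetical wording):
    #   "Always use ripgrep (rg) for searching the codebase; avoid find and plain grep."
    rg -n "parseConfig" src/        # search file contents, respects .gitignore
    rg --files -g "*.ts" src/       # list files matching a glob
    # roughly equivalent to, but faster and less noisy than:
    find src -type f -name "*.ts"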


Backoff APIs often take a retry count as input. Your suggestion would require changing the API, which isn't always practical.
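For illustration, the common shape looks something like this (a minimal sketch, not any particular library's API; do_request and TransientError are hypothetical):

    import random
    import time

    def backoff_delay(retry_count, base=0.5, cap=30.0):
        # Exponential backoff with full jitter, driven by the attempt number
        # the caller passes in -- the retry count is part of the interface.
        return random.uniform(0, min(cap, base * (2 ** retry_count)))

    for attempt in range(5):
        try:
            do_request()            # hypothetical operation that can fail
            break
        except TransientError:      # hypothetical transient failure
            time.sleep(backoff_delay(attempt))

Moving the counter (or the sleeping itself) inside the helper would change that signature and every call site.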


Ever since Github added native support for mermaid, that's all I really use.
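For example, a fenced block like this renders as a diagram directly in a README or PR description (the node names are just placeholders):

    ```mermaid
    graph TD
        Client --> Gateway[API gateway]
        Gateway --> Service
        Service --> DB[(Database)]
    ```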


Mermaid is the JavaScript of charting languages - we use it not because it's good but because it's there.


Yeah, just treat it like a slightly more capable Dependabot.

