Hacker News | 0x696C6961's comments

I find that the code quality LLMs output is pretty bad. I end up going through so many iterations that it's faster to do it myself. What I find agents actually useful for is large-scale mechanical refactors. Instead of trying to figure out the perfect vim macro or AST rewrite script, I'll throw an agent at it.

I disagree strongly at this point. The code is generally good if the prompt was reasonable, but on top of that every possible test is now being written, every UI element has all the required traits, every function has the correct documentation attached, the million little refactors that improve the codebase are being done, etc.

Someone told me ‘AI makes all the little things trivial to do’ and I agree strongly with that. Those many little things together make a strong statement about quality. Our codebase has gone up in quality significantly with AI, whereas before we'd let the little things slide due to understaffing.


> The code is generally good if the prompt was reasonable

The point is writing that prompt takes longer than writing the code.

> Someone told me ‘AI makes all the little things trivial to do’ and I agree strongly with that

Yeah, it's great for doing all of those little things. It's bad at doing the big things.


Have to disagree with this too - ask an LLM to architect a project or propose a cleaner solution, and it usually does a good job.

Where it still sucks is doing both at once. Thus the shift to integrating "to do" lists in Cursor. My flow has shifted to "design this feature" then "continue to implement" 10 times in a row with code review between each step.


> The point is writing that prompt takes longer than writing the code.

Luckily we can reuse system prompts :) Mine usually contains something like https://gist.github.com/victorb/1fe62fe7b80a64fc5b446f82d313... + project-specific instructions, which is reused across sessions.

Currently, prompting does not take me as long as writing the code myself would.


> The code is generally good if the prompt was reasonable

Which, again, is 100% unverifiable and cannot be generalized. As described in the article.

How do I know this? Because, as I said in the article, I use these tools daily.

And "prompt was reasonable" is a yet another magical incantation that may or may not work. Here's my experience: https://news.ycombinator.com/item?id=44470144


> I find that the code quality LLMs output is pretty bad.

That was my experience with Cursor, but Claude Code is a different world. What specific product/models brought you to this generalization?


Claude Code, depending on the weather, the phase of the moon, and compute availability at a specific point in time: https://news.ycombinator.com/item?id=44470144

What sort of mechanical refactors?

"Find all places this API is used and rewrite it using these other APIs."

What is the point?

The point is LLMs are fundamentally unreliable algorithms for generating plausible text, and as such entirely unsuitable for this task. "But the recipe is probably delicious anyway" is beside the point, when it completely corrupted the meaning of the original. Which is annoying when it's a recipe but potentially very damaging when it's something else.

Techies seem to pretend this doesn't happen, and the general public who doesn't understand will trust the aforementioned techies. So what we see is these tools being used en masse and uncritically for purposes to which they are unsuited. I don't think this is good.


I’m someone else, but for me the point is that a serious bug resulted in _incorrect data_, making it impossible to trust the output.

Assuming you are responding in good faith - the author politely acknowledged the bug (despite the snark in the comment they responded to), explained what happened and fixed it. I'm not sure what more I could expect here? Bugs are inevitable, I think it's how they are handled that drives trust for me.

The description includes an input and an output JSON schema.


You're not looking at the latest version. They added output schemas.

Thank you!

Discussion can keep happening after the commit is created.

In my experience this is the same group who is actually fixing things.


It's called humor.


I always tell agents to use ripgrep instead of find.
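For concreteness, here's roughly what that instruction and the searches it produces look like (a sketch; the instruction wording and paths are hypothetical):

    # Instruction dropped into the agent's system prompt / project notes (hypothetical wording):
    #   "Always use ripgrep (rg) for searching the codebase; avoid find and plain grep."
    rg -n "parseConfig" src/        # search file contents, respects .gitignore
    rg --files -g "*.ts" src/       # list files matching a glob
    # roughly equivalent to, but faster and less noisy than:
    find src -type f -name "*.ts"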


Backoff APIs often take a retry count as input. Your suggestion would require changing the API, which isn't always practical.
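For illustration, the common shape looks something like this (a minimal sketch, not any particular library's API; do_request and TransientError are hypothetical):

    import random
    import time

    def backoff_delay(retry_count, base=0.5, cap=30.0):
        # Exponential backoff with full jitter, driven by the attempt number
        # the caller passes in -- the retry count is part of the interface.
        return random.uniform(0, min(cap, base * (2 ** retry_count)))

    for attempt in range(5):
        try:
            do_request()            # hypothetical operation that can fail
            break
        except TransientError:      # hypothetical transient failure
            time.sleep(backoff_delay(attempt))

Moving the counter (or the sleeping itself) inside the helper would change that signature and every call site.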


Ever since Github added native support for mermaid, that's all I really use.
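For example, a fenced block like this renders as a diagram directly in a README or PR description (the node names are just placeholders):

    ```mermaid
    graph TD
        Client --> Gateway[API gateway]
        Gateway --> Service
        Service --> DB[(Database)]
    ```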


Mermaid is the JavaScript of charting languages - we use it not because it's good but because it's there.


Yeah, just treat it like a slightly more capable Dependabot.

