
I'm curious whether anyone's actually using Claude Code successfully. I tried it on release and found it to be of negative value for tasks other than spinning up generic web projects. For existing codebases of even a moderate size, it burns through cash to write code that is always slightly wrong and requires more tuning than it would take to write it myself.


Absolutely stellar for 0-to-1-oriented frontend-related tasks, less so but still quite useful for isolated features in backends. For larger changes or smaller changes in large/more interconnected codebases, refactors, test-run-fix-loops, and similar, it has mostly provided negative value for me unfortunately. I keep wondering if it's a me problem. It would probably do much better if I wrote very lengthy prompts to micromanage little details, but I've found that to be a surprisingly draining activity, so I prefer to give it a shot with a more generic prompt and either let it run or give up, depending on which direction it takes.


Yes. For small apps, as well as distributed systems.

You have to puppeteer it and build a meta context/tasking management system. I spend a lot of time setting Claude code up for success. I usually start with Gemini for creating context, development plans, and project tasking outlines (I can feed large portions of codebase to Gemini and rely on its strategy). I’ve even put entire library docsites in my repos for Claude code to use - but today they announced web search.

They also have todos built in, which makes the above even more powerful.

The end result is insane productivity - I think the only metric I have is something like 15-20k lines of code for a recent distributed processing system from scratch over 5 days.


Is that final number really that crazy? With a well-defined goal, you can put out 5-8K lines per day writing code the old-fashioned way. Also, I would love to see the code, since in my experience (I use Cursor as a daily driver), AI bloats code by 50% or more with unnecessary comments and whitespace, especially when generating full classes/files.

> I spend a lot of time setting Claude code up for success.

Normally I wouldn't post this because it's not constructive, but this piece stuck out to me and had me wondering if it's worth the trade-off. Not to mention programmers have spent decades fighting against LoC as a metric, so let's not start using it now!


You'll never see the code. They will just say how amazingly awesome it is, how it will fundamentally alter how coding is done, etc... and then nothing. Then if you look into who posts it, they work at some AI-related startup and aren't even a coder.


Not open source, but depending on the context I can show whoever's interested. I'm not hard to find.

I've done just about everything across the full & distributed stack, so I'm down to jam on my code/systems and on how I instruct and (confidently) rely on AI to help build them.


5k lines of code a day is roughly 10 lines of code a minute, solidly, for 8 hours straight. Whichever way you cut that with whitespace and bracket alignment, that's a pretty serious amount of code to churn out.


If I am writing Go, it is easy to generate that much in if/else branches and error checks. When working in Java, basic code can bloat to a big LoC count over several hours (a first draft, which is obviously cleaned up later before going to PR). React and other FE frameworks also tend to require a huge LoC count (mostly boilerplate and autocompleted rather than thoughtfully planned and written). It is not as serious an amount as you may think.


Nitpicking like this should at least be fair: if you look at typical AI code - styles, extra newlines, comments, tests/fixtures, etc. - it is the same. And again, LoC isn't a good measurement in the first place.

Not all of my 5k lines are hand-written or even more than a character; a line can be a closing bracket, etc., which autocomplete has handled for the last 20 years. It's definitely an achievement, which is why it's important to get clarity when folks claim to have reached peak developer productivity with some new tool. To quote the curl devs, "AI slop" isn't worth nearly the same as thoughtful code right now.


are people really committing 5k lines a day without AI assistance even once a month?

I don't think I've ever done this or worked with anyone who had this type of output.


It depends upon how well mapped out the problem is in your head. If it's an unfamiliar domain, no way.


Maybe if you are copy-pasting some html templates, but then it is not “writing code”. Handwriting complex logic, at 5k sloc per day, no way.


Nobody is writing 5k lines consistently on a daily basis. Sure, if it's a bunch of boilerplate scaffolding, maybe.

I daily-drive Cursor and I have rules to limit comments. I get comments on complex lines and that's it.


I'd be really interested in seeing the source for this, if it's an open-source project, along with the prompts and some examples. Or other source/prompt examples you know of.

A lot of people seem to have these magic incantations that somehow make LLMs work really well, at the level marketing and investor hype says they do. However, I rarely see that in the real world. I'm not saying this is true for you, but absent vaguely replicable examples that aren't just basic webshit, I find it super hard to believe they're actually this capable.


While not directly what you're asking for, I find this link extremely fascinating - https://aider.chat/HISTORY.html

For context, this is aider tracking aider's code written by an LLM. Of course there's still a human in the loop, but the stats look really cool. It's the first time I've seen such a product work on itself and track the results.


Not open source, but depending on the context I can show you. I'm not hard to find.


Aider writes 70-80% of its own code: https://aider.chat/HISTORY.html


Can you share more about what you mean by a meta context/tasking management system? I’m always curious when I see people who have happily spent large amounts on api tokens.


Here is some insight... I had Gemini obfuscate my business context, so if something sounds weird it is probably because of that.

https://gist.github.com/backnotprop/4a07a7e8fdd76cbe054761b9...

The framework is basically the instructions plus my general guidance for updating it and ensuring that critical details get injected into context. Some of those prompts I commented here: https://news.ycombinator.com/item?id=43932858

For most planning I use Gemini. I copy either the entire codebase (if less than ~200k tokens) or select only the parts that matter for the task in large codebases. I built a tool to help me build these prompts and keep the codebase well organized in an XML structure. https://github.com/backnotprop/prompt-tower
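
To give a rough idea of what that packing looks like, here's a toy sketch (the tag layout, helper, and file paths are made up for illustration; it's not prompt-tower's actual output format):

    from pathlib import Path

    def build_prompt(paths, task):
        # Wrap each selected file in a tag so the model can navigate the snapshot.
        parts = ["<codebase>"]
        for p in paths:
            parts.append(f'  <file path="{p}">\n{Path(p).read_text()}\n  </file>')
        parts.append("</codebase>")
        parts.append(f"<task>\n{task}\n</task>")
        return "\n".join(parts)

    # Example: only the files that matter for this task, plus the planning ask.
    print(build_prompt(["app/Models/Report.php", "app/Jobs/BuildReport.php"],
                       "Draft a development plan and task outline for adding X."))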


So I use Roo: you have the architecture mode draft out plans, tech stack choices, todos, etc. in as much detail as you want. Switch to orchestration mode to execute the plan, including verifying things are done correctly. It sub-tasks out the todos. Tell it not to bother you unless it has a question. Come back in thirty minutes and see how it's doing. You can have it commit to a branch per task if you want. Etc etc.


I use it, on a large Clojure/ClojureScript application. And it's good.

The interactions and results are roughly in line with what I'd expect from a junior intern. E.g. don't expect miracles, the answers will sometimes be wrong, the solutions will be naive, and you have to describe what you need done in detail.

The great thing about Claude code is that (as opposed to most other tools) you can start it in a large code base and it will be able to find its way, without me manually "attaching files to context". This is very important, and overlooked in competing solutions.

I tried using aider and plandex, and none of them worked as well. After lots of fiddling I could get mediocre results. Claude Code just works, I can start it up and start DOING THINGS.

It does best with simple repetitive tasks: add another command line option similar to others, add an API interface to functions similar to other examples, etc.
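
For a concrete flavor of that kind of task, think of something like the toy example below (generic Python, made-up flag names, nothing from my actual codebase): adding one more option that mirrors the ones already there.

    import argparse

    parser = argparse.ArgumentParser(description="toy export tool")
    parser.add_argument("--output-format", choices=["json", "csv"], default="json",
                        help="format of the exported data")
    # The kind of line it adds reliably by mirroring the existing pattern above:
    parser.add_argument("--output-path", default="export.out",
                        help="where to write the exported data")

    args = parser.parse_args()
    print(args.output_format, args.output_path)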

In other words, I'd give it a serious thumbs up: I'd rather work with this than a junior intern, and I have hope for improvement in models in the future.


Here's a very small piece of code I generated quickly (i.e. <5 min) for a small task (I generated some data and wanted to check the best way to compress it):

https://gist.github.com/rachtsingh/e3d2e2b495d631b736d24b56e...

Is it correct? Sort of; I don't trust the duration benchmark because benchmarking is hard, but the size should be approximately right. It gave me a pretty clear answer to the question I had and did it quickly. I could have done it myself but it would have taken me longer to type it out.
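
The gist has the actual output; for a rough idea of the shape of such a script, a stdlib-only sketch would look something like this (not the generated code itself, and the timing caveat above still applies):

    import bz2, lzma, time, zlib

    # Stand-in payload; the real test used the actual generated data.
    data = ("some repetitive log line with numbers 12345\n" * 50_000).encode()

    for name, compress in [("zlib", zlib.compress), ("bz2", bz2.compress), ("lzma", lzma.compress)]:
        start = time.perf_counter()
        out = compress(data)
        elapsed = time.perf_counter() - start
        print(f"{name}: {len(data)} -> {len(out)} bytes in {elapsed * 1000:.1f} ms")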

I don't use it in large codebases (all agentic tools for me choke quickly), but part of our skillset is taking large problems and breaking them into smaller testable ones, and I give that to the agents. It's not frequent (~1/wk).


> I don't use it in large codebases (all agentic tools for me choke quickly), but part of our skillset is taking large problems and breaking them into smaller testable ones, and I give that to the agents. It's not frequent (~1/wk).

Where is the breakpoint here? At what number of lines of code or tokens in a codebase does it become not worth it?


30 to 40ish in my experience. The current state of the art seems to struggle to think about programming tasks at a level of abstraction, or to zoom out a little bit in terms of what might be required.

I feel like as a programmer I have a meta-design in my head of how something should work, and the code itself is a snapshot of that, and the models currently struggle with this big picture view, and that becomes apparent as they make changes. Entirely willing to believe that Just Add Moar Parameters could fix that (but also entirely willing to believe that there's some kind of current technical dead-end there)


> I don't use it in large codebases (all agentic tools for me choke quickly)

Claude code, too?

I found that it is the only one that does a good job in a large codebase. It seems to be very different from others I've tested (aider, plandex).


Yes. It costs me a few bucks per feature, which is an absolute no-brainer.

If you don't like what it suggests, undo the changes, tweak your prompt and start over. Don't chat with it to fix problems. It gets confused.


Claude Code is the first AI coding tool that actually worked for me on a small established Laravel codebase in production. It builds full stack features for me requiring only minor tweaks and guidance (and starting all over with new prompts). However, after a while I switched to Cursor Agent just because the IDE integration makes the workflow a little more convenient (especially the ability to roll back to previous checkpoints).


Just to throw my experience in, it's been _wildly_ effective.

Example;

I'm wrapping up, right now, an updated fork of the PHP extension `phpredis`. Redis 8 was recently released with support for a new data type, Vector Set, but the phpredis extension (which is far more performant than non-extension Redis libraries for PHP) doesn't support the new vector-related commands. I forked the extension repo, which is in C (I'm a PHP developer; I had to install CLion for the first time just to work along with CC), and fired up Claude Code with the initial prompt/task of analyzing the extension's code and documenting the purpose, conventions, and anything that it (Claude) felt would benefit the bootstrapping of future sessions into a CLAUDE.md file, so that whole files wouldn't need to be read each time.

This initially, depending on the size of the codebase, could be "expensive". Being that this is merely a PHP extension and isn't a huge codebase, I was fine letting it just rip through the whole thing however it saw fit - were this a larger codebase I'd take a more measured approach to this initial "indexing" of the codebase.

This results in a file that Claude uses like we do a readme.

Next I end this session, start a new one, and tell it to review that CLAUDE.md file (I specifically tell it to do this at the start of every single new session moving forward) and then generate a general overview/plan of what needs to be done in order to implement the new Vector Set related commands, so that I can use this custom phpredis extension in my PHP environments. I indicated that I wanted to generate a suite of tests focused on ensuring each command works with all of its various required and optional parameters, and that I wanted to use Docker containers for the testing rather than mess up my local dev environment.

$22 in API costs and ~6 hours spent, and I have the extension working in my local environment with support for all of the commands I want/need to use. (There are still 5 commands that I don't intend to use and haven't implemented.)

Not only would I have certainly never embarked upon trying to extend a C PHP extension, I wouldn't have done so over the course of an evening and morning.

Another example:

Before this Redis vector sets thing, I used CC to build a Python image and text embedding pipeline backed by Redis streams and Celery. It consumes tasks pushed to the stream by my Laravel application, which currently manages ~120 million unique strings and ~65 million unique images that I've been generating embeddings for. Prior to this I'd spent very little time with Python and zero with anything related to ML. Now I have a performant, portable Python service that I run from my MacBook (M2 Pro) or various GPU-having Windows machines in my home, generating the embeddings on an 'as available' basis and pushing the results back to a Redis stream that my Laravel app then consumes and processes.
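
For a rough idea of the worker side of that kind of pipeline, here's a stripped-down sketch (redis-py plus sentence-transformers; the stream names, group name, and model are placeholders, and the real pipeline runs under Celery rather than a bare loop):

    import json
    import redis
    from sentence_transformers import SentenceTransformer

    IN_STREAM, OUT_STREAM, GROUP = "embed:tasks", "embed:results", "embedders"

    r = redis.Redis()
    model = SentenceTransformer("all-MiniLM-L6-v2")  # example model, not necessarily the one used

    try:
        r.xgroup_create(IN_STREAM, GROUP, id="0", mkstream=True)
    except redis.ResponseError:
        pass  # consumer group already exists

    while True:
        # Claim up to 10 pending tasks pushed by the app side.
        for _stream, messages in r.xreadgroup(GROUP, "worker-1", {IN_STREAM: ">"}, count=10, block=5000):
            for msg_id, fields in messages:
                text = fields[b"text"].decode()
                vector = model.encode(text).tolist()
                # Push the embedding back for the app to consume and process.
                r.xadd(OUT_STREAM, {"id": fields[b"id"], "vector": json.dumps(vector)})
                r.xack(IN_STREAM, GROUP, msg_id)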

The results of these embeddings and the similarity-related features that they've brought to the Laravel application are honestly staggering. And while I'm sure I could have spent months stumbling through all of this on my own - I wouldn't have, I don't have that much time for side project curiosities.

Somewhat related - these similarity features have directly resulted in this side project becoming a service people now pay me to use.

On a day-to-day basis, the effectiveness is a learned skill. You really need to learn how to work with it, in the same way that you, as a layperson, wouldn't stroll up to a highly specialized piece of aviation technology and just infer how to use it optimally. I hate to keep parroting "skill issue," but it's just wild to me how effective these tools are and how many people don't seem to be able to find any use for them.

If it's burning through cash, you're not being focused enough with it. If it's writing code that's always slightly wrong, stop it and make adjustments. Those adjustments likely need to be documented in something like what I described above: a long-running document used similarly to a prompt.

From my own experience, I watch the "/settings/logs" route on Anthropic's website while CC is working, once I know that we're getting rather heavy with the context. Once it gets into the 50-60,000 token range I either aim to wrap up whatever the current task is, or I understand that things are going to start getting a little wonky in the 80k+ range. It'll keep on working up into 120-140k tokens or more, but you're likely going to end up with lots of "dumb" stuff happening. You really don't want to be there unless you're _sooooo close_ to getting done what you're trying to do. When the context gets too high and you need/want to reset but you're mid-task, /compact [add notes here about next steps] and it'll generate a summary that will then be used to bootstrap the next session. (Don't do this more than once, really, as it starts losing a lot of context - just reset the session fully after the first /compact.)

If you're constantly running into huge contexts, you're not being focused enough. If you can't even work on anything without reading files with thousands of lines, either break up those files somehow or you're going to have to be _really_ specific with the initial prompt and context - which I've done lots of. Say I have a model that belongs to a 10+ year old project and is 6000 lines long, and I want to work on a specific method in that model: I'll just tell Claude in the initial message/prompt which line that method starts on, which line it ends on, and how many lines from the start of the model it should read (so it can get the namespace, class name, properties, etc.), and then let it do its thing. I'll tell it specifically not to read more than 50 lines of that file at a time when looking for or reviewing something, or even to stop and ask me to locate a method/usages of things, etc., rather than reading whole files into context.

So, again, if it's burning through money - focus your efforts. If you think you can just fire it up and give it a generic task - you're going to burn money and get either complete junk, or something that might technically work but is hideous, at least to you. But, if you're disciplined and try to set or create boundaries and systems that it can adhere to - it does, for the most part.


How do you know what it built is correct if you don’t know C?


They'll find out from the CVEs a year later.


Aye, given that I have no idea what I'm doing, I certainly won't be opening a PR for this code, and I'm willing to take the risk of there being an issue from my adding (rather straightforwardly) a handful of commands to an already existing and stable extension.


Most interesting! Would you mind sharing the prompt and the resulting CLAUDE.md file?

Thanks!


This is a great report. Using a CLAUDE.md like that is honestly genius.



