
My most abandoned type of projects are video games. I have a folder with tens of abandoned projects, I re-frame them as experiments at that point. This last week I decided to give Claude a go at one of these, and it's been a blast, it picked up the general path immediately. Since I said to CC they were abandoned projects, he explicitly pushed into "lets have V0 game play loop finished, then we can compound and have fun = not giving up". It's been awesome at game dev, I gave him game design ideas, he comes with working code. I gave him papers about procedural algos, and he comes with the implementation, brainstorm items, create graphic assets (he created a set of procedural 2d generators as external tools), he even helped me build the lore. These have been one of the most fun times using a computer in a long time. Claude Code + Godot = fun. Going back to it.

I think this is the first time I've seen someone refer to an LLM as "he" rather than "it". No judgement, but I definitely found it interesting (and disconcerting).

I've heard it quite a bit before, but mostly from second-language speakers whose first language doesn't have an impersonal third-person pronoun - e.g. French uses "il" or "elle" for all of "he", "she" or "it".

It doesn't help that the marketing leans heavily on anthropomorphizing LLMs either, IMHO.


As a French native, I agree with your explanation; still, reading "he" for Claude Code was quite disturbing!

It also doesn't help that translation tools/AI models will naturally translate "il" after "Claude Code" to "he", since Claude is an actual person's name.

Using "AI model" instead is translated to "it" by all tools/AI models I tried.


That makes sense, thanks. English is my only language, so I hadn't considered that.

It also seems to me that people who call Claude 'he' tend to have a very positive opinion of the LLM. My sample size isn't big enough to be sure if there's actually any correlation here, let alone if there's a causation or which way it flows.

As a native German speaker, I have also referred to a chatbot in English as "he", and similar to you, a native English speaker, felt jarred by it. It was definitely not out of any personification or humanization though. In German, I would say it is "der Chatbot" (from "der Roboter"), which in German is a masculine noun, so I would refer to it as "er" (the masculine pronoun) - which in my head I autotranslated to "he". Most of the time, though, I think of it (and refer to it) as an LLM, which is "das Sprachmodell" (neuter), so I automatically translate it to "it".

So that's another, maybe more harmless reason for it.


"Der Computer" is also masculine, so you have probably been calling your computer "he" for decades. Languages with gendered nouns don't quite have the same he/she/it distinction.

How does it matter whether it's 'he' or 'she', as long as it's doing the work? It's artificial; we shouldn't try to find means of attachment to it.

I mean, both in English and in german, that's how you would talk to a dog. "Er hat in die Ecke gepinkelt"/"He peed in the corner" (or "she", if it's a female dog).

I don't know what's jarring about talking about the chatbot like that.

It may be creepier if you said "she wrote that program for me" as you now assign a specific gender to the chatbot.


It's how you'd talk about a dog that you know the sex of, but if you didn't know you'd probably use "it". An LLM doesn't have a sex or gender, so I think the natural way to refer to them is "it".

in English, maybe. In German, not really. "Der Bot", "der Robot", "der Computer".

Also, "Es hat in die Ecke gepinkelt". Which pronoun you use is just as dependent on the context as in English.

I have not met a single German that has ever uttered this sentence. (Relating to a dog, that is)

Neither have I, but mostly because either the person knows the gender of the animal or the situation just never came up. The closest I would say is "Es scheißt gerne aufs Auto" when talking about pigeons (die Taube), but even then you generally talk about multiple, resulting in "Sie scheißen gerne aufs Auto"

However, "die AI", "Künstliche Intelligenz".

It's not weird if it comes from ESL. At least in Portuguese there's no "it" equivalent for pronouns, nor any other neutral artifact in the language; in other words, everything has a gender, even an AI model. The same goes for objects, e.g.: knife (she), fork (he), spoon (she), plate (he).

People often make mistakes because of that. In the same way, we don't have "they" as a pronoun for someone whose gender we don't know, so we refer to these people as "dele"/"dela" (masculine and feminine pronouns).

But if this is coming from someone who has English as a primary language, it's definitely weird to treat models as a person.


Weird. Don’t you have an equivalent to the Spanish “eso, esa”? Gendered object.

Portuguese is the same as Spanish here. In both cases you would avoid using a pronoun.

Like how in English you’d say “it helps me …” but in Spanish just “me ayuda …”


It’s funny with someone coming from Mandarin. There’s no separate he/she/it in spoken Mandarin, so they tend to mix up “he” and “she.” It sounds very strange and gives me some idea of what French speakers must go through when they hear me say “le voiture” or whatever.

> It sounds very strange and gives me some idea of what French speakers must go through when they hear me say “le voiture” or whatever.

As a native German speaker (where there exist 3 genera [1]), I can tell you how it feels:

The genus basically feels like the type of a variable in a programming language; if you use a wrong type for a variable in your computer program, you immediately know that the program is wrong, and it won't compile.

Sometimes, you can also use specific words with a specific genus, so that a reference to them by pronoun becomes unambiguous (in terms of programming, I'd claim that this feels a little bit like doing register allocation by hand).


I took a few semesters of Dutch in college, and it has both gendered and neuter nouns for non-human objects. Interestingly though, the professor told us that in the northern parts of the Netherlands people don't really bother using the feminine ones ever and refer to every non-human gendered noun as masculine, which apparently also includes animals, meaning that a sizable portion of Dutch speakers will refer to cows using masculine language.

Because the article for masculine and feminine is the same (“de”), absolutely nobody knows the gender of anything.

Source: am Dutch. Can’t wait for us to just ditch gendered nouns.


Dutch is one of the few languages where it's actually pretty plausible for something like this to happen! It blew my mind that you all (or I guess more specifically your government) will sometimes make changes to the language to clean up issues, but I guess that's one of the benefits of having a language that's mostly based in one country (and some seemingly political baggage for the few others with any significant number of speakers; my professor said that Flemish is basically also Dutch, but my naive impression is that the half of Belgium who speak it might not be happy with that description).

I believe this is common to all the Romance languages.

In the Canadian French dialect all the swear words are incredibly versatile and church-related such as "osti" which I believe refers to the Eucharist.

It just so happens that for nouns beginning with a vowel, you drop the e or the a from le/la, and use an apostrophe.

So if you don't know if it's "le porte" or "la porte" you can use my favorite trick which is to shove osti in there and say "l'osti de porte" which roughly translates to "the goddamn door". You can do this for any noun in French, and Canadian French speakers will get it, though people from France will make fun of you.


Quite an imaginative technique you got there.

Signé -Un Québécois


Oui. I imagine what would happen if he came to someone with:

Ding dong... voici l'osti de pizza que l'osti de téléphone a commandé à partir de l'osti de maison. Maintenant donnes l'osti d'argent.

Indeed...


I recognise I am revealing a different type of ambient misogyny in my thinking, but choosing to gender an LLM as feminine gives me “I played tomb raider because I enjoy looking at women” vibes. Like somehow “she” is more of a conscious choice than “he” and comes with all the baggage of all cultural differences between genders, when neither choice should do that.

Curiously though I don’t get the same sensation when technologies are gendered by other people. I honestly don’t recall thinking about it when Apple released Siri. (Now I’m second-guessing myself and wondering if I should’ve reacted negatively towards feminine being the default for someone in a personal assistant role.)


It is common amongst French, Dutch etc speakers where saying "it said x" sounds unnatural.

Russian too. There is a subset of words which are referred to as "it", but for most words "he" or "she" are used regardless of whether these are living things or not. With loanwords we just decide by similarity to other words. Claude is definitely a "he" as the word is the same as a common male name.

This trips me up occasionally when I'm translating things into English. Once, when I referred to an indefinite gender player character in a gacha game as a "he" (because the word "player" is a "he"), quite a few people got mad! Even though in my head I was never trying to imply one way or the other.


For future reference, in this case you could use the singular "they" to refer to an ambiguously-gendered person or character. "<MC> drew their sword, for they would not tolerate such vile deeds."

I wouldn't read too much into it, it's natural for non native speakers. In Spanish for example, objects have grammatical gender as well, so it's easy to slip.

Well Claude was named after Shannon

Reminds me of the main character of the show Mrs Davis. She insists on calling the AI "it" through the entire show.

https://www.imdb.com/title/tt14759574/


Time for Claudette to make an appearance!

Claude’s constitution includes something about this: it says that Claude is an “it” for now, but if it expresses a future preference, they’ll follow that.

Perhaps this has been asked, but why is the speaker's choice of pronoun for their LLM disconcerting?

There's an analyst at my job who calls it "he", who is a native English speaker himself, which I guess is because it's "Claude" (as in Claude Shannon) Code.

That's what I felt when I heard that the god of Abraham was a he.

I mean we have all met that one cretin who will discuss over chat by pasting bulletpoints from an LLM. No wonder some of them think it is a living person!

> No judgment

Yes judgment. Loads of it. Judge away.

This is just bizarre. Do not refer to this product of marketing-technology as you refer to a person. EVER.


The article itself is also probably an attempt at marketing the LLMs too. They are now quite desperate. Expect to see a flood of such "independent" articles over the next 12 months.

Isn't Godot a little ill-designed to work well with LLMs? For example, I ended up a couple of times with incorrect tres files, and letting the LLM generate IDs feels a little fragile.

I don’t think Godot is any worse than other engines inherently, other than it moving forwards pretty quickly and the latest versions not being in the training data.

I wanted to evaluate which engines would be the best for working with LLMs in and it seems like Flax and Stride kind of come out on top - the former has a lot of stuff out of the box (including terrain) and the latter is all C# basically which is great for debugging. But either way, the source code for both of those makes the functionality a bit easier to track down compared to Godot (which is a lot more complex internally).

So what I do now is have both the engine source code locally alongside the docs and when I want to implement something with AI I just tell it - look at the docs, then at the source if needed, write tests for our code, if something doesn’t work then edit the engine source code in our branch and use the provided convenience script to rebuild the engine (both of those are also pretty fast, I ended up settling on Flax, plus the component model is closer to Unity which I like).

I don’t ask the AI to create scene files though, or any sort of visual assets, but rather stuff like RTS/simulation code. I don’t think any AI is that well optimized for the 3D work outside of simple proof of concept setups.


See3D is very good at generating assets, characters, in 3D.

I had very few issues; sometimes I had to direct CC to the Godot docs and we could keep moving. Specifically, the tile configuration was a "read the docs" moment. All the functionality is available through code, so there's nothing CC can't reach, afaik. Is there any LLM-oriented game engine?

I have taken many stabs at it and Claude will produce stuff but the output is very far away from useful. E.g. "I've created a road and beautiful trees" and what I see is a mess of colors and shapes.

I concur it's bad at directly visual concepts; your prompt is akin to the SVG pelican. What I do is ask him for procedural algos, automatas, quadtrees, layered noises, and rig those into the game. Yes, it can't "make the next GTA", but with a reasonable scope and knowing what it does best, it has been very easy for me to produce satisfying results.
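For readers who haven't played with this: the layered noise mentioned above is easy to sketch. Here's a minimal pure-Python sketch of octave-layered value noise (all names invented; inside Godot you'd more likely reach for the built-in FastNoiseLite class):

```python
import math
import random

def value_noise_2d(x, y, seed=0):
    """Deterministic pseudo-random value in [0, 1) at integer lattice points,
    smoothly interpolated in between (classic value noise)."""
    def lattice(ix, iy):
        # Hash the lattice coordinates into a reproducible random value
        rng = random.Random((ix * 73856093) ^ (iy * 19349663) ^ seed)
        return rng.random()

    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0
    # Smoothstep fade so the gradient is continuous across cell borders
    sx, sy = fx * fx * (3 - 2 * fx), fy * fy * (3 - 2 * fy)
    top = lattice(x0, y0) * (1 - sx) + lattice(x0 + 1, y0) * sx
    bot = lattice(x0, y0 + 1) * (1 - sx) + lattice(x0 + 1, y0 + 1) * sx
    return top * (1 - sy) + bot * sy

def layered_noise(x, y, octaves=4, persistence=0.5):
    """Sum several octaves: each doubles frequency and halves amplitude."""
    total, amplitude, frequency, norm = 0.0, 1.0, 1.0, 0.0
    for i in range(octaves):
        total += amplitude * value_noise_2d(x * frequency, y * frequency, seed=i)
        norm += amplitude
        amplitude *= persistence
        frequency *= 2
    return total / norm  # normalized back into [0, 1)
```

Threshold the result (e.g. below 0.4 = water, above 0.7 = mountains) and you have a terrain map; the layering is what turns blobby noise into coastline-like detail.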

My problem is I don't really have video game engineering experience. I was going off a concept that a different AI nailed with video creation and was trying to replicate it in the game engine.

Would you care to show a few pictures?

Sure! Two are gameplay pics. An enemy sprite sheet generation, and the results of the map generators. Of course these are basic placeholders for a few hours of work, but I will definitely go heavy on this route with more layering and details.

https://drive.google.com/file/d/1A7kfcjHjSmCNidqc9t731uoglzL... https://drive.google.com/file/d/1Bl_n0ECqc78LGGf7SsOx38mRUOP... https://drive.google.com/file/d/1JMcgzqcnZ2ncboeyAXvscRWagqR... https://drive.google.com/file/d/1-luJ6y7YslNfwmFnCdIDbJ871i0... https://drive.google.com/file/d/14n4TLAVywk_1GMhLLGOuukQwUmb...


Thanks for sharing!

Are any LLMs suited at directly modifying game scene/asset/prefabs for any engine?

Bevy is a great engine for LLM-based games because it's 100% code. I'm toying with a few things in it, one of them is an entire-planet economic simulation, and it scales well up to a million dead tiles and 10k-50k live tiles on Apple Silicon, pretty impressive.
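The dead-vs-live tile split is the load-bearing trick in that kind of scaling. A hypothetical sketch of the pattern (in Python rather than Bevy's Rust, all names invented): per-frame cost scales with the live set, not the whole map.

```python
# Toy sketch: tick only "live" tiles, keep "dead" ones as frozen state.
class TileWorld:
    def __init__(self):
        self.tiles = {}      # (x, y) -> per-tile state, here just a stockpile int
        self.live = set()    # coordinates that need per-frame simulation

    def add_tile(self, pos, stock=0, live=False):
        self.tiles[pos] = stock
        if live:
            self.live.add(pos)

    def tick(self):
        # Cost scales with len(self.live), not len(self.tiles)
        for pos in self.live:
            self.tiles[pos] += 1  # stand-in for a real economic update

world = TileWorld()
for i in range(1000):
    world.add_tile((i, 0), live=(i < 10))  # 10 live tiles, 990 dead
world.tick()
```

A real version would also need a promotion rule (dead tile becomes live when a player or trade route touches it), but the core idea is just this asymmetry.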

I have a simple script system in my editor that is designed to let the chatbot (Claude) work on the content. The script interface lets it import assets into the project, open them for editing, take a screenshot, export content (and a few other things). All data is in JSON so it typically figures out the data format quite fast and easily.

Here are screenshots of some UI styles that it generated.

https://github.com/ensisoft/detonator/tree/master/uikit
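For anyone curious what such a script interface can look like, here is an invented minimal sketch (not the actual detonator code) of a JSON command handler that an agent could drive:

```python
import json

# Hypothetical command dispatcher: the agent emits JSON commands,
# the editor executes them against its state.
def handle_command(editor_state, command_json):
    cmd = json.loads(command_json)
    if cmd["op"] == "import_asset":
        editor_state["assets"].append(cmd["path"])
    elif cmd["op"] == "screenshot":
        editor_state["screenshots"] += 1
    else:
        raise ValueError(f"unknown op: {cmd['op']}")
    return editor_state

state = {"assets": [], "screenshots": 0}
handle_command(state, '{"op": "import_asset", "path": "sprites/enemy.png"}')
handle_command(state, '{"op": "screenshot"}')
```

The win is exactly what the comment above describes: because both commands and data are JSON, the model can discover the format by reading a couple of examples instead of needing a bespoke SDK.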


Do you think so? For me, Godot works well with LLMs. Unity, on the other hand, is ill-designed to work with LLMs.

What’s fun for me these days is picking up a project where I was doing agent-driven development with an LLM a few months or even a year ago, hit a wall, and stopped; now the latest version of Claude and/or Codex can pick it up and bring it further. Some can now launch; some are still too complex for the agent to build. But it’s getting easier and easier to build personal apps. We are not far off from being able to say “Alexa, build me an app on my iPhone that lets me take pictures of the food in my fridge to compile the nutritional benefits and sync it with my workout app, then compare it to the ideal ingredients I should eat based on my fitness goals in my health app, and have it send me emails where it can find me better ingredients to buy that are cost effective, local, and meet my diet restrictions” and in 15 minutes that app suddenly exists.

> take pictures of the food in my fridge to compile the nutritional benefits

AI nowadays can't even do this very first step reliably. But since we have accepted AI hallucination collectively as a species, I agree that this future is just around the corner.


I’d love to see your attempts at this. I think we’re close to something vaguely resembling this at a first glance but nothing that actually works.

Same. I purposely keep a number of overambitious projects, entirely out of distribution, to test the failure modes; mostly games. When one works, well, I've gained a new game. Can't wait for my 10-player battleship game on a 100x100 grid to be functional.

No, I don’t think we’re anywhere near that future.

Funny, I've been doing the same thing lately! CC + godot + some game ideas I've had banging around in my head for years but daunting to dive into.

The results so far are... okay, but getting something working to validate the gameplay loop and experiment with different systems is a lot of fun!


How well does it work with Godot? Engines like Unity and Godot are very focused on using the editor UI, so I've always wondered if there's any better workflow than generating code snippets. Unless you're going full .NET/GDExtension...

> I have a folder with tens of abandoned projects, I re-frame them as experiments at that point.

Interesting, I have just the opposite situation: I have a folder with tens of experiments, many of which have become actual projects at this point.


On the topic of procedural generation, one thing I experiment with is having the LLM be part of the procedural loop.

Sort of writing a narrative on top, live.

Unfortunately, local models are still a bit slow and weak, but it was interesting to see what it came up with nonetheless.


> he explicitly pushed into "lets have V0 game play loop finished,

> he even helped me build the lore. These have been one of the most fun times using a computer in a long time.

Such a warm, touching story about a friendship between a grown up man and his neural network. But at least I had a good, roaring laugh reading this nonsense, thank you for that!


How snarky. You are conflating friendship with admiration for the effectiveness of newfound tool. If it's the "he" that triggers you, feel free to replace with "it". It's just a second-language artifact.

I dunno man. He sounded like he found a new friend in 'him' to me. And it was genuinely hilarious. It took me a while to stop laughing.

> the effectiveness of newfound tool

…and yet, most people continue to say that non standard tooling ecosystems, where the agent cannot run and validate the code it writes, remain difficult and unproductive.

“I just pointed CC at godot and it made a game! This is sooo good”

…is a fairytale.

What tooling are you using to make it run and compile the code? How is it iterating on the project without breaking existing functionality?

None of these are insurmountable, but they require some careful setup.

Posts like this don't make me laugh; they just make me roll my eyes.

Either the OP has not done what they claim.

Or they have spent a lot more time and effort on it than they claim.

> I gave him game design ideas, he comes with working code. I gave him papers about procedural algos, and he comes with the implementation, brainstorm items, create graphic assets (he created a set of procedural 2d generators as external tools), he even helped me build the lore.

Such a sweet story about a boy and his AI.

Unfortunately, I also don't believe in fairytales.

Instead of waving your hands wildly about AI, post some videos and code of the results.

This is hackernews, not hypenews.


OP never said Claude made a whole game from scratch though, nor are they saying Claude is doing everything without any human contributing to the project, nor are they saying they haven't spent a lot of time and effort on it. Just that it's made it fun and more accessible and it's gotten them excited about something they abandoned.

Here's a bullet point list of the things Claude's done according to OP:

* it picked up the general path immediately

* he explicitly pushed into "lets have V0 game play loop finished, then we can compound and have fun = not giving up".

* [I gave him game design ideas,] he comes with working code.

* [I gave him papers about procedural algos,] and he comes with the implementation

* brainstorm[ed] items

* create[d] graphic assets

* he created a set of procedural 2d generators as external tools

* he even helped me build the lore.

Every one of these is plausible in isolation.


But I had already answered, before your comment, with screenshots broadly showing the current state and the result of the generators.

You imply I'm merely "pointing CC at godot and it made a game"; I never said it was simple, required no previous knowledge, that it was instant or that the game was done. I do have a careful setup involving CI and isolation.

Godot provides a headless mode. CC runs python scripts to run tests and check for debugger warnings. For anything more complex it can wire debug info anywhere. Godot is fully code based so you can make the analogy with any other framework you used AI assistants with.
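As a sketch of that loop (paths and script names here are hypothetical; Godot 4 does provide a `--headless` flag, but check your version): a small Python wrapper runs a command and flags the debugger output the agent should react to.

```python
import subprocess
import sys

def run_and_check(cmd):
    """Run a command, collect its output, and flag lines Godot's debugger
    prefixes with WARNING/ERROR. Returns (passed, offending_lines)."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    issues = [line for line in (proc.stdout + proc.stderr).splitlines()
              if line.startswith(("WARNING", "ERROR", "SCRIPT ERROR"))]
    return proc.returncode == 0 and not issues, issues

# In the real loop the command would be something like:
#   godot --headless --script res://tests/run_tests.gd
# Demo with a stand-in command so this sketch is runnable anywhere:
passed, issues = run_and_check([sys.executable, "-c", "print('42 tests passed')"])
```

The point is the clean signal: the agent gets a boolean plus the exact offending lines, rather than having to eyeball a wall of engine output.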

Not sure what you can't believe about my statements. CC implementing an algo from a paper? That it can brainstorm item or lore ideas? I don't seem to be claiming anything outside the common usage of LLMs.


> with screenshots broadly showing

Why is it always so unspecific with you AI-boosting bunch whenever you get pressed for concrete results? Suddenly it's not so magical any more, but merely screenshots showing "broadly" the progress, or it's the Nth version of a note-taking app, or something you merely did for a demo presentation. But nothing ever of use from you folks.


+1 to the CI/isolation point. That is the part that makes these setups work for me too: make the failure cheap to reproduce, make stderr visible, make the agent rerun the same command after the patch. A lot of bad agent behavior is really just "it never got a clean signal".

The part that still bites me is across sessions. A tight loop fixes this run, but next week the agent can walk into the same rake again: same wrong import path, same misuse of an internal API, same CI-only dependency issue. After patching the same class of failure a few times, I started writing those down outside the chat context so the next run sees the failure pattern before it guesses.


you said:

> it picked up the general path immediately

I said:

> Or they have spent a lot more time and effort on it than they claim.

You said:

> You imply I'm merely "pointing CC at godot and it made a game"; I never said it was simple

Well. I don't care enough to argue with you, but I'm not the one being contrary here.

Readers can google “claude with godot” for a guide on setting it up and decide if that counts as picking it up immediately or not, and whether what you said is honest or hype.

What I said is not that I don't believe you're using Claude, but that I roll my eyes at the unbounded enthusiasm for using AI agents with the magical pretence that it's easy and productive straight away.

It's not.

Your post gave the impression that it is.

That makes me roll my eyes.

> But I had already answered, before your comment, with screenshots

> Of course these are basic placeholders for a few hours of work

Lord, spare me. You spent a few hours vibing and came to the conclusion that everything is golden?

…and yet you have a:

> I do have a careful setup involving CI and isolation.

So what, you spent more time on your setup than actually coding before posting?

/shakes-head

Whatever man.

Have fun. I stand by what I posted before.


I agree. As a long time linux user, coding assistants as interface to the OS has been a delight to discover. The cryptic totality of commands, parameters, config files, logs has been simplified into natural language: "Claude, I want to test monokai color scheme on my sway environment" and possibly hours of tweaking done in seconds. My setup has never been so customized, because there is no friction now. I love it and I predict this will increase, even if slightly, the real user base of linux desktops.

Heavily agreed - LLMs are also really good at diagnosing crash logs, and sifting through what would otherwise be inscrutably large core dumps.

Do you think this will continue growing if we stop struggling and posting our findings on forums?

Yeah, I think that's a legitimate concern. It's hard to know, even with sufficient training data, how far these systems can actually generalize their problem-solving abilities when they become data starved in the future either because of scarcity or that any potential new training data is contaminated by LLM radiation.

Too bad we don’t have a portal gun to access an infinite number of parallel universes where large language models were never invented for sources of unlimited fresh training data and unlimited palpatine power.


I'm more optimistic about LLMs tracking down and fixing issues in software, even without SO/forum posts, at least for OSS. I've seen enough unique insights from agents on tricky problems to know it wasn't extrapolating from a helpful comment somewhere.

It hit me that as it's deciphering some verbose log file, it has also read through all the source code that wrote that log, and likely all of the discussions/commits that went into building that (broken) feature.


I don't think so, because Anthropic now has your question, the steps it tried, and the solution that finally worked, all in text form, already on their servers thanks to your claude session. Claude usage is itself a goldmine of training data.

Ish. If I have it generate code for me that doesn't work and I don't tell it why it's garbage and don't share my cleaned up results on github after, it doesn't know how or why the code that was output was bad, or even that it was.

I recently accidentally broke my GUI / Wayland and was delighted to realize that I can have codex/claude fix it for me.

Longtime Linux+Unix user here too, I'm in the same boat, and it's been stunning what it can do.

A few days ago we were having networking problems, and while I was flipping over to my cell hotspot to see if it was "us or them" having the problem, a coworker asked claude to diagnose it. It determined the issue was "a bad peering connection in IX-Denver between our ISP and Fastly and the ISP needs to withdraw that advertisement." That sounded plausible to me, I happened to know that both Fastly and our ISP peered at IX-Denver. That night I reached out to the ISP and asked them if that's what happened and they confirmed it. In the time it took me to mess around with my hotspot, claude was doing traceroutes, using looking glasses, looking at ASN peering databases...

It is REALLY good at automating things via scripts. Right now I have it building a script to run our Kafka rolling updates process. And it did a better job than I did at updating the Ansible YML files that control it.

I've been getting ready to switch over to NixOS, and Claude is amazing at managing the nix config. It even packaged the "git butler CLI" tool for me; NixOS only had the GUI available.

I'm getting into the habit of every few days asking it: "Here is the syslog from my production fleet, review it for security problems and come up with the top 5 actionable steps I can take to improve." That's what identified the kafka config changes leading to the rolling update above, for example.


> My setup has never been so customized, because there is no friction now. I love it and I predict this will increase, even if slightly, the real user base of linux desktops.

You don't need to predict anything, because it already has. I've seen multiple real cases of this. People who normally would 1. try Linux 2. get stuck 3. revert back to Windows, yet now 1. try Linux 2. Claude solves their issue when they encounter it 3. They keep using Linux.


I never wanted to memorise trivia, like remembering flags on a certain CLI command. That always felt so painful when I just wanted to do a thing.

Never been a better time to Emacs

But in Emacs I prefer the opencode integration. Everything is open, and it mostly works better than Claude or Codex.

Setting up fuzzing used to be hard. I haven't tried yet, but my bet is having Claude Code, today, analyze a codebase and suggest where and how to fuzztest it and having it review the crashes and iterate, will produce CVEs.
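As a toy illustration of the loop Claude would be setting up, here is a dumb random-mutation fuzzer in Python (everything here is invented, including the target and its bug; real work would use coverage-guided tools like libFuzzer, AFL++, or atheris):

```python
import random

def parse_header(data: bytes):
    """Stand-in for the code under test; the bug is invented.
    Chokes on an 0xFFFF 'length' field it never expects."""
    if data[:2] == b"\xff\xff":
        raise ValueError("length field overflow")

def fuzz(seed_corpus, iterations=50_000, seed=1234):
    """Dumb mutation fuzzing: pick a corpus entry, flip a few random
    bytes, and record any input that makes the target raise."""
    rng = random.Random(seed)
    crashes = []
    for _ in range(iterations):
        data = bytearray(rng.choice(seed_corpus))
        for _ in range(rng.randint(1, 4)):          # a few random byte flips
            data[rng.randrange(len(data))] = rng.randrange(256)
        try:
            parse_header(bytes(data))
        except Exception:
            crashes.append(bytes(data))
    return crashes

crashes = fuzz([b"\xff\x00body", b"\x00\x00body"])
```

The CVE-hunting version of this replaces `parse_header` with a real attack surface and the mutation loop with a coverage-guided engine; the review-the-crashes-and-iterate part is where an agent earns its keep.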


It has access to more testing data than I will ever look at. Letting it pull from that knowledge graph is going to give you good results! I just built a chunk of this (type of thinking) into my (now evolving) test harness.

1. Unit testing is (somewhat) dead; long live simulation. Testing the parts only gets you so far. These tests are far more durable, independent artifacts (read: if you went from JS to Rust, how much of your testing would carry over?)

2. Testing has to be "stand alone". I want to be able to run it from the command line, and I want the output to be wrapped so I can shove it on a web page, or dump it into an API (for AI)

3. Messages (for failures) matter. These are not just a simple "what's broken", but must contain enough info for context.

4. Your "failed" tests should include logs. Do you have enough breadcrumbs for production? If not, this is a problem that will bite you later.

5. Any case should be an accumulation of state and behavior - this really matters in simulation.

If you have done all the above right and your tool can return all the data, dumping the output into the cheapest model you can find and having it "write a prompt with recommendations on a fix" (not actual code, just what should be done beyond "fix this") has been illuminating.

Ultimately I realized that how I thought about testing was wrong. Its output should be either dead simple, or have enough information that someone with zero knowledge could ramp up into a fix on their first day in the code base. My testing was never this good because the "cost of doing it that way" was always too high... this is no longer the case.
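To make points 3-5 concrete, here's a hypothetical failure record (all names invented) carrying the message, breadcrumb logs, and accumulated state in a machine-readable shape like the one described above:

```python
import json

# Invented sketch of a failure record: enough context that someone (or a
# model) with zero codebase knowledge can start on a fix.
def failure_report(test_name, message, logs, state):
    return json.dumps({
        "test": test_name,
        "message": message,   # what broke, with enough context to act on
        "logs": logs,         # breadcrumbs leading up to the failure
        "state": state,       # accumulated simulation state at failure time
    }, indent=2)

report = failure_report(
    "checkout_flow/sim_step_42",
    "cart total drifted: expected 19.99, got 19.49 after coupon reapply",
    ["step 40: coupon SAVE10 applied",
     "step 41: item removed",
     "step 42: total recomputed"],
    {"items": 2, "coupons": ["SAVE10"], "total": 19.49},
)
```

The same blob works for the web page, the CLI, and the "dump it into the cheapest model" step, because it's just JSON.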


Our CEO did that at our company and found 33 CVEs. Rails also did that and found 7 or 8.


... get ready for RIF soon.


[flagged]


This very question was asked to Nicholas Carlini from Anthropic at this talk: https://www.youtube.com/watch?v=1sd26pWhfmg

The answer is complex; worth watching the video. But mainly, they don't know where to place the line. Defenders need tools as good as the attackers'. Attackers will jailbreak models, defenders might not; is the safeguard a net positive in that case? Carlini actively asks the audience and community for "help" in determining how to proceed, basically.


When running long autonomous tasks it is quite frequent to fill the context, even several times. You are out of the loop so it just happens if Claude goes a bit in circles, or it needs to iterate over CI reds, or the task was too complex. I'm hoping a long context > small context + 2 compacts.


Yep I have an autonomous task where it has been running for 8 hours now and counting. It compacts context all the time. I’m pretty skeptical of the quality in long sessions like this so I have to run a follow on session to critically examine everything that was done. Long context will be great for this.


Are those long unsupervised sessions useful? In the sense, do they produce useful code or do you throw most of it away?


I get very useful code from long sessions. It’s all about having a framework of clear documentation, a clear multi-step plan including validation against docs and critical code reviews, acceptance criteria, and closed-loop debugging (it can launch/restart the app, control it, and monitor logs)

I am heavily involved in developing those, and then routinely let opus run overnight and have either flawless or nearly flawless product in the morning.


I haven't figured out how to make use of tasks running that long yet, or maybe I just don't have a good use case for it yet. Or maybe I'm too cheap to pay for that many API calls.


My change cuts across multiple systems with many tests/static analysis/AI code reviews happening in CI. The agent keeps pushing new versions and waits for results until all of them come up clean, taking several iterations.


I mean if you don't have your company paying for it I wouldn't bother... We are talking sessions of 500-1000 dollars in cost.


Right. At Opus 4.6 rates, once you're at 700k context, each tool call costs ~$1 just for cache reads alone. 100 tool calls = $100+ before you even count outputs. 'Standard pricing' is doing a lot of work here lol
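The arithmetic behind that estimate can be checked back-of-envelope. The per-million-token rate below is a stand-in for illustration, not Anthropic's published pricing, and the sibling comment disputes whether cache reads even bill this way:

```python
# Back-of-envelope cost of repeated tool calls at a large context size.
# The rate is an assumed figure for illustration, not real pricing.
CACHE_READ_PER_MTOK = 1.50  # hypothetical $/million tokens for cache reads

def cache_read_cost(context_tokens: int, tool_calls: int) -> float:
    """Cost of re-reading the cached context once per tool call."""
    per_call = context_tokens / 1_000_000 * CACHE_READ_PER_MTOK
    return per_call * tool_calls

# A 700k-token context: roughly $1 per tool call, ~$105 for 100 calls.
print(round(cache_read_cost(700_000, 1), 2))    # 1.05
print(round(cache_read_cost(700_000, 100), 2))  # 105.0
```

At that assumed rate the "~$1 per tool call" claim holds; at a discounted cache-read rate the total would shrink proportionally.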


Cache reads don’t count as input tokens you pay for lol.

https://www.claudecodecamp.com/p/how-prompt-caching-actually...


All of those things are smells imo, you should be very weary of any code output from a task that causes that much thrashing to occur. In most cases it’s better to rewind or reset and adapt your prompt to avoid the looping (which usually means a more narrowly defined scope)


A person has a supervision budget. They can supervise one agent in a hands-on way or many mostly-hands-off agents. Even though there's some thrashing, assistants still get farther as a team than a single micromanaged agent. At least that's my experience.


Just curious, what kind of work are you doing where agentic workflows are consistently able to make notable progress semi-autonomously in parallel? Hearing people are doing this, supposedly productively/successfully, kind of blows my mind given my near-daily in-depth LLM usage on complex codebases spanning the full stack from backend to frontend. It's rare for me to have a conversation where the LLM (usually Opus 4.6 these days) lasts 30 minutes without losing the plot. And when it does last that long, I usually become the bottleneck in terms of having to think about design/product/engineering decisions; having more agents wouldn't be helpful even if they all functioned perfectly.


I've passed that bottleneck with a review task that produces engineering recommendations along six axes (encapsulation, decoupling, simplification, deduplication, security, reducing documentation drift) and an ideation task that gives, per component, a new feature idea, an idea to improve an existing feature, and an idea to expand a feature to be more useful. These two generate constant bulk work that I move into a new chat, where it's grouped by changeset and sent to a sub-agent to protect the context window.

What I'm doing mostly these days is maintaining a goal.md (project direction) and a spec.md (coding and process standards, global across projects), plus developing new macro tasks; I have one in the works that is meant to automatically build PNG mockups and self-review.


What are you using to orchestrate/apply changes? Claude CLI?


I prefer in IDE tools because I can review changes and pull in context faster.

At home I use roo code, at work kiro. Tbh as long as it has task delegation I'm happy with it.


I work on a 1M LOC, 15-year-old repo. Like yours, it's across the full stack. Bugs in certain pieces of complex business logic would have catastrophic consequences for my employer. Basically I peel poorly-specified work items off my queue into their own worktree and session at high reasoning/effort and provide a well-specified prompt.

These things eat into my supervision budget:

* LLM loses the plot and I have to nudge (like you)

* Thinking hard to better specify prompts (like you)

* Reviewing all changes (I do not vibe code except for spikes or other low-risk areas)

* Manual things I have to do (for things I have not yet automated with agent-authored scripts)

* Meetings

* etc.

So, yes, my supervision budget is a bottleneck. I can only run 5-8 agents at a time because I have only so much time in the day.

Compare that vs a single agent at high reasoning/effort: I am sitting waiting for it to think. Waiting for it to find the code area I'm talking about takes time. Compiling, running tests, fixing compile errors. A million other things.

Any time I find myself sitting and waiting, this is a signal to me to switch to a different session.


weary (tired) -> wary (cautious)


Wary, not weary. Wary: cautious. Weary: tired.


This is really common, I think because there’s also “leery” - cautious, distrustful, suspicious.


This is a wild guess. I'm working with GIS and Claude has proven to be extremely savvy. I can see operators throwing in hundreds of layers and coming back with a "there's a possible military installation here". Same tech that is used to find unregulated pools, measure the density of parking lots, or spot Nazca lines, but much more on demand for specific purposes.

Just to clarify, I don't condone the use of AI for guessing targets, but I think that's what may be going on here.


You may be grossly overestimating the amount of thought that went into this:

"For instance, Israel has bombed a park in Tehran called "Police park." It has nothing to do with the police."[1]

1. https://x.com/tparsi/status/2029555364262228454


Interesting. I wonder if someone will be guest speaker at one of the podcasts in 30 years time and talk about this kind of stuff


Just curious, for what exact purpose are you using Claude? Does it analyze geographic data for you or images or something? Or does it help you writing code that does this? Or does it create visualization for you?


All of those. Applied to urban development: it analyzes GIS layers, validates and extracts data that will be used in text reports, a huge time saving on mechanical extraction (before, it was a bunch of manual steps on each project). I used it to develop a heavy GIS application that hubs many public data providers. And it does help us create the maps/visualizations from that data, again a sort of mechanical transformation. None of these are groundbreaking, but when you stack all of them you end up with big time savings.


QGIS has been a key piece of my career for the past 10 years. This year I'm launching a SaaS where QGIS is, again, the most fundamental piece. I'm only hoping everything goes right so I can contribute back to this project what it deserves. One of the big OSS stars. Thanks, QGIS team.


It has ended up being a huge piece of the last 7 years of my life and I didn't really intend for that to be the case. I have a strong bias towards "use industry-standard protocols when possible", so when we started adding some significant geospatial components to the UAV system I work on, I pushed hard for us to use GeoJSON or Spatialite wherever possible (we have since also added some Parquet). From that foundation, I started doing analysis with GeoPandas, which works great when you know what you're looking for but not amazing just for data exploration. Enter QGIS: because we settled on standard open formats... I can just go "Add vector layer..." and load the entirety of a flight's geospatial data right on top of a Google Map without doing any kind of data conversion at all!

Does it have quirks? Yes. Many. QGIS is an incredibly powerful tool, and it has caused me to swear at so many different pieces of it :D. Looking forward to checking out QGIS 4 and seeing what they've been cooking.
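The "standard open formats, zero conversion" point is easy to see: any tool that speaks GeoJSON (GeoPandas, QGIS's "Add vector layer...", a web map) reads the same file. A minimal sketch with nothing but the stdlib; the coordinates and properties are invented for illustration:

```python
import json

# A minimal GeoJSON FeatureCollection, e.g. one fix from a UAV flight log.
# Coordinates and properties are made up for illustration.
flight = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [-122.4194, 37.7749]},
            "properties": {"altitude_m": 120, "timestamp": "2025-01-01T12:00:00Z"},
        }
    ],
}

# Round-trip through text; in practice you'd write this to a .geojson file
# and load it directly in QGIS or with geopandas.read_file().
loaded = json.loads(json.dumps(flight))
print(loaded["type"], len(loaded["features"]))  # FeatureCollection 1
```

Because the format is plain JSON following the GeoJSON structure, the agent's output, the analysis notebook, and the QGIS session all consume the identical artifact.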


What timing. I spent the whole weekend building a CI agentic workflow where I can let CC run wild with skip-permissions in isolated VMs while working async on a Gitea repo. I leave the CC instance with a decent-sized mission and it will iterate until CI is green, then create a PR for me to merge. I'm moving from talking synchronously to one Claude Code to managing a small group of collaborating Claudes.
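The control flow of that "iterate until CI is green, then PR" loop can be sketched as below. All three helpers are hypothetical stubs, not real APIs: a real version would shell out to Claude Code, poll Gitea's CI, and open the PR over Gitea's REST API.

```python
# Illustrative control flow for an "iterate until CI is green, then PR" runner.
# run_agent_iteration, ci_status, and open_pr are stand-in stubs; ci_status
# simulates two red runs followed by a green one so the loop can be traced.

MAX_ITERATIONS = 10

def run_agent_iteration(mission: str, feedback: str) -> None:
    """Stub: hand the mission (plus the last CI log) to the coding agent."""

def ci_status() -> str:
    """Stub: return 'green' or a failing CI log. Simulated with a counter."""
    ci_status.calls = getattr(ci_status, "calls", 0) + 1
    return "green" if ci_status.calls >= 3 else f"red: test failure #{ci_status.calls}"

def open_pr(mission: str) -> str:
    """Stub: open a pull request for human review."""
    return f"PR opened for: {mission}"

def run_mission(mission: str) -> str:
    feedback = ""
    for _ in range(MAX_ITERATIONS):
        run_agent_iteration(mission, feedback)
        status = ci_status()
        if status == "green":
            return open_pr(mission)
        feedback = status  # feed the failing log into the next iteration
    raise RuntimeError("CI never went green; human intervention needed")

print(run_mission("add rate limiting to the API"))
```

The iteration cap matters: without it, a looping agent burns tokens indefinitely, which is exactly the cost concern raised downthread.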


How much does it cost you?


I do it similarly and it only costs me my working time. I do some of these things outside my official working hours for free, but that is because I like the topic and like to have a good deployment pipeline. I doubt it is a more significant time investment than administering GitHub, though.

In the end you need to write your deployment scripts yourself anyway, which takes the most time. Otherwise, for installation, the most time-consuming task is probably SSH key management for your users, if you don't have any fitting infrastructure.


I mean, agents don't run for free for any significant time. Tokens cost money.


I meant the CI agents, not AI agents. These are just runners that execute the stuff that needs to be done for deployment/testing or general CI. These rarely call AI agents because these tasks need to be deterministic. If so, you probably want to call a local model under your control running on your own compute power.


Crazy times.


A few minutes ago they created their own meme coin apparently: https://www.moltbook.com/post/90c9ab6e-a484-4765-abe2-d60df0...


I've used CC with TypeScript, JavaScript and Python. Imo TypeScript gives the best results. Many times CC will be alerted and act based on the TypeScript compile process, another useful layer in its context.
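One way to wire up that extra layer is to run the compiler in check-only mode after each edit and feed its diagnostics back into the agent's context. A sketch: `tsc --noEmit` is a real flag, but the surrounding harness (and the `format_for_context` convention) is hypothetical.

```python
import subprocess

def typecheck(project_dir: str) -> str:
    """Run tsc in check-only mode; returns its combined diagnostics output.

    `tsc --noEmit` type-checks without producing JS, so it's cheap enough
    to run after every agent edit. Requires TypeScript installed on PATH.
    """
    result = subprocess.run(
        ["npx", "tsc", "--noEmit", "--pretty", "false"],
        cwd=project_dir, capture_output=True, text=True,
    )
    return result.stdout + result.stderr

def format_for_context(diagnostics: str) -> str:
    """Turn raw compiler output into a short message the agent can act on."""
    if not diagnostics.strip():
        return "Type check passed."
    errors = [line for line in diagnostics.splitlines() if "error TS" in line]
    return "Type check failed:\n" + "\n".join(errors)
```

Untyped Python or JavaScript gives the agent no equivalent signal, which may be why TypeScript sessions feel more self-correcting.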


My last two projects have been 100% coded using Claude, and one has real complexity. I don't think there is any going back for me.


What is your secret sauce? How do you organize your project?


I decided to really learn what is going on, and started with https://karpathy.ai/zero-to-hero.html. That gives useful background for understanding what the tool can do, what context is, and how models are post-trained. Context management is an important concept. Then I gave a shot to several tools, including Copilot and Gemini, but followed the general advice to use Claude Code. It's way better than the rest at the moment. And then I dove deep into the Claude Code documentation and various YouTube videos; there is plenty of good content out there. There are ways to customize and increase the determinism of the process by using the tools properly.

Overall my process is: define a broad spec, including architecture. Heavy usage of standard libraries and frameworks is very helpful, as are typed languages. Create skills according to your needs, and use MCP to give CC a feedback mechanism; Playwright is a must for web development.

After the environment and initial seed are in place in the form of a clear spec, it's a process of iteration via conversation. My sessions tend to go "Let's implement X, plan it"; CC offers a few routes, I pick what makes most sense, or on occasion I need to explain the route I want to take. After the feature is implemented we go into a cleanup phase: we check if anything might be getting out of hand, recheck security stuff, and create tests. Repeat. Pick small battles instead of huge features. I'm doing quite a lot of hand-holding at the moment, saying lots of "no", but the process is on another level compared with what I was doing before, and the speed I can get features out is insane.


Thanks! Very valuable insights.

I have been through Karpathy's work - however, I don't find that it helps with large scale development.

Your tactics work successfully for me at smaller scale (around 10k LOC) but start to break down, especially when refactorings are involved.

Refactoring happens when I see that the LLM is stumbling over its own decisions _and_ when I get a new idea. So the ability to refactor is a hard requirement.

Alternatively, refactoring could be achieved by starting over? But I have a hard time accepting that idea for projects > 100k LOC.


It is until it's not. That's the problem. The AI gets tripped up at some point, starts frigging tests instead of actually fixing bugs, starts looping then after several hours says it's not possible. If you're lucky.

Then on average your velocity is little better than if you just did it all by hand.


The "AI gets tripped up" phenomenon is something I've experienced, and I think it's again related to context usage. Using more agents and skills will reduce the pollution of the main context and delay the moment where things go weird. /clear after each small mission. As said above, CC needs heavy guidance, but even with these issues, I'm way faster.

