
I think it depends - the actual thing to measure is whether you keep a developer in flow state. Errors break this just as much as latency does. So to be brief: yes, accuracy comes first.

Quality is measured in 2 main ways:

1) End-to-end: user query -> task resolution. These are aider-style benchmarks answering the question of actual task completion

2) Apply quality: syntax correctness, character diff, etc.
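A rough sketch of what an "apply quality" score like (2) could look like, assuming Python targets - character-level diff similarity plus a parse check. This is illustrative only, not the vendor's actual benchmark:

```python
import ast
import difflib

def apply_quality(expected: str, applied: str) -> dict:
    """Illustrative apply-quality metrics: character-level similarity
    between the expected file and the applied edit, plus a syntax check."""
    # Character-level similarity in [0, 1]; 1.0 means the applied
    # edit reproduces the expected file exactly.
    char_sim = difflib.SequenceMatcher(None, expected, applied).ratio()
    # Syntax correctness: does the applied result still parse?
    try:
        ast.parse(applied)
        syntax_ok = True
    except SyntaxError:
        syntax_ok = False
    return {"char_similarity": char_sim, "syntax_ok": syntax_ok}
```

An end-to-end benchmark would instead run the resulting code against the task's tests; this only scores whether the edit landed cleanly.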

The error rate for large vs fast is around 2%. If you're doing code edits that are extremely complex or on obscure languages - large is the better option. There's also an auto option to route to the model we think is best for a task






I don't believe anyone can be in some kind of "flow state" while waiting on LLM responses. I think it's funny that we complained for years about C and others being slow to compile and now folks are fine waiting multiple seconds every time they want to change something.

This is gonna sound like some chad hype shit, but I've tried just working on 2 different projects simultaneously and have had some incredible extended flow sessions. It felt like the old days of multitabling poker.

I had tried doing it with different features in different worktrees in the same codebase but found flow much harder there.

Lately I am also just spending a lot more time reworking code manually to keep the code in good shape. Still getting a ton of value out of the LLM doing a lot of work, but not exactly spending lots of time just waiting for it because I am dropping back down to manual mode frequently.


    > we complained for years about C and others being slow to compile
For C? I don't remember that, unless headers are poorly managed. C++? Definitely yes.

What do we not complain about if we're being honest?

how so? Is your view that flow state isn't a thing at all, or just with using LLMs?

Flow state is 100% a thing, it's just impossible with LLMs (at least, for me). I can't be blocked waiting on things during a flow state or my mind starts wandering to other places.

Have you tried any of the ludicrously fast LLM demos yet?

https://inference.cerebras.ai/ and https://groq.com/ and https://deepmind.google/models/gemini-diffusion/ (waitlisted) are all 10 to 100x faster than regular models, which really does have a meaningful impact on how I interact with them because I don't have to disengage for 15+ seconds while I wait for a response.

I have video demos of a few of those: https://simonwillison.net/2024/Oct/25/llm-cerebras/ and https://simonwillison.net/2024/Oct/31/cerebras-coder/ and https://simonwillison.net/2025/May/21/gemini-diffusion/


Fast Apply definitely helps with keeping flow state and is a large part of Cursor's success

Personally I work on multiple repos at a time to solve for this


I do it like a simultaneous exhibition in chess:

- Multiple repos or independent changes in a monorepo

- First round of changes: idgaf about anything beyond the public interface and unit tests

  - I review the public interface and make changes if needed

  - I review the unit tests it wrote to see that at least from the outside it looks alright

- Here I either:

  - write more unit tests (features, edge cases) and make it write code for them

  - polish what it generated

sounds like flow state to me

oh it for sure is. But I use amazon q almost exclusively. One thing that gets me out of this state: when I have to do the math on "should I just do it myself" vs "keep refining prompt/context until this thing finally gets it right".

so frustrating how slow edits are in Q dev

Sometimes it splits edits to a single file into way too many fs_write(s) and often gets stuck, unable to apply edits. It's also so conservative with using your machine's resources: it kept trying to run the test suite with a single worker, like come on, I paid for 32 cores, I will be using 32 cores.

Flow state has been redefined now that we are all using Claude Code. If I can stay focused on tests, reviewing code, etc while CC is doing its thing, we are good. The kloc/s doesn't matter as much.

if LLMs are ever able to write the kind of code I write for work, I'm going to move to management. spending 100% of my time reviewing AI slop and writing tests is the opposite of what I want. I want to define behavior quickly and have AI do the boring parts; you're letting the computer do the fun bit and spending your entire life doing the shit part, and paying for the privilege.

fuck. THAT.


We have it backwards. Claude should be reviewing work and writing tests.

No one sane would trust an LLM with that task, which is how we know it's not ready for production use yet.

I might put this on a plaque.

I realize this sounds harsh, but I assume anyone who is pushing for developers to basically take on all the shit work of a tech lead stuck managing a bunch of incompetent developers is not an actual developer, and is either an incompetent one who hopes LLMs will cover for them or someone looking to reduce their dependency on developers.

Fortunately for me, I think we'll be well into the Matrix before my job can be done adequately by AI so I have the luxury of using it as a tool here and there where it makes sense rather than spending most of my time trying to avoid the damage a firehose of hallucinations will do to my codebase.


Time really is a flat circle. My software career started with me archaically flipping characters in a file I vaguely understood with long pauses waiting on magic compilers to give me my actual output.

Now it's dying in the same place. Thankfully I got to spend the brunt of my career working through the fun, intermediate years.


I've never had so much fun coding in my life - you should definitely give it a try again!

Thanks, I appreciate the good vibes.

However, it's kind of a trope for me at this point that people assume a negative opinion of using generative AI in the development process is due to a lack of experience using it.


Well you could articulate what issues you have with it. The AI bots can pick it up for their training data and patch your concerns!

> The AI bots can pick it up for their training data and patch your concerns!

This is borderline mystical AI speak to me. I know what you mean, and no, it doesn't work like that. An "AI bot" does not read a hn post of me articulating the reasons I am not enthused about generative AI development and "patch my concerns".


Next time, I'll wrap it up in <sarcasm></sarcasm>.

Truly ironic the AI readily detected what I said as sarcastic. Without context. https://claude.ai/share/7d14287d-c066-4927-8942-8eb8dd8d7e7f


Ah, thanks for the explanation. I actually was confused a bit. For what it's worth, I had a second paragraph mentioning Poe's law that I deleted because I was concerned you would take it as a personal attack.

I should have left it in; knowing you were being sarcastic, I think you'd have appreciated me being confused about whether you were being satirical or not.


That would have been a perfect opportunity for me to finally internalize Poe's law.

Haha. You got me. I couldn't tell. I really couldn't tell.

the claude link is hilarious, hahaha

I've had the opposite experience.

same

> the actual thing to measure is whether you keep a developer in flow state.

Personally, I find flow state hard to achieve when I constantly have to switch modes to debugging LLM output or an edit error that I missed.

When the majority of time is spent waiting for the main LLM to think, I will always wait a few extra seconds for a better edit than risk having to spend multiple cycles playing find-the-bug because something didn't get applied correctly somewhere.


Like most things, it's a tradeoff. Developer tolerance for errors is extremely low - but the error rate for Fast Apply is even lower

Glad to hear quality comes first! Then I assume you have some public benchmarks, like the ones you mention, that are reproducible? I could only find this graph https://docs.morphllm.com/guides/apply but there's no mention of what it refers to, what data it used, etc.


