It's not *that* hard to review how much you actually got done and check whether ...

freehorse · 2025-10-02T17:34:21 1759426461

To do that properly, one needs some kind of control, which is hard to get with one person. It should be doable with proper effort, but it is far from trivial: it is not enough to measure what you actually did in one condition, you have to compare it with something. And there can be a lot of noise for n=1: maybe you happen to be solving harder tasks when you use LLMs. So you need to at least do it over quite a long time, or make sure the difficulty of the tasks is similar. If you have a group of people, you can split them into groups instead and worry less about these parameters, because you can assume that this "noise" will cancel out when you average.
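A rough sketch of the averaging argument, as a toy simulation rather than a model of any real study. Everything here is an assumption made for illustration: the function names, the baseline of 10 hours per task, the "true" effect of -1 hour, and Gaussian noise with a standard deviation of 3 hours standing in for task difficulty. It repeats the same experiment many times and reports how much the estimated effect varies for different group sizes.

```python
import random
import statistics


def estimated_effect(n_per_group, true_effect, noise_sd, rng):
    """Run one hypothetical experiment and return the measured effect.

    Each measurement is a task time in hours: a fixed baseline plus random
    noise that stands in for task difficulty and everything else we cannot
    control. All numbers are made up for illustration.
    """
    baseline = 10.0
    control = [rng.gauss(baseline, noise_sd) for _ in range(n_per_group)]
    treated = [
        rng.gauss(baseline + true_effect, noise_sd) for _ in range(n_per_group)
    ]
    return statistics.mean(treated) - statistics.mean(control)


def spread_of_estimates(n_per_group, trials=1000, seed=0):
    """Standard deviation of the measured effect over many repeated experiments."""
    rng = random.Random(seed)
    estimates = [
        estimated_effect(n_per_group, true_effect=-1.0, noise_sd=3.0, rng=rng)
        for _ in range(trials)
    ]
    return statistics.stdev(estimates)


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(f"n per group = {n:3d}, spread of estimate = {spread_of_estimates(n):.2f}")
```

With one measurement per condition, the spread of the estimate should come out several times larger than the assumed effect itself, so a single comparison says very little. With larger groups the spread shrinks roughly with the square root of the group size, which is the "noise cancels out" part.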

fragmede · 2025-10-02T13:23:04 1759411384

The problem isn't a delta between what got done and how much it felt like got done. The problem is that you can't know how long it would have taken you to do what got done unless you do it twice, once by hand and once with an LLM, and then compare. Unfortunately, regardless of what you find, HN will rush to say N=1, so there's little incentive to report any individual results.

emp17344 · 2025-10-02T14:50:42 1759416642

In fact, when this was studied, it was found that using AI actually makes developers less productive:

https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o...