And here in largely vegetarian India, everyone is now pushing for more protein and meats because a vegetable-heavy diet has been awful for our public health
Even if Indians ate 2x the meat that they do now, they wouldn't consume anywhere near as much as Americans do. Increasing meat consumption in America is not necessary.
India would do well to consume more protein, and the US would do well to consume less
If you want to actually get full and satiated with a largely vegetarian diet, you will eventually resort to carbs
And this is for a culture that really knows how to make smashingly good vegetarian dishes
I love my vegetables, but a vegetable-heavy diet is clearly not something that everyone can or should do. The people I know who retain their health with vegetarian/vegan diets are usually really well-versed in nutrition
If you look at a lot of the Indian vegetarian dishes, you'll find things like potatoes fried in butter being a staple.
Chickpeas and yogurt do make a showing, but a lot of Indian dishes are devoid of vegetarian protein sources. You need a lot more beans/nuts if you want to eat healthy as a vegetarian.
> a lot of Indian dishes are devoid of vegetarian protein sources
What about legumes -- daal (pulses) and chickpeas? They have plenty of protein as far as plant sources go. Also: Paneer. What I find in practice: you get a tiny amount of legumes/paneer, and a huge amount of carbs.
Has your government published any science on this? Being completely serious, I'd like to read it. Is India mostly vegetarian because of lack of access to farms/meat, religious reasons, finances, or what? I didn't know it was largely vegetarian. I don't know that I had any idea of the ratio, or that it would be different from any other country.
Apparently the Mediterranean is also largely vegetarian; at least the eponymous diet is.
Most branches of Hinduism condemn meat eating, so this has created significant pressure against meat production (just as you'll find little pork production in the Middle East and North Africa). This is not universal, of course, because historically many regions of India had large meat-eating Muslim populations as well.
Note that this is typically lacto-ovo-vegetarianism, not veganism.
Well, I never read the article because of the paywall, but there is a WSJ headline today about a $160k mechanic job at Ford that can't be filled because there's no labor.
It's very good at following instructions. You can build dedicated agents for different tasks (backend, API design, database design) and make it follow design and coding patterns.
It's verbose by default but a few hours of custom instructions and you can make it code just like anyone
What's even the point of this comment if you self-admittedly don't have access to the flagship tool that everyone has been using to make these big bold coding claims?
I believe part of why Claude Code is so great is that it has the chance to catch its own mistakes. It can run compilers, linters, and browsers and check its own output. If it makes a mistake, it takes one or two extra iterations until it gets it right.
I really think a lot of people tried AI coding earlier, got frustrated at the errors and gave up. That's where the rejection of all these doomer predictions comes from.
And I get it. Coding with Claude Code really was prompting something, getting errors, and asking it to fix them. Which was still useful, but I could see why a skilled coder adding a feature to a complex codebase would just give up.
Opus 4.5 really is at a new tier however. It just...works. The errors are far fewer and often very minor - "careless" errors, not fundamental issues (like forgetting to add "use client" to a Next.js client component).
This was me. I was a huge AI coding detractor on here for a while (you can check my comment history). But, in order to stay informed and not just be that grouchy curmudgeon all the time, I kept up with the models and regularly tried them out. Opus 4.5 is so much better than anything I've tried before, I'm ready to change my mind about AI assistance.
I even gave -True Vibe Coding- a whirl. Yesterday, from a blank directory and text file list of requirements, I had Opus 4.5 build an Android TV video player that could read a directory over NFS, show a grid view of movie poster thumbnails, and play the selected video file on the TV. The result wasn't exactly full-featured Kodi, but it works in the emulator and actual device, it has no memory leaks, crashes, ANRs, no performance problems, no network latency bugs or anything. It was pretty astounding.
Oh, and I did this all without ever opening a single source file or even looking at the proposed code changes while Opus was doing its thing. I don't even know Kotlin and still don't know it.
I have a few Go projects now and I speak Go as well as you speak Kotlin. I predict that we'll see some languages really pull ahead of others in the next few years based on their advantages for AI-powered development.
For instance, I always respected types, but I'm too lazy to go spend hours working on types when I can just do ruby-style duck typing and get a long ways before the inevitable problems rear their head. Now, I can use a strongly typed language and get the advantages for "free".
> I predict that we'll see some languages really pull ahead of others in the next few years based on their advantages for AI-powered development.
Oh absolutely. I've been using Python for the past 15 or so years for everything.
I've never written a single line of Rust in my life, and all my new projects are Rust now, even the quick-script-throwaway things, because it's so much better at instantly screaming at claude when it goes off track. It may take it longer to finish what I asked it to do, but requires so much less involvement from me.
I will likely never start another new project in python ever.
EDIT: Forgot to add that paired with a good linter, this is even more impressive. I told Claude to come up with the most masochistic clippy configuration possible, where even a tiny mistake is instantly punished and exceptions have to be truly exceptional (I have another agent that verifies this each run).
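For a sense of what that looks like, here's a minimal hand-written sketch of the idea as crate-level lint attributes (my real config is a Claude-generated Cargo.toml/clippy.toml, so treat these specific lints as illustrative, not as what Claude actually picked):

    // Illustrative only: a small, strict selection of lint levels.
    // Any warning or flagged pattern fails the build outright.
    #![deny(warnings)]            // every compiler warning becomes an error
    #![deny(clippy::all)]         // the default clippy lint groups
    #![deny(clippy::pedantic)]    // stricter style and correctness lints
    #![deny(clippy::unwrap_used)] // no .unwrap() sneaking into production code
    #![deny(clippy::expect_used)] // handle the error properly instead

    fn main() {
        println!("if this passes clippy, it earned it");
    }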
I just wish there was cargo-clippy for enforcing architectural patterns.
And with types, it makes it easier for rounds of agents to pick up mistakes at compile time, statically. Linting and sanity-checking untyped languages only goes so far.
I've not seen LLMs one-shot Perl-style regexes, and JavaScript can still have ugly runtime WTFs.
Going to one-up you though -- here's a literal one-liner that gets me a polished media center with beautiful interface and powerful skinning engine. It supports Android, BSD, Linux, macOS, iOS, tvOS and Windows.
Hah! I actually initiated the project because I'm a long time XBMC/Kodi user. I started using it when it was called XBMC, on an actual Xbox 1. I am sick and tired of its crashing, poor playback performance, and increasingly bloated feature set. It's embarrassing when I have friends or family over for movie night, and I have to explain "Sorry folks, Kodi froze midway through the movie again" while I frantically try to re-launch/reboot my way back to watching the movie. VLC's playback engine is much better but the VLC app's TV UX is ass. This application actually uses the libVLC playback engine under the hood.
I think anecdotes like this may prove very relevant in the next few years. AI might make bad code, but a project of bad code that's still way smaller than a bloated alternative, and has a UX tailored to your exact requirements, could be compelling.
A big part of the problem with existing software is that humans seem to be pretty much incapable of deciding a project is done and stop adding to it. We treat creating code like a job or hobby instead of a tool. Nothing wrong with that, unless you're advertising it as a tool.
Yea, after this little experiment, I feel like I can just go through every big, bloated, slow, tech-debt-ridden software I use and replace it with a tiny, bespoke version that does only what I need and no more.
The old adage about how "users use 10% of your software's features, but they each use a different 10%" can now be solved by each user just building that 10% for themselves.
How do you know “it has no memory leaks, crashes, ANRs, no performance problems, no network latency bugs or anything” if you built it just yesterday? Isn’t it a bit too early for claims like this? I get it’s easy to bring ideas to life but aren’t we overly optimistic?
Part of the "one day" development time was exhaustively testing it. Since the tool's scope is so small, getting good test coverage was pretty easy. Of course, I'm not guaranteeing through formal verification methods that the code is bug free. I did find bugs, but they were all areas that were poorly specified by me in the requirements.
I decided to vibe code something myself last week at work. I've been wanting to create a POC that involves a coding agent creating custom Bokeh plots that a user can interact with and ask follow-up questions about. All of this had to be served using the HoloViews Panel library.
At work I only have access to Claude using the GitHub Copilot integration, so this could be the cause of my problems. Claude was able to get the first iteration up pretty quickly. At that stage the app could create a plot and you could interact with it and ask follow-up questions.
Then I asked it to extend the app so that it could generate multiple plots and the user could interact with all of them one at a time. It made a bunch of changes but the feature was never implemented. I asked it to do it again but got the same outcome. I completely accept that it could all be because I am using VS Code Copilot or my prompting skills are not good, but the LLM got 70% of the way there and then completely failed.
> At work I only have access to Claude using the GitHub Copilot integration, so this could be the cause of my problems.
You really need to at least try Claude Code directly instead of using Copilot. My work gives us access to Copilot, Claude Code, and Codex. Copilot isn't close to the other, more agentic products.
Do they manage context differently or have different system prompts? I would assume a lot of that would be the same between them. I think GH Copilot's biggest shortcoming is that they are too token-cheap, aggressively managing context to the detriment of the results. Watching Claude read a 500-line file in 100-line chunks just makes me sad.
I recently replaced my monitor with one that could be vertically oriented, because I'm just using Claude Code in the terminal and not looking at file trees at all
but I do want a better way to glance at and keep up with what it's doing in longer conversations, for my own mental context window
Ah, but you’re at the beginning stage, young grasshopper. Soon you will be missing that horizontal ultrawide monitor as you spin up 8 different Claude agents in parallel sessions.
oh I noticed! I've begun doing that on my laptop. I just started going down all my list of sideprojects one by one, then two by two, a Claude Code instance in a terminal window for each folder. It's a bit mental
I'm finding that branding and graphic design is the most arduous part, and that's what I'm hoping to accelerate soon. I'm heavily AI-assisted there too and I'm evaluating MCP servers to help, but so far I do actually have to focus on just that part, as opposed to babysitting.
Thanks for posting this. It's a nice reminder that despite all the noise from hype-mongers and skeptics in the past few years, most of us here are just trying to figure this all out with an open mind and are ready to change our opinions when the facts change. And a lot of people in the industry that I respect on HN or elsewhere have changed their minds about this stuff in the last year, having previously been quite justifiably skeptical. We're not in 2023 anymore.
If you were someone saying at the start of 2025 "this is a flash in the pan and a bunch of hype, it's not going to fundamentally change how we write code", that was still a reasonable belief to hold back then. At the start of 2026 that position is basically untenable: it's just burying your head in the sand and wishing for AI to go away. If you're someone who still holds it you really really need to download Claude Code and set it to Opus and start trying it with an open mind: I don't know what else to tell you. So now the question has shifted from whether this is going to transform our profession (it is), to how exactly it's going to play out. I personally don't think we will be replacing human engineers anytime soon ("coders", maybe!), but I'm prepared to change my mind on that too if the facts change. We'll see.
I was a fellow mind-changer, although it was back around the first half of last year when Claude Code was good enough to do things for me in a mature codebase under supervision. It clearly still had a long way to go but it was at that tipping point from "not really useful" to "useful". But Opus 4.5 is something different - I don't feel I have to keep pulling it back on track in quite the way I used to with Sonnet 3.7, 4, even Sonnet 4.5.
For the record, I still think we're in a bubble. AI companies are overvalued. But that's a separate question from whether this is going to change the software development profession.
The AI bubble is kind of like the dot-com bubble in that it's a revolutionary technology that will certainly be a huge part of the future, but it's still overhyped (i.e. people are investing without regard for logic).
We were enjoying cheap second hand rack mount servers, RAM, hard drives, printers, office chairs and so on for a decade after the original dot com crash. Every company that went out of business liquidated their good shit for pennies.
I'm hoping after AI comes back down to earth there will be a new glut of cheap second hand GPUs and RAM to get snapped up.
Right. And same for railways, which had a huge bubble early on. Over-hyped on the short time horizon. Long term, they were transformative in the end, although most of the early companies and early investors didn’t reap the eventual profits.
At the time it was overhyped because just by adding .com to your company's name you could increase your valuation regardless of whether or not you had anything to do with the internet. Is that not stupid?
I think my comparison is apt; being a bubble and a truly society-altering technology are not mutually exclusive, and by virtue of it being a bubble, it is overhyped.
There was definitely a lot of stupid stuff happening. IMO the clearest accurate way to put it is that it was overhyped for the short term (hence the crazy high valuations for obvious bullshit), and underhyped for the long term (in the sense that we didn't really foresee how broadly and deeply it would change the world). Of course, there's more nuance to it, because some people had wild long-term predictions too. But I think the overall, mainstream vibe was to underappreciate how big a deal it was.
> Oh, and I did this all without ever opening a single source file or even looking at the proposed code changes while Opus was doing its thing. I don't even know Kotlin and still don't know it.
This is what people are still doing wrong. Tools in a loop people, tools in a loop.
The agent has to have the tools to detect whatever it just created is producing errors during linting/testing/running. When it can do that, I can loop again, fix the error and again - use the tools to see whether it worked.
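To make that concrete, here's a toy sketch of such a loop, in Rust just as an example; `ask_agent_to_fix` is a hypothetical stand-in for however you hand the output back to the model, not a real API:

    use std::process::Command;

    // Toy "tools in a loop": run the project's own checks, hand any failures
    // back to the agent, and try again until the checks pass.
    fn ask_agent_to_fix(errors: &str) {
        // In a real setup this would send the errors to the coding agent
        // and let it edit the code before the next attempt.
        eprintln!("handing {} bytes of errors back to the agent", errors.len());
    }

    fn main() {
        for attempt in 1..=5 {
            let out = Command::new("cargo")
                .args(["test", "--quiet"])
                .output()
                .expect("failed to run cargo");
            if out.status.success() {
                println!("checks passed on attempt {attempt}");
                return;
            }
            // Compiler errors land on stderr, test failures on stdout; capture both.
            let mut errors = String::from_utf8_lossy(&out.stderr).into_owned();
            errors.push_str(&String::from_utf8_lossy(&out.stdout));
            ask_agent_to_fix(&errors);
        }
        eprintln!("still red after 5 attempts; time for a human to look");
    }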
I _still_ encounter people who think "AI programming" is pasting stuff into ChatGPT on the browser and they complain it hallucinates functions and produces invalid code.
Last weekend I was debugging some blocking issue on a microcontroller with embassy-rs, where the whole microcontroller would lock up as soon as I started trying to connect to an MQTT server.
I was having Opus investigate it and I kept building and deploying the firmware for testing.. then I just figured I'd explain how it could do the same and pull the logs.
Off it went, for the next ~15 minutes it would flash the firmware multiple times until it figured out the issue and fixed it.
There was something so interesting about seeing a microcontroller on the desk being flashed by Claude Code, with LEDs blinking indicating failure states. There's something about it not being just code on your laptop that felt so interesting to me.
But I agree, absolutely, red/green test or have a way of validating (linting, testing, whatever it is) and explain the end-to-end loop, then the agent is able to work much faster without being blocked by you multiple times along the way.
This is kind of why I'm not really scared of losing my job.
While Claude is amazing at writing code, it still requires human operators. And even experienced human operators are bad at operating this machinery.
Tell your average Joe - the one who thinks they can create software without engineers - what "tools-in-a-loop" means, and they'll make the same face they made when you tried explaining iterators to them, before LLMs.
Explain to them how a type system, E2E or integration tests help the agent, and suddenly they have to learn all the things they would be required to learn to write it on their own.
I have been out of the loop for a couple of months (vacation). I tried Claude Opus 4.5 at the end of November 2025 with the corporate Github Copilot subscription in Agent mode and it was awful: basically ignoring code and hallucinating.
My team is using it with Claude Code and say it works brilliantly, so I'll be giving it another go.
How much of the value comes from Opus 4.5, how much comes from Claude Code, and how much comes from the combination?
I strongly concur with your second statement. Anything other than agent mode in GH copilot feels useless to me. If I want to engage Opus through GH copilot for planning work, I still use agent mode and just indicate the desired output is whatever.md. I obviously only do this in environments lacking a better tool (Claude Code).
I suspect that's the other thing at play here; many people have only tried Copilot because it's cheap with all the other Microsoft subscriptions many companies have. Copilot frankly is garbage compared to Cursor/Claude, even with the same exact models.
My issue for a long time now hasn't been that the code they write works or doesn't work. My issues all stem from the fact that it works, but does the wrong thing.
> My issues all stem from the fact that it works, but does the wrong thing
It's an opportunity, not a problem, because it means there's a gap in your specifications and then in your tests.
I use Aider, not Claude, but I run it with Anthropic models. And what I found is that comprehensively writing up the documentation for a feature, spec-style, before starting eliminates a huge amount of what you're referring to. It serves a triple purpose: (a) you get the documentation, (b) you guide the AI, and (c) it's surprising how often this helps to refine the feature itself. Sometimes I invoke the AI to help me write the spec as well, asking it to prompt for areas where clarification is needed etc.
This is how Beads works, especially with Claude Code. What I do is tell Claude to always create a Bead when I tell it to add something, or about something that needs to be added. Then I start brainstorming, and even ask it to do market research on what top apps are doing for x, y or z. Then I ask it to update the bead (I call them tasks) and finally, when it's got enough detail, I tell it to do all of these in parallel.
There are several rubs with that operating protocol extending beyond the "you're holding it wrong" claim.
1) There exists a threshold, only identifiable in retrospect, past which it would have been faster to locate or write the code yourself than to navigate the LLM's correction loop or otherwise ensure one-shot success.
2) The intuition and motivations of LLMs derive from a latent space that the LLM cannot actually access. I cannot get a reliable answer on why the LLM chose the approaches it did; it can only retroactively confabulate. Unlike human developers who can recall off-hand, or at least review associated tickets and meeting notes to jog their memory. The LLM prompter always documenting sufficiently to bridge this LLM provenance gap hits rub #1.
3) Gradually building prompt dependency where one's ability to take over from the LLM declines and one can no longer answer questions or develop at the same velocity themselves.
4) My development costs increasingly being determined by the AI labs and hardware vendors they partner with. Particularly when the former will need to increase prices dramatically over the coming years to break even with even 2025 economics.
Since we asked you to stop hounding another user in this manner and you've continued to do it repeatedly, I've banned the account. This is not what Hacker News is for, and you've done it almost 50 times (!), almost 30 of which have been after we first asked you to stop. That is extreme, and totally unacceptable.
Many people - simonw is the most visible of them, but there are countless others - have given up trying to convince folks who are determined not to be convinced, and are simply enjoying their increased productivity. This is not a competition or an argument.
Maybe they are struggling to convince others because they are unable to produce evidence that is able to convince people?
My experience scrolling X and HN is a bunch of people going "omg opus omg Claude Code I'm 10x more productive" and that's it. Just hand wavy anecdotes based on their own perceived productivity. I'm open to being convinced but just saying stuff is not convincing. It's the opposite, it feels like people have been put under a spell.
I'm following The Primeagen; he's doing a series where he tries these tools on stream and follows people's advice on how to use them best. He's actually quite a good programmer so I'm eager to see how it goes. So far he isn't impressed and thus neither am I. If he cracks it and unlocks significant productivity then I will be convinced.
>> Maybe they are struggling to convince others because they are unable to produce evidence that is able to convince people?
Simon has produced plenty of evidence over the past year. You can check their submission history and their blog: https://simonwillison.net/
The problem with people asking for evidence is that there's no level of evidence that will convince them. They will say things like "that's great but this is not a novel problem so obviously the AI did well" or "the AI worked only because this is a greenfield project, it fails miserably in large codebases".
It's true that some people will just continually move the goalposts because they are invested in their beliefs. But that doesn't mean that the skepticism around certain claims isn't relevant.
Nobody serious is disputing that LLMs can generate working code. They dispute claims like "Agentic workflows will replace software developers in the short to medium term", or "Agentic workflows lead to 2-100x improvements in productivity across the board". This is what people are looking for in terms of evidence, and there just isn't any.
Thus far, we do have evidence that AI (at least in OSS) produces a 19% decrease in productivity [0]. We also have evidence that it harms our cognitive abilities [1]. Anecdotally, I have found myself lazily reaching for LLM assistance when encountering a difficult problem instead of thinking deeply about the problem. Anecdotally, I also struggle to be more productive using AI-centric agentic workflows in my areas of expertise.
We want evidence that "vibe engineering" is actually more productive across the entire lifespan of a software project. We want evidence that it produces better outcomes. Nobody has yet shown that. It's just people claiming that because they vibe coded some trivial project, all of software development can benefit from this approach. Recently a principal engineer at Google claimed that Claude Code wrote their team's entire year's worth of work in a single afternoon. They later walked that claim back, but most do not.
I'm more than happy to be convinced, but it's becoming extremely tiring to hear the same claims being parroted without evidence, and then you get called a luddite when you question it. It's also tiring when you push them on it and they blame it on the model you use, and then the agent, and then the way you handle context, and then the prompts, and then "skill issue". Meanwhile all they have to show is some slop that could be hand coded in a couple hours by someone familiar with the domain. I use AI, and I was pretty bullish on it for the last two years, but the combination of it simply not living up to expectations + the constant barrage of what feels like a stealth marketing campaign parroting the same thing over and over (the new model is way better, unlike the other times we said that) + the amount of absolute slop code that seems to continue to increase + companies like Microsoft producing worse and worse software as they shoehorn AI into every single product (Office was renamed to Copilot 365) has made me very sensitive to it, much in the same way I was very sensitive to the claims being made by certain VC-backed webdev companies regarding their product + framework in the last few years.
I'm not even going to bring up the economic, social, and environmental issues because I don't think they're relevant, but they do contribute to my annoyance with this stuff.
> Thus far, we do have evidence that AI (at least in OSS) produces a 19% decrease in productivity
I generally agree with you, but I'd be remiss if I didn't point out that it's plausible that the slowdown observed in the METR study was at least partially due to the subjects' lack of experience with LLMs. Someone with more experience performed the same experiment on themselves, and couldn't find a significant difference between using LLMs and not [0]. I think the more important point here is that programmers' subjective assessment of how much LLMs help them is not reliable, and biased towards the LLMs.
I think we're on the same page re: that study. Actually, your link made me think about the ongoing debate around IDEs vs stuff like Vim. Some people swear by IDEs and insist they drastically improve their productivity; others dismiss them or even claim they make them less productive. Sound familiar? I think it's possible these AI tools are simply another way to type code, and the differences, averaged out, end up being a wash.
IDEs vs vim makes a lot of sense. AI really does feel like using an IDE in a certain way
Using AI for me absolutely makes it feel like I'm more productive. But when I look back on my work at the end of the day and look at what I got done, it would be ludicrous to say it was multiple times the amount of my output pre-AI.
Despite all the people replying to me saying "you're holding it wrong", I know the fix for it doing the wrong thing: specify in more detail what I want. The problem with that is twofold:
1. How much to specify? As little as possible is the ideal, if we want to maximize how much it can help us. A balance here is key. If I need to detail every minute thing I may as well write the code myself
2. If I get this step wrong, I still have to review everything, rethink it, go back and re-prompt, costing time
When I'm working on production code, I have to understand it all to confidently commit. It costs time for me to go over everything, sometimes multiple iterations. Sometimes the AI uses things I don't know about and I need to dig into it to understand it
AI is currently writing 90% of my code. Quality is fine. It's fun! It's magical when it nails something one-shot. I'm just not confident it's faster overall
I think this is an extremely honest perspective. It's actually kind of cool that it's gotten to the point it can write most code - albeit with a lot of handholding.
This is why you use this AI bubble (it IS a bubble) to get the VC-funded AI models at dirt cheap prices and CREATE tools for yourself.
Need a very specific linter? AI can do it. Need a complex Roslyn analyser? AI. Any kind of scripting or automation that you run on your own machine. AI.
None of that will go away or suddenly stop working when the bubble bursts.
Within just the last 6 months I've built so many little utilities to speed up my work (and personal life) it's completely bonkers. Most went from "hmm, might be cool to..." to a good-enough script/program in an evening while doing chores.
Even better, start getting the feel for local models. Current gen home hardware is getting good enough and the local models smart enough so you can, with the correct tooling, use them for surprisingly many things.
> Even better, start getting the feel for local models. Current gen home hardware is getting good enough and the local models smart enough so you can, with the correct tooling, use them for surprisingly many things.
Are there any local models that are at least somewhat comparable to the latest-and-greatest (e.g. Opus 4.5, Gemini 3), especially in terms of coding?
A risk I see with this approach is that when the bubble pops, you'll be left dependent on a bunch of tools which you don't know how to maintain or replace on your own, and won't have/be able to afford access to LLMs to do it for you.
The "tools" in this context are literally a few hundred lines of Python or Github CI build pipeline, we're not talking about 500kLOC massive applications.
I'm building tools, not complete factories :) The AI builds me a better hammer specifically for the nails I'm nailing 90% of the time. Even if the AI goes away, I still know how the custom hammer works.
I thought that initially, but I don't think the skills AI weakens in me are particularly valuable
Let's say AI becomes too expensive - I more or less only have to sharpen up being able to write the language. My active recall of the syntax, common methods and libraries. That's not hard or much of a setback
Maybe this would be a problem if you're purely vibe coding, but I haven't seen that work long term
Open source models hosted by independent providers (or even yourself, which if the bubble pops will be affordable if you manage to pick up hardware on fire sales) are already good enough to explain most code.
> 1) There exists a threshold, only identifiable in retrospect, past which it would have been faster to locate or write the code yourself than to navigate the LLM's correction loop or otherwise ensure one-shot success.
I can run multiple agents at once, across multiple code bases (or the same codebase but multiple different branches), doing the same or different things. You absolutely can't keep up with that. Maybe the one singular task you were working on, sure, but the fact that I can work on multiple different things without the same cognitive load will blow you out of the water.
> 2) The intuition and motivations of LLMs derive from a latent space that the LLM cannot actually access. I cannot get a reliable answer on why the LLM chose the approaches it did; it can only retroactively confabulate. Unlike human developers who can recall off-hand, or at least review associated tickets and meeting notes to jog their memory. The LLM prompter always documenting sufficiently to bridge this LLM provenance gap hits rub #1.
Tell the LLM to document in comments why it did things. Human developers often leave, and then nobody with knowledge of the codebase or its "whys" is even around to give details. Devs are notoriously terrible about documentation.
> 3) Gradually building prompt dependency where one's ability to take over from the LLM declines and one can no longer answer questions or develop at the same velocity themselves.
You can't develop at the same velocity, so drop that assumption now. There are all kinds of lower abstractions that you build on top of that you probably can't explain currently.
> 4) My development costs increasingly being determined by the AI labs and hardware vendors they partner with. Particularly when the former will need to increase prices dramatically over the coming years to break even with even 2025 economics.
You aren't keeping up with the actual economics. This shit is technically profitable; the unprofitable part is the ongoing battle between LLM providers to have the best model. They know software in the past has often been winner-take-all, so they're all trying to win.
> With the latest models if you're clear enough with your requirements you'll usually find it does the right thing on the first try
That's great that this is your experience, but it's not a lot of people's. There are projects where it's just not going to know what to do.
I'm working in a web framework that is a Frankenstein-ing of Laravel and October CMS. It's so easy for the agent to get confused because, even when I tell it this is a different framework, it sees things that look like Laravel or October CMS and suggests solutions that are only for those frameworks. So there's constant made up methods and getting stuck in loops.
The documentation is terrible, you just have to read the code. Which, despite what people say, Cursor is terrible at, because embeddings are not a real way to read a codebase.
I'm working mostly in a web framework that's used by me and almost nobody else (the weird little ASGI wrapper buried in Datasette) and I find the coding agents pick it up pretty fast.
One trick I use that might work for you as well:
"Clone GitHub.com/simonw/datasette to /tmp then look at /tmp/docs/datasette for documentation and search the code if you need to"
Try that with your own custom framework and it might unblock things.
If your framework is missing documentation tell Claude Code to write itself some documentation based on what it learns from reading the code!
> I'm working mostly in a web framework that's used by me and almost nobody else (the weird little ASGI wrapper buried in Datasette) and I find the coding agents pick it up pretty fast
Potentially because there is no baggage with similar frameworks. I'm sure it would have an easier time with this if it was not spun off from other frameworks.
> If your framework is missing documentation tell Claude Code to write itself some documentation based on what it learns from reading the code!
If Claude cannot read the code well enough to begin with, and needs supplemental documentation, I certainly don't want it generating the docs from the code. That's just compounding hallucinations on top of each other.
I find Claude Code is so good at docs that I sometimes investigate a new library by checking out a GitHub repo, deleting the docs/ and README and having Claude write fresh docs from scratch.
In a circuitous way, you can rather successfully have one agent write a specification and another one execute the code changes. Claude Code has a planning mode that lets you work with the model to create a robust specification that can then be executed, asking the sort of leading questions for which it already seems to know it could make an incorrect assumption. I say 'agent' but I'm really just talking about separate model contexts, nothing fancy.
Cursor's planning functionality is very similar and I have found that I can even use "cheap" models like their Composer-1 and get great results in the planning phase, and then turn on Sonnet or Opus to actually produce the plan. 90% of the stuff I need to argue about is during the planning phase, so I save a ton of tokens and rework just making a really good spec.
It turns out that Waterfall was always the correct method, it's just really slow ;)
And if you've told it too many times to fix it, tell it someone has a gun to your head; for some reason it almost always gets it right the very next time.
Yeah, if anyone can truly afford the AI empire. Remember all these "leading" companies are running it at a loss, so most companies paying for it are severely underpaying the cost of it all. We would need an insane technological breakthrough of unlimited memory and power before I start to worry, and at that point, I'll just look for a new career.
I think it's worth understanding why. Because that's not everyone's experience and there's a chance you could make a change such that you find it extremely useful.
There's a lesser chance that you're working on a code base that Claude Code just isn't capable of helping with.
The more explicit/detailed your plan, the more context it uses up, the less accurate and generally functional it is. Don't get me wrong, it's amazing, but on a complex problem with large enough context it will consistently shit the bed.
The human still has to manage complexity. A properly modularized and maintainable code base is much easier for the LLM to operate on — but the LLM has difficulty keeping the code base in that state without strong guidance.
Putting “Make minimal changes” in my standard prompt helped a lot with the tendency of basically all agents to make too many changes at once. With that addition it became possible to direct the LLM to make something similar to the logical progression of commits I would have made anyway, but now don’t have to work as hard at crafting.
Most of the hype merchants avoid the topic of maintainability because they’re playing to non-technical management skeptical of the importance of engineering fundamentals. But everything I’ve experienced so far working with LLMs screams that the fundamentals are more important than ever.
It usually works well for me. With very big tasks I break the plan into multiple MD files with the relevant context included and work through in individual sessions, updating remaining plans appropriately at the end of each one (usually there will be decision changes or additions during iteration).
It takes a lot of plan to use up the context and most of the time the agent doesn't need the whole plan, they just need what's relevant to the current task.
This was me. I have done a full 180 over the last 12 months or so, from "they're an interesting idea, and technically impressive, but not practically useful" to "holy shit I can have entire days/weeks where I don't write a single line of code".
> I really think a lot of people tried AI coding earlier, got frustrated at the errors and gave up. That's where the rejection of all these doomer predictions comes from.
It's not just the deficiencies of earlier versions, but the mismatch between the praise from AI enthusiasts and the reality.
I mean maybe it is really different now and I should definitely try uploading all of my employer's IP on Claude's cloud and see how well it works. But so many people were as hyped by GPT-4 as they are now, despite GPT-4 actually being underwhelming.
Too much hype for disappointing results leads to skepticism later on, even when the product has improved.
I feel similar, I'm not against the idea that maybe LLMs have gotten so much better... but I've been told this probably 10 times in the last few years working with AI daily.
The funny part about rapidly changing industries is that, despite the fomo, there's honestly not any reward to keeping up unless you want to be a consultant. Otherwise, wait and see what sticks. If this summer people are still citing Opus 4.5 as a game-changing moment and have solid, repeatable workflows, then I'll happily change up my workflow.
Someone could walk into the LLM space today and wouldn't be significantly at a loss for not having paid attention to anything that had happened in the last 4 years other than learning what has stuck since then.
> The funny part about rapidly changing industries is that, despite the fomo, there's honestly not any reward to keeping up unless you want to be a consultant.
I've lived through multiple incredibly rapid changes in tech throughout my career, and the lesson always learned was there is a lot of wasted energy keeping up.
Two big examples:
- The period from early MVC JavaScript frontends (Backbone.js etc.) to the time of the great React/Angular wars. I completely stepped out of the webdev space during that time period.
- The rapid expansion of Deep Learning frameworks where I did try to keep up (shipped some Lua torch packages and made minor contributions to Pylearn2).
In the first case, missing 5 years of front-end wars had zero impact. After not doing webdev work at all for 5 years, I was tasked with shipping a React app. It took me a week to catch up, and everything was deployed in roughly the same time as it would have been by someone who had spent years keeping up with changes.
In the second case, where I did keep up with many of the developing deep learning frameworks, it didn't really confer any advantage. Coworkers who started with PyTorch fresh out of school were just as proficient, if not more so, at building models. Spending energy keeping up offered no value other than feeling "current" at the time.
Can you give me a counterexample of where keeping up with a rapidly changing, unstable area has conferred a benefit to you? Most of FOMO is really just fear. Again, unless you're trying to sell yourself specifically as a consultant on the bleeding edge, there's no reason to keep up with all these changes (other than finding it fun).
You moved out of webdev for 5 years, not everybody else had that luxury. I'm sure it was beneficial to those people to keep up with webdev technologies.
If everything changes every month, then stuff you learn next month would be obsolete in two months. This is a response to people saying "adapt or be left behind". There's so much thrashing that if you're not interested with the SOTA, you can just wait for everything to calm down and pick it up then.
> On two occasions I have been asked, 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
> Opus 4.5 really is at a new tier however. It just...works.
Literally tried it yesterday. I didn't see a single difference with whatever model Claude Code was using two months ago. Same crippled context window. Same "I'll read 10 irrelevant lines from a file", same random changes etc.
Create a markdown document of your task (or use CLAUDE.md), put it in "plan mode" which allows Claude to use tool calls to ask questions before it generates the plan.
When it finishes one part of the plan, have it create another markdown document - "progress.md" or whatever - with the whole plan and what is completed at that point.
Type /clear (no more context window), tell Claude to read the two documents.
Repeat until even a massive project is complete - with those 2 markdown documents and no context window issues.
> ... Proceeds to explain how it's crippled and all the workarounds you have to do to make it less crippled.
No - that's not what I did.
You don't need an extra-long context full of irrelevant tokens. Claude doesn't need to see the code it implemented 40 steps ago in a working method from Phase 1 if it is on Phase 3 and not using that method. It doesn't need reasoning traces for things it already "thought" through.
This other information is cluttering, not helpful. It is making signal to noise ratio worse.
If Claude needs to know something it did in Phase 1 for Phase 4 it will put a note on it in the living markdown document to simply find it again when it needs it.
Again, you're basically explaining how Claude has a very short limited context and you have to implement multiple workarounds to "prevent cluttering". Aka: try to keep context as small as possible, restart context often, try and feed it only small relevant information.
What I very succinctly called "crippled context" despite claims that Opus 4.5 is somehow "next tier". It's all the same techniques we've been using for over a year now.
I get by because I also have long-term memory, and experience, and I can learn. LLMs have none of that, and every new session is rebuilding the world anew.
And even my short-term memory is significantly larger than the at most 50% of the 200k-token context window that Claude has. It runs out of context when my short-term memory is probably not even 1% full, for the same task (and I'm capable of more context-switching in the meantime).
And so even the "Opus 4.5 really is at a new tier" runs into the very same limitations all models have been running into since the beginning.
> For LLMs long term memory is achieved by tooling. Which you discounted in your previous comments.
My specific complaint, which is an observable fact about "Opus 4.5 is next tier": it has the same crippled context that degrades the quality of the model as soon as it fills 50%.
EMM_386: no-no-no, it's not crippled. All you have to do is keep track across multiple files, clear out context often, feed very specific information not to overflow context.
Me: so... it's crippled, and you need multiple workarounds
scotty79: After all it's the same as your own short-term memory, and <some unspecified tooling (I guess those same files)> provide long-term memory for LLMs.
Me: Your comparison is invalid because I can go have lunch, and come back to the problem at hand and continue where I left off. "Next tier Opus 4.5" will have to be fed the entire world from scratch after a context clear/compact/in a new session.
Unless, of course, you meant to say that "next tier Opus model" only has 15-30 second short term memory, and needs to keep multiple notes around like the guy from Memento. Which... makes it crippled.
If you refuse to use what you call workarounds and I call long term memory, then you end up with a guy from Memento and regardless of how smart the model is it can end up making the same mistakes. And that's why you can't tell the difference between a smarter and a dumber one while others can.
I evaluated the claim that Opus is somehow next tier/something different/amazeballs future at face value. It still has all the same issues and needs all the same workarounds as whatever I was using two months ago (I had a bit of a coding hiatus between the beginning of December and now).
> then you end up with a guy from Memento and regardless of how smart the model is
Those models are, and keep being the guy from memento. Your "long memory" is nothing but notes scribbled everywhere that you have to re-assemble every time.
> And that's why you can't tell the difference between a smarter and a dumber one while others can.
If it was "next tier smarter" it wouldn't need the exact same workarounds as the "dumber" models. You wouldn't compare the context to the 15-30 second short-term memory and need unspecified tools [1] to have "long-term memory". You wouldn't have the model behave in an indistinguishable way from a "dumber" model after half of its context windows has been filled. You wouldn't even think about context windows. And yet here we are
[1] For each person these tools will be a different collection of magic incantations. From scattered .md files to slop like Beads to MCP servers providing access to various external storage solutions to custom shell scripts to ...
BTW, I still find "superpowers" from https://github.com/obra/superpowers to be the single best improvement to Claude (and other providers), even if it's just another in a long series of magic chants I've evaluated.
> Those models are, and keep being the guy from memento. Your "long memory" is nothing but notes scribbled everywhere that you have to re-assemble every time.
That's exactly how the long term memory works in humans as well. The fact that some of these scribbles are done chemically in the same organ that does the processing doesn't make it much better. Human memories are reassembled at recall (often inaccurately). And humans also scribble when they try to solve a problem that exceeds their short term memory.
> If it was "next tier smarter" it wouldn't need the exact same workarounds as the "dumber" models.
This is akin to refusing to call a processor next tier because it still needs RAM, a bus to communicate with it, and an SSD as well. You think it should have everything in cache to be worthy of being called next tier.
It's fine to have your own standards for applying words. But expect further confusion and miscommunication with other people if you don't intend to realign.
> That's exactly how the long term memory works in humans as well.
Where this is applicable is when you go away from a problem for a while. And yet I don't lose the entire context and have to rebuild it from scratch when I go for lunch, for example.
Models have to rebuild the entire world from scratch for every small task.
> This is akin to refusing to call a processor next tier because it still needs RAM, a bus to communicate with it, and an SSD as well.
You're so lost in your own metaphor that it makes no sense.
> You think it should have everything in cache to be worthy of being called next tier.
No. "Next tier" implies something significantly and observably better. I don't. And here you are trying to tell me "if you use all the exact same tools that you have already used before with 'previous tier models' you will see it is somehow next tier".
If your "next tier" needs an equator-length list of caveats and all the same tools, it's not next tier is it?
BTW. I'm literally coding with this "next tier" tool with "long memory just like people". After just doing the "plan/execute/write notes" bullshit incantations I had to correct it:
You're right, I fucked up on all three counts:
1. FileDetails - I should have WIRED IT UP, not deleted it. It's a useful feature to preview file details before playing. I treated "unused" as "unwanted" instead of "not yet connected".
2. Worktree not merged - Complete oversight. Did all the work but didn't finish the job.
3. _spacing - Lazy fix. Should have analyzed why it exists and either used it or removed the layout constraint entirely.
So next tier. So long memory. So person-like.
Oh. Within about 10 seconds after that it started compacting the "non-crippled" context window and immediately forgot most of what it had just been doing. So I had to clear out the context and teach it the world from the start again.
Edit. And now this amazing next tier model completely ignored that there already exists code to discover network interfaces, and wrote bullshit code calling CLI tools from Rust. So once again it needed to be reminded of this.
> It's fine to have your own standards for applying words. But expect further confusion and miscommunication with other people if you don't intend to realign.
I mean, just like crypto bros before them, AI bros do sure love to invent their own terminology and their own realities that have nothing to do with anything real and observable.
> "You're right, I fucked up on all three counts:"
It very well might be that AI tools are not for you, if you are getting such poor results with your methods of approaching them.
If you would like to improve your outcomes at some point, ask people who achieve better results for pointers and try them out. Here's a freebie, never tell AI it fucked up.
200k+ tokens is a pretty big context window if you are feeding it the right context. Editors like Cursor are really good at indexing and curating context for you; perhaps it'd be worth trying something that does that better than Claude CLI does?
> a pretty big context window if you are feeding it the right context.
Yup. There's some magical "right context" that will fix all the problems. What is that right context? No idea; I guess I need to read yet another 20,000-word post describing magical incantations that you should or shouldn't do in the context.
The "Opus 4.5 is something else/nex tier/just works" claims in my mind means that I wouldn't need to babysit its every decision, or that it would actually read relevant lines from relevant files etc. Nope. Exact same behaviors as whatever the previous model was.
Oh, and that "200k tokens context window"? It's a lie. The quality quickly degrades as soon as Claude reaches somewhere around 50% of the context window. At 80+% it's nearly indistinguishable from a model from two years ago. (BTW, same for Codex/GPT with it's "1 million token window")
1) define problem
2) split problem into small independently verifiable tasks
3) implement tasks one by one, verify with tools
With humans 1) is the spec, 2) is the Jira or whatever tasks
With an LLM usually 1) is just a markdown file, 2) is a markdown checklist, Github issues (which Claude can use with the `gh` cli) and every loop of 3 gets a fresh context, maybe the spec from step 1 and the relevant task information from 2
I haven't run into context issues in a LONG time, and if I have, it's usually been either intentional (it's a problem where compacting won't hurt) or an error on my part.
Yes and no. I've worked quite a bit with juniors, offshore consultants and just in companies where processes are a bit shit.
The exact same method that worked for those happened to also work for LLMs, I didn't have to learn anything new or change much in my workflow.
"Fix bug in FoobarComponent" is enough of a bug ticket for the 100x developer in your team with experience with that specific product, but bad for AI, juniors and offshored teams.
Thus, giving enough context in each ticket to tell whoever is working on it where to look and a few ideas what might be the root cause and how to fix it is kinda second nature to me.
Also my own brain is mostly neurospicy mush, so _I_ need to write the context to the tickets even if I'm the one on it a few weeks from now. Because now-me remembers things, two-weeks-from-now me most likely doesn't.
The problem with LLMs (similar to people :) ) is that you never really know what works. I've had Claude one-shot "implement <some complex requirement>" with little additional input, and then completely botch even the smallest bug fix with explicit instructions and context. And vice versa :)
I realize your experience has been frustrating. I hope you see that every generation of model and harness is converting more hold-outs. We're still a few years from hard diminishing returns assuming capital keeps flowing (and that's without any major new architectures which are likely) so you should be able to see how this is going to play out.
It's in your interest to deal with your frustration and figure out how you can leverage the new tools to stay relevant (to the degree that you want to).
Regarding the context window, Claude needs thinking turned up for long context accuracy, it's quite forgetful without thinking.
Note how nothing in your comment addresses anything I said. Except the last sentence that basically confirms what I said. This perfectly illustrates the discourse around AI.
As for the snide and patronizing "it's in your interest to stay relevant":
1. I use these tools daily. That's why I don't subscribe to willful wide-eyed gullibility. I know exactly what these tools can and cannot do.
The vast majority of "AI skeptics" are the same.
2. In a few years when the world is awash in barely working incomprehensible AI slop my skills will be in great demand. Not because I'm an amazing developer (I'm not), but because I have experience separating wheat from the chaff
The snide and patronizing tone is your projection. It kinda makes me sad when the discourse is so poisoned that I can't even encourage someone to protect their own future from something that's obviously coming (technical merits aside, purely based on social dynamics).
It seems the subject of AI is emotionally charged for you, so I expect friendly/rational discourse is going to be a challenge. I'd say something nice but since you're primed to see me being patronizing... Fuck you? That what you were expecting?
It's not me who decided to barge in, assume their opponent doesn't use something or doesn't want to use something, and offer unsolicited advice.
> It kinda makes me sad when the discourse is so poisoned that I can't even encourage someone to protect their own future from something that's obviously coming
See. Again. You're so in love with your "wisdom" that you can't even see what you sound like: snide, patronizing, condescending. And completely missing the whole point of what was written. You are literally the person who poisons the discourse.
Me: "here are the issues I still experience with what people claim are 'next tier frontier model'"
You: "it's in your interests to figure out how to leverage new tools to stay relevant in the future"
Me: ... what the hell are you talking about? I'm using these tools daily. Do you have anything constructive to add to the discourse?
> so I expect friendly/rational discourse is going to be a challenge.
It's only a challenge to you because you keep being in love with your voice and your voice only. Do you have anything to contribute to the actual rational discourse, or are you going to attack my character?
> I'd say something nice but since you're primed to see me being patronizing... Fuck you?
Ah. The famous friendly/rational discourse of "they attack my use of AI" (no one attacked you), "why don't you invest in learning tools to stay relevant in the future" (I literally use these tools daily, do you have anything useful to say?) and "fuck you" (well, same to you).
> That what you were expecting?
What I was expecting is responses to what I wrote, not you riding in on a high horse.
You were the one complaining about how the tools aren't giving you the results you expected. If you're using these tools daily and having a hard time, either you're working on something very different from the bulk of people using the tools and your problems are legitimate, or you aren't and it's a skill issue.
If you want to take politeness as being patronizing, I'm happy to stop bothering. My guess is you're not a special snowflake, and you need to "get good" or you're going to end up on unemployment complaining about how unfair life is. I'd have sympathy but you don't seem like a pleasant human being to interact with, so have fun!
> You were the one complaining about how the tools aren't giving you the results you expected.
They are not giving me the results people claim they give. It is distinctly different from not giving the results I want.
> If you're using these tools daily and having a hard time, either you're working on something very different from the bulk of people using the tools and your problems or legitimate, or you aren't and it's a skill issue.
Indeed. And the rational/friendly discourse you claim you're having would start with trying to figure that out. Did you? No, you didn't. You immediately assumed your opponent is a clueless idiot who is somehow against AI and is incapable of learning or something.
> If you want to take politeness as being patronizing, I'm happy to stop bothering.
No. It's not politeness. It's smugness. You literally started your interaction in this thread with a "git gud or else" and even managed to complain later that "you dislike it when they attack your use of AI as a skill issue". While continuously attacking others.
> you don't seem like a pleasant human being to interact with
Says the person who has contributed nothing to the conversation except his arrogance, smugness, holier-than-thou attitude, engaged in nothing but personal attacks, complained about non-existent grievances and when called out on this behavior completed his "friendly and rational discourse" with a "fuck you".
Personally I'm sympathetic to people who don't want to have to use AI, but I dislike it when they attack my use of AI as a skill issue. I'm quite certain the workplace is going to punish people who don't leverage AI though, and I'm trying to be helpful.
> but I dislike it when they attack my use of AI as a skill issue.
No one attacked your use of AI. I explained my own experience with the "Claude Opus 4.5 is next tier". You barged in, ignored anything I said, and attacked my skills.
> the workplace is going to punish people who don't leverage AI though, and I'm trying to be helpful.
The only thing I disagreed with in your post is your objectively incorrect statement regarding Claude's context behavior. Other than that I'm just trying to encourage you to make preparations for something that I don't think you're taking seriously enough yet. No need to get all worked up, it'll only reflect on you.
And, conversely, when we read a comment like yours, it sounds like someone who's afraid of computers, would maybe have decried the bicycle and automobile, and really wishes they could just go live in a cabin in the woods.
(And it's fine to do so, just don't mail bombs to us, ok?)
> There's some magical "right context" that will fix all the problems.
All I can tell you is that in my own lived experience, I've had some fantastic results from AI, and it comes from telling it "look at this thing here, ok, i want you to chain it to that, please consider this factor, don't forget that... blah blah blah" like how I would have spelled things out to a junior developer, and then it really does stand a really solid chance of turning out what I've asked for. It helps a lot that I know what to ask for; there's no replacing that with AI yet.
So, your own situation must fall into one of these coarse buckets:
- You're doing something way too hard for AI to have a chance at yet, like real science / engineering at the frontier, not just boring software or infra development
- Your prompts aren't specific enough, you're not feeding it context, and you're expecting it to one-shot things perfectly instead of having to spend an afternoon prompting and correcting stuff
- You're not actually using and getting better at the tools, so you're just shouting criticisms from the sidelines, perhaps as sour grapes because policy doesn't allow it or your company can't afford to have you get into it.
IDK. I hope it's the first one and you're just doing Really Hard Things, but if you're doing normal software developer stuff and not seeing a productivity advantage, it's a fucking skill issue.
I'm not familiar with any form of intelligence that does not suffer from a bloated context. If you want to try and improve your workflow, a good place to start is using sub-agents so individual task implementations do not fill up your top level agents context. I used to regularly have to compact and clear, but since using sub-agents for most direct tasks, I hardly do anymore.
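To make the idea concrete, here is a rough sketch of the pattern (not Claude Code's actual internals), assuming the Anthropic Python SDK, a placeholder model id, and made-up task names: each task runs against a fresh message list, and only a short summary flows back up, so the top-level context stays small.

    import anthropic

    client = anthropic.Anthropic()
    MODEL = "claude-opus-4-5"  # placeholder model id

    def run_subagent(task: str) -> str:
        """Handle one task in an isolated context; return only a brief summary."""
        work = client.messages.create(
            model=MODEL,
            max_tokens=2048,
            messages=[{"role": "user", "content": task}],
        )
        detail = work.content[0].text
        summary = client.messages.create(
            model=MODEL,
            max_tokens=256,
            messages=[{"role": "user",
                       "content": f"Summarise the result in three bullet points:\n\n{detail}"}],
        )
        return summary.content[0].text

    # The orchestrator only ever sees summaries, never the sub-agents' full transcripts.
    top_level_context = []
    for task in ["Implement the rate limiter", "Write tests for the rate limiter"]:
        top_level_context.append({"task": task, "summary": run_subagent(task)})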
2. It's the same workarounds we've been doing forever
3. It's indistinguishable from "clear context and re-feed the entire world of relevant info from scratch" we've had forever, just slightly more automated
That's why I don't understand all the "it's new tier" etc. It's all the same issues with all the same workarounds.
That's because Opus has been out for almost 5 months now lol. It's the same model, so I think people have been vibe coding with a heavy dose of wine this holiday and are now convinced it's the future.
Actually, I've been saying that even models from 2+ years ago were extremely good, but you needed to "hold them right" to get good results, else you might cut yourself on the sharp edges of the "jagged frontier" (https://www.hbs.edu/faculty/Pages/item.aspx?num=64700) Unfortunately, this often necessitated you to adapt yourself to the tool, which is a big change -- unfeasible for most people and companies.
I would say the underlying principle was ensuring a tight, highly relevant context (e.g. choose the "right" task size and load only the relevant files or even code snippets, not the whole codebase; more manual work upfront, but almost guaranteed one-shot results.)
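A minimal sketch of that principle, with a hypothetical task and made-up file paths, just to show how small the prompt stays when you hand over only what you believe is relevant:

    from pathlib import Path

    TASK = "Fix rounding of line-item totals when multiple tax rates apply."
    RELEVANT = ["src/billing/invoice.py", "src/billing/tax.py", "tests/test_invoice.py"]

    def build_prompt(repo_root: str) -> str:
        # Only the hand-picked files go into the prompt, not the whole codebase.
        parts = [f"Task: {TASK}", "Only the files below are in scope."]
        for rel in RELEVANT:
            parts.append(f"--- {rel} ---\n{(Path(repo_root) / rel).read_text()}")
        return "\n\n".join(parts)

    print(build_prompt("."))  # a tight, on-topic prompt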
With newer models the sharper edges have largely disappeared, so you can hold them pretty much any which way and still get very good results. I'm not sure how much of this is from the improvements in the model itself vs the additional context it gets from the agentic scaffolding.
I still maintain that we need to adapt ourselves to this new paradigm to fully leverage AI-assisted coding, and the future of coding will be pretty strange compared to what we're used to. As an example, see Gas Town: https://steve-yegge.medium.com/welcome-to-gas-town-4f25ee16d...
FWIW, Gas Town is strange because Steve is strange (in a good way).
It's just the same agent swarm orchestration that most agent frameworks are using, but with quirky marketing. All of that is just based on the SDLC [PM/Architect -> engineer planning group -> engineer -> review -> qa/evaluation] loop most people here should be familiar with. So actually pretty banal, which is probably part of the reason Steve decided to be zany.
Ah, gotcha, I am still working through the article, but its detailed focus on all the moving parts under the covers is making it hard to grok the high-level workflow.
Each failed prediction should lower our confidence in the next "it's finally useful!" claim. But this inductive reasoning breaks down at genuine inflection points.
I agree with your framing that measuring should NOT be separated from political issues, but each can be made clear separately (framing it as "training the tools of the oppressor" seems to conflate measuring tool usefulness with politics).
> How is it useful to you that these companies are so valuation hungry that they are moving money into this technology in such a way that people are fearful it could cripple the entire global economy?
The creation of entire new classes of profession has always been the result of technological breakthroughs. The automobile did not cripple the economy, even as it ended the buggy-whip barons.
> How is it useful to you that this tech is so power hungry that environmental externalities are being further accelerated while regular people's utility costs are rising to cover the increased demand (whether they use the tech to "code" or "manifest art")?
There will be advantages to lower-power computing, and lower-cost electricity. Implement carbon taxes and AI companies will follow the market incentive to install their datacentres in places where sustainable power is available for cheap. We'll see China soaring to new heights with their massive solar investment, and America will eventually figure out they have to catch up and cannot do so with coal and gas.
> How is it useful to you that this tech is so compute hungry that they are seemingly ending the industry of personal compute to feed this tech's demand?
Temporary problem, the demand for personal computing is not going to die in five years, and meanwhile the lucrative markets for producing this equipment will result in many new factories, increasing capacity and eventually lowering prices again. In the meantime, many pundits are suggesting that this may thankfully begin the end of the Electron App Era where a fuckin' chat client thinks it deserves 1GB of RAM.
Consider this: why are we using Electron and needing 32GB of RAM on a desktop? Because web developers only knew how to use Javascript and couldn't write a proper desktop app. With AI, desktop frameworks can have a resurgence; why shouldn't I use Go or Rust and write a native app on all platforms now that the cost of doing so is decreasing and the number of people empowered to work with it is increasing? I wrote a nice multithreaded fractal renderer in Rust the other day; I don't know how to multithread, write Rust, and probably can't iterate complex numbers correctly on paper anymore....
> How is it useful to you that this tech is so water hungry that it is emptying drinking water aquifers?
This is only a problem in places that have poor water policy, e.g. California (who can all thank the gods that their reservoirs are all now very full from the recent rain). This problem predates datacenters and needs to be solved - for instance, by federalizing and closing down the so-called Wonderful Company and anyone else who uses underhanded tactics to buy up water rights to grow crops that shouldn't be grown there.
Come and run your datacenters up in the cold North, you won't even need evaporative cooling for them, just blow a ton of fresh air in....
> How is it useful to you that this tech is being used to manufacture consent?
Now you've actually got an argument, and I am on your side on this one.
> If at any point any of these releases were "genuine inflection points" it would be unnecessary to proselytize such. It would be self evident. Much like rain.
Agreed.
Now, I suggest reading through all of this to note that I am not a fan of tech bros, that I do want this to be a bubble. Then also note what else I'm saying despite all that.
To me, it is self-evident. The various projects I have created by simply asking for them, are so. I have looked at the source code they produce, and how this has changed over time: Last year I was describing them as "junior" coders, by which I meant "fresh hire"; now, even with the same title, I would say "someone who is just about to stop being a junior".
> "The oppressed need to acknowledge that their oppression is useful to their oppressors."
The capacity for AI to oppress you is in direct relation to its economic value.
> How is it useful to you that this tech is so power hungry that environmental externalities are being further accelerated while regular people's utility costs are rising to cover the increased demand (whether they use the tech to "code" or "manifest art")?
The power hunger is in direct proportion to the demand. Someone burning USD 20 to get Claude Code tokens has consumed approximately USD 10 of electricity in that period, with the other USD 10 having been spread between repaying the model training cost and the server construction cost.
The reason they're willing to spend USD 20 is to save at least USD 20 worth of dev time. This was already the case with the initial version of ChatGPT Pro back in the day, when it could justify that by saving 23 dev minutes per month. There are around a million developers in the USA; just that group increasing electricity spending by USD 10/month will put a massive dent in the USA's power grid.
Gets worse though. Based on my experience, using Claude Code optimally, when you spend USD 20 you get at least 10 junior sprints' worth of output. Hiring a junior for 10 sprints is, what, USD 30,000? The bound here is "are you able to get value from having hired 1,500 juniors for the price of one?"
One can of course also waste those tokens. Both because nobody needs slop, and because most people can't manage one junior never mind 1500 of them.
However, if the economy collectively answers "yes", then the environmental externalities expand until you can't afford to keep your fridge cold or your lights on.
This is one of the failure modes of the technological singularity that people like me have been forewarning about for years, even when there's no alignment issues within the models themselves. Which there are, because Musk's one went and called itself Mecha Hitler, while being so sycophantic about Musk himself that it called him the best at everything even when the thing was "drinking piss", which would be extremely funny if he wasn't selling this to the US military.
> How is it useful to you that this tech is so compute hungry that they are seemingly ending the industry of personal compute to feed this tech's demand?
This will pass. Either this is a bubble, it pops, the manufacturers return to their roots; or it isn't because it works as advertised, which means it leads to much higher growth rates, and we (us, personally, you and me) get personal McKendree cylinders each with more compute than currently exists… or we get turned into the raw materials for those cylinders.
I assume the former. But I say that as one who wants it to be the former.
> How is it useful to you that this tech is so water hungry that it is emptying drinking water aquifers?
Is it what's emptying drinking water aquifers?
The combined water usage of all data centers in Arizona. All of them. Together. Which is over 100 DCs. All of them combined use about double what Tesla was expecting just the Brandenburg Gigafactory to use, before Musk decided to burn his reputation with EV consumers and Europeans for political point scoring.
> How is it useful to you that this tech is being used to manufacture consent?
This is one of the objectively bad things, though it's hard to say if this is more or less competent at this than all the other stuff we had three years ago, given the observed issues with the algorithmic feeds.
I appreciate you taking the time to write up your thoughts on something other than exclusively these tools' 'usefulness' at writing code.
> The capacity for AI to oppress you is in direct relation to its economic value.
I think this assumes a level of rationality in these systems, corporate interests and global markets, that I would push back on as being largely absent.
> The power hunger is in direct proportion to the demand.
Do you think this is entirely the case? I mean, I understand what you are saying, but I would draw stark lines between "company" demand versus "user" demand. I have found many times the 'AI' tools are being thrust into nearly everything regardless of user demand. Spinning its wheels to only ultimately cause frustration. [0]
> Is it what's emptying drinking water aquifers?
It appears this is a problem, and will only continue to be such. [1]
> The combined water usage of all data centers in Arizona. All of them. Together. Which is over 100 DCs. All of them combined use about double what Tesla was expecting just the Brandenburg Gigafactory to use, before Musk decided to burn his reputation with EV consumers and Europeans for political point scoring.
I am unsure if I am getting what your statements here are trying to say. Would you be able to restate this to be more explicit in what you are trying to communicate.
> I think this assumes a level of rationality in these systems, corporate interests and global markets, that I would push back on as being largely absent.
Could be. What I hope and suspect is happening is that these companies are taking a real observation (the economic value that I also observe in software) and falsely expanding this to other domains.
Even to the extent that these work, AI has clearly been over-sold in humanoid robotics and self-driving systems, for example.
> Do you think this is entirely the case? I mean, I understand what you are saying, but I would draw stark lines between "company" demand versus "user" demand. I have found many times the 'AI' tools are being thrust into nearly everything regardless of user demand. Spinning its wheels to only ultimately cause frustration. [0]
I think it is. Companies setting silly goals like everyone must use LLMs once a day or whatever, that won't burn a lot of tokens. Claude Code is available in both subscription mode and PAYG mode, and the cost of subscriptions suggests it is burning millions of tokens a month for the basic subscription.
Other heavy users who we would both agree are bad, are slop content farms. I cannot even guesstimate those, so would be willing to accept the possibility they're huge.
> It appears this is a problem, and will only continue to be such. [1]
I find no reference to "aquifers" in that.
Where it says e.g. "up to 9 liters of water to evaporate per kWh of energy used", the average is 1.9 l/kWh. Also, evaporated water tends to fall nearby (on this scale) as rain, so unless there's now too much water on the surface, this isn't a net change even if it all comes from an aquifer (and I have yet to see any evidence of DCs going for that water source).
It says "The U.S. relies on water-intensive thermoelectric plants for electricity, indirectly increasing data centers' water footprint, with an average of 43.8L/kWh withdrawn for power generation." - most water withdrawn is returned, not consumed.
It says "Already AI's projected water usage could hit 6.6 billion m³ by 2027, signaling a need to tackle its water footprint.", this is less than the famously-a-desert that is Arizona.
> I am unsure if I am getting what your statements here are trying to say. Would you be able to restate this to be more explicit in what you are trying to communicate.
That the water consumption of data centres is much much smaller than the media would have you believe. It's more of a convenient scare story than a reality. If water is your principal concern, give up beef, dairy, cotton, rice, almonds, soy, biofuels, mining, paper, steel, cement, residential lawns, soft drinks, car washing, and hospitals, in approximately that order (assuming the lists I'm reading those from are not invented whole cloth), before you get to data centres.
And again, I don't disagree that they're a problem, it's just that the "water" part of the problem is so low down the list of things to worry about as to be a rounding error.
Ahh, I see your objection now. That is my bad. I was using my language too loosely. Here I was using 'aquifer' to mean 'any source of drinking water', but that is certainly different from the intended meaning.
> And again, I don't disagree that they're a problem, it's just that the "water" part of the problem is so low down the list of things to worry about as to be a rounding error.
I'm skeptical of the rounding error argument, and wary of relying on the logical framework of 'low down the list' when list items' effects stack interdependently.
> give up beef, dairy, cotton, rice, almonds, soy, biofuels, mining, paper, steel, cement, residential lawns, soft drinks, car washing, and hospitals
In part due to this reason, as well as others, I have stopped directly supporting the industries for: beef, dairy, rice, almonds, soy, biofuels, residential lawns, soft drinks, car washing
The hype curve is a problem, but it's difficult to prevent. I myself have never made such a prediction. Though it now seems that the money and effort to create working coding tools is near an inflection point.
"It would be self evident." History shows the opposite at inflection points. The "self evident" stage typically comes much later.
It's a little weird how defensive people are about these tools. Did everyone really think being able to import a few npm packages, string together a few APIs, and run npx create-react-app was something a large number of people could do forever?
The vast majority of coders in employment barely write anything more complex than basic CRUD apps. These jobs were always going to be automated or abstracted away sooner or later.
Every profession changes. Saying that these new tools are useless or won't impact you/xyz devs is just ignoring a repeated historical pattern
It's employing so many people who specialize in Salesforce configuration that every year San Francisco collapses under the weight of 50,000+ of them attending Dreamforce.
And it's actually kind of amazing, because a lot of people who earn six figures programming Salesforce came to it from a non-traditional software engineering background.
I think perhaps for some folks we're looking at their first professional paradigm shift. If you're a bit older, you've seen (smaller versions of) the same thing happening before as e.g. the Internet gained traction, Web2.0, ecommerce, crypto, etc. and have seen your past skillset become useless as now it can be accomplished for only $10/mo/user.... either you pivot and move on somehow, or you become a curmudgeon. Truly, the latter is optional, and at any point when you find yourself doing that you wish to stop and just embrace the new thing, you're still more than welcome to do so. AI is only going to get EASIER to get involved with, not harder.
And by the same token (ha) for some folks we're looking at their first hype wave. If you're a bit older, you've seen similar things like 4GLs and visual programming languages and blockchain and expert systems. They each left their mark on our profession but most of their promises were unfounded and ultimately unrealized.
I like a lot of 4GL ideas. Closest I've come was working on ServiceNow which is sort of a really powerful system with ugly, ugly roots but the idea of your code being the database being the code really resonated with me, as a self-taught programmer.
Similarly, Lisp's homoiconicity makes sense to me as a wonderfully aesthetic idea. I remember generating strings-of-text that were code, but still just text, and wishing that I could trivially step into the structure there like it was a map/dict... without realizing that that's what an AST is and what the language compiler / runtime is already always doing.
Lol. In a few years when the world is awash in AI-generated slop [1] my "past skills" will not only be relevant, they will be actively sought after.
[1] Like the recent "Gas Town" and "Beads" that people keep mentioning in the comments that require extensive scripts/human intervention to purge from the system: https://news.ycombinator.com/item?id=46510121
Agreed, it always seemed a little crazy that you could make wild amounts of money to just write software. I think the music is finally stopping and we'll all have to go back to actually knowing how to do something useful.
> The vast majority of coders in employment barely write anything more complex than basic CRUD apps. These jobs were always going to be automated or abstracted away sooner or later.
My experience has been negative progress in this field. On iOS, UIKit in Interface Builder is an order of magnitude faster to write and to debug, with less weird edge cases, than SwiftUI was last summer. I say last summer because I've been less and less interested in iOS the more I learn about liquid glass, even ignoring the whole "aaaaaaa" factor of "has AI made front end irrelevant anyway?" and "can someone please suggest something the AI really can't do so I can get a job in that?"
The 80s TUI frameworks are still not beaten in developer productivity by GUI or web frameworks. They have been beaten by GUIs in usability, but then the GUIs reverted into a worse option.
Too bad they were mostly proprietary and won't even run on modern hardware.
Democratizing coding so regular people can get the most out of computers is the opposite of oppression. You are mistaking your interests for society's interests.
It's the same with artists who are now pissed that regular people can manifest their artistic ideas without needing to go through an artist or spend years studying the craft. The artists are calling the AI companies oppressors because they are breaking the artist's stranglehold on the market.
It's incredibly ironic how socializing what was a privatized ability has otherwise "socialist" people completely losing their shit. Just the mask of pure virtue slipping...
On what planet is concentrating an increasingly high amount of the output of this whole industry on a small handful of megacorps “democratising” anything?
Software development was already one of the most democratised professions on earth. With any old dirt cheap used computer, an internet connection, and enough drive and curiosity you could self-train yourself into a role that could quickly become a high paying job. While they certainly helped, you never needed any formal education or expensive qualifications to excel in this field. How is this better?
The open models don't have access to all the proprietary code that the closed ones have trained on.
That's primarily why I finally had to suck it up and sign up for Claude. Claude clearly can cough up proprietary codebase examples that I otherwise have no access to.
Given that very few of the "open models" disclose their training data there's no reason at all to assume that the proprietary models have an advantage in terms of training on proprietary data.
As far as I can tell the reason OpenAI and Anthropic are ahead in code is that they've invested extremely heavily in figuring out the right reinforcement learning training mix needed to get great coding results.
Some of the Chinese open models are already showing signs of catching up.
> deergomoo: On what planet is concentrating an increasingly high amount of the output of this whole industry on a small handful of megacorps “democratising” anything?
> simonw: It's better because now you can automate something tedious in your life with a computer without having to first climb a six month learning curve.
Completely ignores, or enthusiastically accepts and endorses, the consolidation of production, power, and wealth into a stark few (friends), and claims superiority and increased productivity without evidence?
This may be the most simonw comment I have ever seen.
At the tail end of 2023 I was deeply worried about consolidation of power, because OpenAI were the only lab with a GPT-4 class model and none of their competitors had produced anything that matched it in the ~8 months since it had launched.
I'm not worried about that at all any more. There are dozens of organizations who have achieved that milestone now, and OpenAI aren't even definitively in the lead.
A lot of those top-class models are open weight (mainly thanks to the Chinese labs) and available for people to run on their own hardware.
I used claude code to set up a bunch of basic tools my wife was using in her daily work. Things like custom pomodoro timers, task managers, todo notes.
She used to log into 3 different websites. Now she just opens localhost:3000 and has all of them on the same page. No emails shared with anyone. All data stored locally.
I could have done this earlier, but with Claude Code the time commitment was writing a spec in 5 minutes and pressing approve a few times, versus half a day before.
I count this as an absolute win. No privacy breaches, no data sharing.
> The artists are calling the AI companies oppressors because they are breaking the artist's stranglehold on the market.
It's because these companies profit from all the existing art without compensating the artists. Even worse, they are now putting the very people out of a job who (unwittingly) helped to create these tools in the first place. Not to mention how hurtful it must be for artists seeing their personal style imitated by a machine without their consent.
I totally see how it can empower regular people, but it also empowers the megacorps and bad actors. The jury is still out on whether AI is providing a net positive to society. Until then, let's not ignore the injustice and harm that went into creating these tools and the potential and real dangers that come with it.
When you imagine my position as "I hate these companies for democratizing code/art" and then debate that, it is called a strawman logical fallacy.
Ascribing the goals of "democratize code/art" onto these companies and their products is called delusion.
I am sure the 3 letter agency directors on these company boards are thrilled you think they left their lifelong careers solely to finally realize their dream to allow you to code and "manifest your artistic ideas".
Yes, but the quality of output from open/local models isn't anywhere close to what you get from Claude or Gemini. You need serious hardware to get anything approaching decent processing speeds or even middling quality.
It's more economical for the average person to spend $20/month on a subscription than it is for them to drop multiple thousands $ and untold hours of time experimenting. Local AI is a fun hobby though.
> If I am unable to convince you to stop meticulously training the tools of the oppressor (for a fee!) then I just ask you do so quietly.
I'm kind of fascinated by how AI has become such a culture war topic with hyperbole like "tools of the oppressor"
It's equally fascinating how little these comments understand about how LLMs work. Using an LLM for inference (what you do when you use Claude Code) does not train the LLM. It does not learn from your code and integrate it into the model while you use it for inference. I know that breaks the "training the tools of the oppressor" narrative which is probably why it's always ignored. If not ignored, the next step is to decry that the LLM companies are lying and are stealing everyone's code despite saying they don't.
The prompts and responses are used as training data. Even if your provider allows you to opt out they are still tracking your usage telemetry and using that to gauge performance. If you don’t own the storage and compute then you are training the tools which will be used to oppress you.
> The prompts and responses are used as training data.
They show a clear pop-up where you choose your setting about whether or not to allow data to be used for training. If you don't choose to share it, it's not used.
I mean I guess if someone blindly clicks through everything and clicks "Accept" without clicking the very obvious slider to turn it off, they could be caught off guard.
Assuming everyone who uses Claude is training their LLMs is just wrong, though.
Telemetry data isn't going to extract your codebase.
I am curious where your confidence that this is true, is coming from?
Besides lots of GPU's, training data seems the most valuable asset AI companies have. Sounds like strong incentive to me to secretly use it anyway. Who would really know, if the pipelines are set up in a way, if only very few people are aware of this?
And if it comes out: "oh gosh, one of our employees made a mistake".
And they already admitted to training on pirated content. So maybe they learned their lesson .. maybe not, as they are still making money and want to continue to lead the field.
1. There are good, ethical people working at these companies. If you were going to train on customer data that you had promised not to train on there would be plenty of potential whistleblowers.
2. The risk involved in training on customer data that you are contractually obliged not to train on is higher than the value you can get from that training data.
3. Every AI lab knows that the second it comes out that they trained on paying customer data after saying they wouldn't, those paying customers will leave for their competitors (and sue them into the bargain.)
4. Customer data isn't actually that valuable for training! Great models come from carefully curated training data, not from just pasting in anything you can get your hands on.
Fundamentally I don't think AI labs are stupid, and training on paid customer data that they've agreed not to train on is a stupid thing to do.
1. The people working for these companies are already demonstrably ethically flexible enough to pirate any publicly accessible training data they can get their hands on, including but not limited to ignoring the license information in every repo on GitHub. I'm not impressed with any of these clowns and I wouldn't trust them to take care of a potted cactus.
2. The risk of using "illegal" training data is irrelevant, because no GenAI vendors have been meaningfully punished for violating copyright yet, and in the current political climate they don't expect to be anytime soon. Even so,
3. Presuming they get caught red-handed using personal data without permission (which, given the nature of LLMs, would be extremely challenging for any individual customer to prove definitively), they may lose customers, and customers may try to sue, but you can expect those lawsuits to take years to work their way through the courts; long after these companies IPO, employees get their bag, and it all becomes someone else's problem.
4. The idea of using carefully curated datasets is popular rhetoric, but absolutely does not reflect how the biggest GenAI vendors do business. See (1).
AI labs are extremely shortsighted, sloppy, and demonstrably do not care a single iota about the long term when there's money to be made in the short term. Employees have gigantic financial incentives to ignore internal malfeasance or simple ineptitude. The end result is, if anything, far worse than stupidity.
There is an important difference between openly training on scraped web data and license-ignored data from GitHub and training on data from your paying customers that you promised you wouldn't train on.
Anthropic had to pay $1.5bn after being caught downloading pirated ebooks.
So Anthropic had to pay less than 1% of their valuation despite approximately their entire business being dependent on this and similar piracy. I somehow doubt their takeaway from that is "let's avoid doing that again".
First: Valuations are based on expected future profits.
For a lot of companies, 1% of valuation is ~20% of annual profit (that's a P/E ratio of 20: the valuation is 20x profit, so 1% of it is 20% of profit); for fast growing companies, or companies where the market is anticipating growth, it can be a lot higher. Weird outlier example here, but consider that if Tesla was fined 1% of its valuation (1% of 1.5 trillion = 15 billion), that would be most of the last four quarters' profit on https://www.macrotrends.net/stocks/charts/TSLA/tesla/gross-p...
Second: Part of the Anthropic case was that many of the books they trained on were ones they'd purchased and destructively scanned, not just pirated. The courts found this use was fine, and Anthropic had already done this before being ordered to: https://storage.courtlistener.com/recap/gov.uscourts.cand.43...
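A quick back-of-envelope check of the first point, with illustrative numbers only (not figures from either case):

    # A fine worth 1% of the valuation equals 20% of annual profit exactly when
    # the valuation is 20x profit; at much higher multiples, the same 1% fine
    # approaches a whole year's profit.
    def fine_as_share_of_profit(pe_ratio: float, fine_share_of_valuation: float = 0.01) -> float:
        return fine_share_of_valuation * pe_ratio  # valuation = P/E * profit

    print(fine_as_share_of_profit(20))   # 0.2 -> 20% of a year's profit
    print(fine_as_share_of_profit(100))  # 1.0 -> about a full year's profit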
Every single point you made is contradicted by the observed behavior of the AI labs. If any of those factors were going to stop them from training on data they legally can't, they would have done so already.
> I am curious where your confidence that this is true, is coming from?
My confidence comes from working in big startups and big companies with legal teams. There's no way the entire company is going to gather all of the engineers and everyone around, have them code up a secret system to consume customer data into a secret part of the training set, and then have everyone involved keep quiet about it forever.
The whistleblowing and leaking would happen immediately. We've already seen LLM teams leak and have people try to whistleblow over things that aren't even real, like the Google engineer who thought they had invented AGI a few years ago (lol). OpenAI had a public meltdown when the employees disagreed with Sam Altman's management style.
So my question to you is: What makes you think they would do this? How do you think they'd coordinate the teams to keep it all a secret and only hire people who would take this secret to their grave?
"There's no way the entire company is going to gather all of the engineers and everyone around, have them code up a secret system "
No, that is why I wrote
"Who would really know, if the pipelines are set up in a way, that only very few people are aware of this?" (Typo fixed)
There is no need for everyone to know. I don't know their processes, but I can think of ways to only include very few people who need to know.
The rest is just working on everything else. Some work with data, where they don't need to know where it came from, some with UI, some with scaling up, some .. they all don't need to know, that the source of DB XYZ comes from a dark source.
> I am curious where your confidence that this is true, is coming from?
We have a legally binding contract with Anthropic. Checked and vetted by our lawyers, who are annoying because they actually READ the contracts and won't let us use services with suspicious clauses in them - unless we can make amendments.
If they're found to be in breach of said contract (which is what every paid user of Claude signs), Anthropic is going to be the target of SO FUCKING MANY lawsuits even the infinite money hack of AI won't save them.
> Besides lots of GPU's, training data seems the most valuable asset AI companies have. Sounds like strong incentive to me to secretly use it anyway. Who would really know, if the pipelines are set up in a way, if only very few people are aware of this?
Could be, but it's a huge risk the moment any lawsuit happens and the "discovery" process starts. Or whistleblowers.
They may well take that risk, they're clearly risk-takers. But it is a risk.
Eh they’re all using copyrighted training data from torrent sites anyway. If the government was gonna hold them accountable for this it would have happened already.
I find it hard to believe that there are people who know these companies stole the entire creative output of humanity and egregiously, continually scrape the internet, yet think they are, for some reason, ignoring the data you voluntarily give them.
> I know that breaks the "training the tools of the oppressor" narrative
"Narrative"? This is just reality. In their own words:
> The awards to Anthropic, Google, OpenAI, and xAI – each with a $200M ceiling – will enable the Department to leverage the technology and talent of U.S. frontier AI companies to develop agentic AI workflows across a variety of mission areas. Establishing these partnerships will broaden DoD use of and experience in frontier AI capabilities and increase the ability of these companies to understand and address critical national security needs with the most advanced AI capabilities U.S. industry has to offer. The adoption of AI is transforming the Department’s ability to support our warfighters and maintain strategic advantage over our adversaries [0]
Is 'warfighting adversaries' some convoluted code for allowing Aurornis to 'see a 1337x in productivity'?
Or perhaps you are a wealthy westerner of a racial and sexual majority and as such have felt little by way of oppression by this tech?
In such a case I would encourage you to develop empathy, or at least sympathy.
> Using an LLM for inference .. does not train the LLM.
In their own words:
> One of the most useful and promising features of AI models is that they can improve over time. We continuously improve our models through research breakthroughs as well as exposure to real-world problems and data. When you share your content with us, it helps our models become more accurate and better at solving your specific problems and it also helps improve their general capabilities and safety. We do not use your content to market our services or create advertising profiles of you—we use it to make our models more helpful. ChatGPT, for instance, improves by further training on the conversations people have with it, unless you opt out.
> Is 'warfighting adversaries' some convoluted code for allowing Aurornis to 'see a 1337x in productivity'?
Much as I despair at the current developments in the USA, and I say this as a sexual minority and a European, this is not "tools of the oppressor" in their own words.
Trump is extremely blunt about who he wants to oppress. So is Musk.
"Support our warfighters and maintain strategic advantage over our adversaries" is not blunt, it is the minimum baseline for any nation with assets anyone else might want to annex, which is basically anywhere except Nauru, North Sentinel Island, and Bir Tawil.
> "Support our warfighters and maintain strategic advantage over our adversaries" is not blunt, it is the minimum baseline for any nation with assets anyone else might want to annex
I think it's gross to distill military violence as defending 'assets [others] might want to annex'.
What US assets were being annexed when US AI was used to target Gazans?
> I think its gross to distill military violence as defending 'assets [others] might want to annex'.
Yes, but that's how the world works:
Another country wants a bit of your country for some reason, they can take it by force unless you can make at the very least a credible threat against them, sometimes a lot more than that.
Note that this does not exclude that there has to be an aggressor somewhere. I'm not excluding the existence of aggressors, nor the capacity for the USA to be an aggressor. All I'm saying is your quotation is so vague as to also encompass those who are not.
> What US assets were being annexed when US AI was used to target Gazans?
First, I'm saying the statement is so broad as to encompass other things besides being a warmonger. Consider the opposite statement: "don't support our warfighters and don't maintain strategic advantage over our adversaries" would be absolutely insane, therefore "support our warfighters and maintain strategic advantage over our adversaries" says nothing.
Second, in this case the country doing the targeting is… Israel. To the extent that the USA cares at all, it's to get votes from the large number of Jewish people living in the USA. Similar deal with how it treats Cuba since the fall of the USSR: it's about votes (from Cuban exiles in that case, but still, votes).
Much as I agree that the conduct of Israel with regard to Gaza was disproportionate, exceeded the necessity, and likely was so bad as to even damage Israel's long-term strategic security, if you were to correctly imagine the people of Israel deciding "don't support our warfighters and don't maintain strategic advantage over our adversaries", they would quickly get victimised much harder than those they were victimising. That's the point there: the quote you cite as evidence, is so broad that everyone has approximately that, because not having it means facing ones' own destruction.
There's a mis-attributed quote, "People sleep peaceably in their beds at night because rough men stand ready to do violence on their behalf", that's where this is at.
> These two thoughts seem at conflict.
Musk is openly and directly saying "Canada is not a real country.", says "cis" is hate speech, responded to the pandemic by tweeting "My pronouns are Prosecute/Fauci.", and his self-justification for his trillion dollar bonus for hitting future targets is wanting to be in control of what he describes as a "robot army"; Trump openly and explicitly wants the USA to annex Canada, Greenland, and the Panama canal, is throwing around the national guard, openly calls critics traitors and calls for the death penalty. They're as subtle as exploding volcanoes; nobody needs to take the worst case interpretations of what they're saying to notice this.
Saying "support our warfighters" is something done by basically every nation everywhere all the time, because those places that don't do this quickly get taken over by nearby nations who sense weakness. Which is kinda how the USA got Texas, because again, I'm not saying the USA is harmless, I'm saying the quote doesn't show that.
> What 'assets' were being protected from annexation here by this oppressive use of the tool? The chips?
This would have been a much better example to lead with than the military stuff.
I'm absolutely all on board with the general consensus that the US police are bastards in this specific way, have been since that kid got shot for having a toy gun in an open-carry state. (I am originally from a country where even the police are not routinely armed, I do not value the 2nd amendment, but if you're going to say "we allow open carry of firearms" you absolutely do not get to use "we saw someone carrying a firearm" as an excuse to shoot them).
However: using LLMs to code doesn't seem to be likely to make a difference either way for this. If I was writing a gun-detection AI, perhaps I'm out of date, but I'd use a simpler model that runs locally on-device and doesn't do anything else besides the sales pitch.
Who is the parent oppressing? Making a comment and companies looking to automate labor are a little bit different. One might disagree that automation is oppressive or whatever goals the major tech CEOs have in developing AIs (surveillance, influencing politics, increasing wealth gap), but certainly commenting that they are oppressive is not the same thing.
Careful with being blindly led by your own assumptions.
I actually disagree with your thesis here. I think if every comment was posted under a new account this site would improve its average veracity.
As it stands certain 'celebrity', or high karma, accounts are artificially bolstered by the network effect indifferent to the defensibility of their claims.
I know someone who is using a vibe coded or at least heavily assisted text editor, praising it daily, while also saying llms will never be productive. There is a lot of dissonance right now.
So this awful website hijacked my back button, directed me to an ad, and instead of telling me the name of the fish immediately, made me search for it deep in the article
I was not trying to assign blame, more to point out that there is a solution. Yes, it is too bad those websites exist. I agree with that completely. But it is not something that will stop until the root evil is destroyed: ads.
Hearst papers (sfgate.com and chron.com) are really bad about this on mobile - the advertising providers just go all out to take over your screen and it takes so long to load that the place you click is not the thing you want if you click too soon. The only plus is all the articles are free to read still.
This really isn’t an “article”. It’s not content. It’s a clickbait summary of another piece of content shared by someone on an entirely different platform
That this content is “free” isn’t something we should be thankful for
I had someone argue on Twitter recently that they had made an “agent” when all they had really done was use n8n to make a loop that used LLMs and ran on a schedule
People are calling if-then cron tasks “agents” now
Can’t find the link now but a very comprehensive analysis of surgery vs physiotherapy for lower back issues found that physiotherapy was as effective as invasive, often dangerous spinal surgery. The only difference was time - surgery with recovery + recovery physio fixed the pain in about 4-6 months, while physiotherapy took 18-24 months
But on the plus side, physiotherapy is “free”, has no real risk, and most people who opted for the physiotherapy path found that they were happier and also fixed a lot of other pains simply because of regular stretching and exercise
The body is very weird and finds ways to compensate
I had a football injury when I was 13 that badly damaged my knee meniscus (though I didn’t know it at that time). At 16, I had a complete menisectomy - total removal of the lateral meniscus in my right knee
I was told that I would need to get a transplant and/or new knees in 10-15 years. I was also told that I shouldn’t put too much strain on the knee
I’m now 38 and my knee is mostly…fine. I can run, squat a reasonable amount of weight, walk for miles. Only thing I can’t do is fast directional changes (like in football) or bending down on the lateral side of my right knee
My plan is to extend this as long as possible and hopefully in 10 years, they’ll have tech to fix this for good