I've been programming for literally my entire life. I love it, it's part of me, and there hasn't been more than a week in 30 years that I haven't written some code.
This is the first time that I feel a level of anxiety when I am not actively doing it. What a crazy shift that I am still so excited and enamored by the process after all of this time.
But there's also a double-edged sword. I'm having an even harder time than usual moderating my working hours, which I naturally struggle with anyway. Partly because I'm having so much fun and being so productive, but also because it's just so tempting to add one more feature, fix one more bug.
None in my area. Time to disperse. Get out of major cities like the pandemic promised. Fill in this great country we live in. Proliferate the government's surveillance for them.
AI agents like Claude Code are in an arms race to the bottom. Just like frontier model quality, they all converge on the same feature set over time (plan mode, skills, remote execution, sandboxing, etc.), and opencode is holding its own, preferred even, in a lot of cases.
The real differentiated value comes from the environment the AI Agent operates in, the runtime.
The runtime is agent-agnostic but provides a stable interface to your domain. People tried this with MCP, but MCP is a dead end. Local tool calling is so much better. Being able to extend integrations autonomously is the way, instead of being forced into a bloated bag of tools.
This is why we built swamp - https://swamp.club. We can integrate with any technology with an API, CLI, or code base and build repeatable, typed, validated automation workflows. There are no providers to wait for. No weird tool call paths trying to get the right concoction of MCP. The agent builds the integration itself, on the spot, in minutes.
It serves UK, EU, and various US States' regulations to "protect the kids".
Discord is only the next biggest canary in the coal mine. These regulations are going to force a lot more websites and apps to do this, too.
I wish these sorts of regulations had been written hand-in-hand with a more directly technically-minded approach. The world needs a better technical way to verify a person's estimated age cohort without a full ID check and/or an AI-analyzed video face scan before "every" website that may post "adult content" (however you choose to define that) is required to perform such checks.
Math education has always been a failure, or a "crisis." The number of people who come out of school with any functional math ability has been fairly constant over the decades, and depends a lot on family background and economic class. I'm not even sure that differences across countries are all that significant when people reach adulthood.
Don't get me wrong. I was one of the successful ones, but I think math education is in need of reform. In fact I would reform it quite radically.
You write a generic architecture document on how you want your code base to be organized, when to use pattern x vs pattern y, examples of what that looks like in your code base, and you encode this as a skill.
Then, in your prompt, you give it the task you want and tell it to supervise the implementation with a sub-agent that follows the architecture skill, evaluating any proposed changes.
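As a concrete sketch of the skill described above, assuming the SKILL.md-with-frontmatter layout Claude Code uses for skills (the path, rules, and example here are entirely hypothetical placeholders for whatever your own architecture document says):

```markdown
<!-- .claude/skills/architecture/SKILL.md (hypothetical example) -->
---
name: architecture
description: House rules for organizing this code base. Consult before
  proposing or reviewing any structural change.
---

# Architecture rules

- Use the repository pattern for persistence; never query the DB from handlers.
- Prefer composition over inheritance, except for framework base classes.

## Example (from this code base)

Handlers call `UserRepository.find(id)`; they never call `db.query(...)` directly.
```

The key point is that the skill encodes the "when to use pattern x vs pattern y" decisions once, so both the implementing agent and the reviewing sub-agent judge changes against the same written standard.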
There are people who maximize this, and this is how you get things like teams. You make agents for planning, design, qa, product, engineering, review, release management, etc. and you get them to operate and coordinate to produce an outcome.
That's what this is supposed to be, encoded as a feature instead of a best practice.
Aren't you just moving the problem a little bit further? If you can't trust it will implement carefully specified features, why would you believe it would properly review those?
It's hard to explain, but I've found LLMs to be significantly better in the "review" stage than the implementation stage.
So the LLM will do something and not catch at all that it did it badly. But the same LLM, asked to review against the same starting requirement, will catch the problem almost every time.
The missing thing in these tools is that automatic feedback loop between the two LLMs: one in review mode, one in implementation mode.
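The loop described above can be sketched as follows. The `llm()` function is a stand-in for a real model API call (stubbed here with canned responses so the control flow is runnable); the essential idea is that the reviewer gets a fresh prompt judging the output against the original requirement, and its feedback is fed back into the next implementation attempt:

```python
def llm(role: str, prompt: str) -> str:
    """Stand-in for a real model call; returns canned responses for the demo."""
    if role == "reviewer":
        # Pretend the first attempt has a defect and later ones are fine.
        return "FAIL: missing input validation" if "attempt 1" in prompt else "PASS"
    return f"implementation ({prompt})"


def implement_with_review(requirement: str, max_rounds: int = 3) -> str:
    feedback = ""
    for round_no in range(1, max_rounds + 1):
        # Implementation pass: original requirement plus reviewer feedback.
        code = llm("implementer", f"{requirement} attempt {round_no}. {feedback}")
        # Review pass: judge the output against the ORIGINAL requirement,
        # not against the implementer's interpretation of it.
        verdict = llm("reviewer", f"{requirement} attempt {round_no}: {code}")
        if verdict.startswith("PASS"):
            return code
        feedback = f"Previous attempt rejected. {verdict}"
    raise RuntimeError("no implementation passed review")


result = implement_with_review("parse ISO dates")
print(result)
```

With real model calls in place of the stub, the reviewer's fresh context is what catches the problems the implementer glossed over.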
Anecdotally, I think this is already in Claude Code. It's pretty frequent to see it implement something, then declare it "forgot" a requirement and go back and alter or add to the implementation.
AFAICT this is already baked into the GitHub Copilot agent. I read its sessions pretty often and reviewing/testing after writing code is a standard part of its workflow almost every time. It's kind of wild seeing how diligent it is even with the most trivial of changes.
It _does_ use up tokens incredibly fast, which is probably why Anthropic is developing this feature. This is mostly for corporations using the API, not individuals on a plan.
I'd love to see a breakdown of the token consumption of inaccurate/errored/unused task branches for claude code and codex. It seems like a great revenue source for the model providers.
Yeah, that's what I was thinking. They do have an incentive not to get everything right on the first try, as long as they don't overdo it... I also feel like they try to drive more token usage by asking unnecessary follow-up questions that the user may say yes to, etc.
At work tho we use Claude Code thru a proxy that uses the model hosted on AWS bedrock. It’s slower than consumer direct-to-Anthropic and you have to wait a bit for the latest models (Opus 4.5 took a while to get), but if our stats are to be believed it’s much much cheaper.
I don't know; all I can say is that with API-based billing, a multi-thousand-line refactor that would take days to do by hand costs something like $4. In terms of value for effort, it's incredible.
In my example, the Figma MCP takes ~300k tokens per medium-sized section of the page, and it would be cool to let it read a design whole and implement it directly. Currently I have to split it up, which is annoying.
I mean the systems I work on have enough weird custom APIs and internal interfaces just getting them working seems to take a good chunk of the context. I've spent a long time trying to minimize every input document where I can, compact and terse references, and still keep hitting similar issues.
At this point I just think the "success" of many AI coding agents is extremely sector dependent.
Going forward I'd love to experiment with seeing if that's actually the problem, or just an easy explanation of failure. I'd like to play with more controls on context management than "slightly better models" - like being able to select/minimize/compact sections of context I feel would be relevant for the immediate task, to what "depth" of needed details, and those that aren't likely to be relevant so can be removed from consideration. Perhaps each chunk can be cached to save processing power. Who knows.
Or you just fully embrace the thin-client life and offload everything to the server. PXE boot with remotely mounted filesystems. Local hard drives? Who needs those?
And the server is handled how? We're always there: complexity can be managed or hidden.
Why do you think some people asked Sun to un-free ZFS back in the day? Because unlike most, they understood its potential. Why do you think PC components today, graphics cards first, then RAM, and NVMe drives after that, cost so much? Because those who understand realize that today, a GNU/Linux home server and desktop are ready for the masses, and it's only a matter of time before an umbrel.com, start9.com, or even frigghome.ai succeeds and sweeps away increasingly ban-happy, and therefore unreliable and expensive, cloud providers. Most still haven't grasped this, but those who live above the masses have.
Why are snaps, flatpaks, Docker, etc. pushed so hard even though they have insane attack surfaces, give you minimal control over your own infrastructure, and are a huge waste of resources? Because they allow selling support to people who don't know. With NixOS or Guix, you only sell a text config. It's not the same business model, and after a while, with an LLM, people learn to do it themselves.