
There are techniques to mitigate this. You can reuse containers instead of creating a new one each time. You can mount in directories (like ~/.claude) from your local machine so you don't have to set Claude up each time.
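
As a rough sketch of that kind of setup (the image name, container name, mount paths, and the "claude" command here are assumptions for illustration, not anything from the original comment), a long-lived container with the local config mounted in might look like this in Python:

    import subprocess
    from pathlib import Path

    IMAGE = "my-agent-sandbox"   # hypothetical image with the agent CLI installed
    CONTAINER = "agent-dev"
    home = Path.home()

    # One-time setup: create a long-lived container with the local Claude config
    # and a project directory mounted in, so the agent keeps its settings and login.
    subprocess.run([
        "docker", "run", "-d", "--name", CONTAINER,
        "-v", f"{home / '.claude'}:/root/.claude",
        "-v", f"{home / 'projects'}:/workspace",
        "-w", "/workspace",
        IMAGE, "sleep", "infinity",
    ], check=True)

    # Later sessions reuse the same container instead of creating a new one.
    subprocess.run(["docker", "start", CONTAINER], check=True)
    subprocess.run(["docker", "exec", "-it", CONTAINER, "claude"], check=True)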

I use agents in a container and persist their config like you suggest. After seeing some interest, I shared my setup, which works fine for me on Linux: https://github.com/asfaload/agents_container

Completely agree. If you're motivated enough about a topic to post about it online, you're probably emotional about it and unable to see it in a clear-headed manner.

The people I know who have the most reasonable political opinions never post about it online. The people who have developed unhealthy and biased obsessions are the ones who post constantly.


Heh... do you realize that your comment undermines itself?

> If you're motivated enough about a topic to post about it online, you're probably emotional about it and unable to see it in a clear-headed manner.

> The people I know who have the most reasonable political opinions never post about it online.

And here you are posting your opinions online! How fascinating. I hope you recognize the extreme irony in the fact that you were motivated enough about this topic to post about it.


If I'm a Linux user who currently uses Firefox, what's the value prop for this browser? I already get privacy and extensions; is it just for testing my app on WebKit?

The main benefit is that Orion (unlike Firefox) has a business model. The downside is that it's not open source. They have an explanation of why, but that might be a deal breaker for some.

Firefox has a business model; it is mostly Google search referrals.

You can already test a site on a WebKit engine in Linux using GNOME Web (previously Epiphany) or LuaKit ( https://luakit.github.io/ ). But it is always good to have options, even commercial ones. From that perspective, Orion on Linux is good news.

As I understand it, Orion was originally developed because Apple doesn't allow you to select Kagi as a search provider for Safari.

A friend of mine is in great shape and smokes cigarettes

Means he's really doing something right: eating right and exercising. Well, at least exercising.

Left-populists and right-populists like to frame issues as being a conflict between the elites and the common man. Banning big banks from owning homes is a perfect example of this.

It's fine to ban big banks from buying homes, and it won't do damage to the nation, but don't expect it to solve the problem.

High housing prices are due to zoning-based supply restrictions. These are entrenched due to politically active NIMBY voters.

Actually fixing the housing crisis means addressing zoning, but that doesn't fit the elite-vs-common-man narrative, so it gets ignored by the populists.


> It's fine to ban big banks from buying homes, and it won't do damage to the nation

It makes things slightly worse for people who want a non-apartment house but think they might move soon.


I agree completely with the author that AI-assisted coding pushes the bottleneck to verification of the code.

But you don't really need complete formal verification to get these benefits. TDD gets you a lot of them as well. Perhaps your verification is less certain, but it's much easier to get high automated test coverage than it is to get a formally verifiable codebase.
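
As a minimal sketch of that test-first loop (file and function names are invented for the example): the test is written first and fails, and the implementation, whether human- or AI-written, is only accepted once it passes.

    # test_slugify.py -- written first; it fails until slugify() behaves as specified.
    from slugify_example import slugify

    def test_lowercases_and_replaces_spaces():
        assert slugify("Hello World") == "hello-world"

    def test_strips_punctuation():
        assert slugify("Hi, there!") == "hi-there"

    # slugify_example.py -- the minimal implementation written after the tests.
    import re

    def slugify(text: str) -> str:
        text = text.lower()
        text = re.sub(r"[^a-z0-9]+", "-", text)  # collapse runs of non-alphanumerics
        return text.strip("-")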

I think AI-assisted coding is going to cause a resurgence of interest in XP (https://en.wikipedia.org/wiki/Extreme_programming), since AI is a great fit for two big parts of XP. AI makes it easy to write well-tested code. The "pairing" method of writing code is also a great model for interacting with an AI assistant (much better than the vibe-coding model).


Trouble is that TDD, and formal proofs to much the same extent, assume a model of "double-entry accounting". Meaning that you write both the test/proof and the implementation, and then make sure they agree. Like in accounting, the assumption is that the probability of you making the same mistake twice is fairly low, giving high confidence in accuracy when they agree. When there is a discrepancy, you can then unpack whether the problem is in the test/proof or the implementation. The fallible human can easily screw up either.

But if you only fill out one side of the ledger, so to speak, an LLM will happily invent something that ensures that it is balanced, even where your side of the entry is completely wrong. So while this type of development is an improvement over blindly trusting an arbitrary prompt without any checks and balances, it doesn't really get us to truly verifying the code to the same degree we were able to achieve before. This remains an unsolved problem.


I don't fully understand what you mean by accounting assuming that the probability of making the same mistake twice is fairly low. Double-entry bookkeeping can only tell you if the books are balanced or not. We absolutely cannot assume that the books reflect reality just because they're balanced. You don't need to mess up twice to mess up the books in terms of truthfulness.

Also, tests and code are independent, while in double-entry you always affect both sides. Audits exist for a reason.


With double-entry bookkeeping, the only way an error can slip through is if you make the same error on both sides, or else they wouldn’t be balanced. A similar thing is true for testing: If you make both an error in your test and in your implementation, they can cancel out and appear to be error-free.

I don’t quite agree with that reasoning, however, because a test that fails to test the property it should test for is a very different kind of error than having an error in the implementation of that property. You don’t have to make the “same” error on both sides for an error to remain unnoticed. Compared to bookkeeping, a single random error in either the tests or the implementation is more likely to remain unnoticed.
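
A contrived sketch of that cancellation (names invented for the example): the implementation and the test both use the same wrong formula, so the suite passes and the bug stays invisible.

    # Implementation bug: the perimeter of a rectangle is 2 * (width + height),
    # but this version multiplies instead of adding.
    def perimeter(width: float, height: float) -> float:
        return 2 * (width * height)

    # Test bug: the expected value was derived from the same wrong formula,
    # so the broken implementation passes and the error goes unnoticed.
    def test_perimeter():
        assert perimeter(3, 4) == 24  # the correct answer is 14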


> With double-entry bookkeeping, the only way an error can slip through is if you make the same error on both sides, or else they wouldn’t be balanced. A similar thing is true for testing: If you make both an error in your test and in your implementation, they can cancel out and appear to be error-free.

Yeah, but it's very different from tests vs code though, right? Every entry has at least two sides and you enter them together; they are not independent like tests and code.

You can easily make a mistake if you write a wrong entry and it will still balance. Balanced books =/= accurate books is my point. And there is no difference between "code" and "tests" in double entry; it's all just "code".

So it seems like the person who made the metaphor doesn't really know how double-entry works, or maybe took one accounting class.


> Yeah, but it's very different from tests vs code though, right? Every entry has at least two sides and you enter them together; they are not independent like tests and code.

The point of the current thread is that the use of AI coding agents threatens to disrupt that. For example, they could observe a true positive test failure and opt to modify the test to ensure a pass instead.


What about the concept of "high confidence" is not understandable?


You could use a little less snark. "High confidence" is easy enough to understand, but your metaphor makes no sense: balanced books =/= accurate books, and balance is not at all a sign that the bookkeeping is accurate. The entries are also not independent the way code and tests are.


> Balanced books =/= accurate books

Naturally. Hence "high confidence" and not "full confidence". But let's not travel too far into the weeds here. Getting us back on track, what about the concept of "high confidence" is not understandable?


That sounds right in theory, but in practice my code is far, far higher quality when I do TDD than when I don't. This applies whether or not I'm using an AI coding assistant.


I don't think GP disagrees. They are (I think) stating that AI-assisted TDD is not as reliable as human TDD, because AI will invent a pointless test just to achieve a passing outcome.


There's a pretty fundamental difference: TDD is about avoiding up-front design, while formal specification literally is up-front design.


The mainstream has been slowly adopting XP over roughly the last 20 years: DevOps first, and now pair programming with agents and TDD.


The issues raised in this article are why I think highly opinionated frameworks will lead to higher developer productivity when using AI-assisted coding.

You may not like all the opinions of the framework, but the LLM knows them and you don’t need to write up any guidelines for it.


Yep. I ran an experiment this morning building the same app in Go, Rust, Bun, Ruby (Rails), Elixir (Phoenix), and C# (ASP whatever). Rails was a done deal almost right away. Bun took a lot of guidance, but I liked the result. The rest was a lot more work with so-so results — even Phoenix, surprisingly.

I liked the Rust solution a lot, but it had 200+ dependencies vs Bun’s 5 and Rails’ 20ish (iirc). Rust feels like it inherited the NPM “pull in a thousand dependencies per problem” philosophy, which is a real shame.


I can vouch for this as someone who works in a 1.6 million line codebase, where there are constant deviations and inconsistent patterns. LLMs have been almost completely useless on it other than for small functions or files.


Completely agree with this. I got to work closely with an IBM fellow one summer and I was impressed by his willingness to ask "dumb questions" in meetings. Sometimes he was just out of the loop but more often he was just questioning some of the assumptions that others in the room had left unquestioned.


Unfortunately, I found that the culture of "think." at IBM is not matched at many other organizations. Most days, I miss it.

But forced RTO and only 10 days off per year is enough to keep me away ;)


These days the policy positions of each party are hashed out on social media by non-experts. For both the Democrats and the Republicans, instead of any sort of research or experts driving public policy decisions, it's the things that resonate with your average person's feelings as they scroll through their feed and generate engagement.

The end result is of course populism. Each election cycle gets us closer to the policy positions of the Republicans being "Immigrants are bad" and Democrats being "Billionaires are bad".

We know where populism leads, and we've seen it for decades in South America. In a few decades, we will get to choose between the populist far left and the populist far right. Policy will get crazier and crazier, and measurable societal outcomes will stagnate and perhaps go backwards.

This will continue as long as social media is the primary form of entertainment in the US.


Maybe these researchers and experts should show up and present their suggested positions. People are tired of ivory tower proclamations, and most fundamentally, you need to reach people where they're at. That's just the kind of information ecosystem that we're living in, so people need to adapt.

Unfortunately, ignoring the public sphere and pretending that professionals are above such things is why we're now stuck with someone like Robert Kennedy Jr running HHS. This guy grew enough of a following and movement to reach a position of power and influence and he was barely challenged by experts all along the way.


Experts post on social media all the time, but their voices are not given any weight beyond that of people who aren't experts on the topic.

RFK Jr. running HHS is the wave of the future. Unfortunately, we will continue to have non-experts who generate high engagement content running policy decisions more and more in the future.


> we will continue to have non-experts who generate high engagement content running policy decisions more and more in the future.

I don't see why you'd assume that only non-experts will generate high engagement content.

I don't disagree, but such a sweeping assumption surely needs some argumentation and elucidation. Understanding the mechanics of this quite unnatural state of affairs is vastly more valuable than the mere observation of its existence.

All I've seen to date are appeals to human nature but that's a highly misleading line of reasoning that creates more confusion about both human nature and the forces driving content creation.


There are only 24 hours in a day, and each of us chooses how to invest that time.

Experts invest time in becoming experts in their field. YouTubers invest time in generating high-engagement content and attracting more viewers. You can't have both.


The article is saying the exact opposite of this. E.g.: "three days ago, it came out that Maryland Governor Wes Moore, a 2028 candidate, made sure to have lobbyists for the American Gas Association in the room when he interviewed for open seats to the state Public Service Commission"


Your point is valid.

Last election cycle, the incumbent president was pushed out of the race, a push largely initiated by an actor who lacks a college degree.


So far you get to choose between the moderate right and the far right, so at least there's still a long way to go.


Not really - that would only be true if far right and far left were far apart in anything other than superficial rhetoric. The structure and operation of power are virtually the same for both.


> These days the policy positions of each party are hashed out on social media by non-experts. For both the Democrats and the Republicans, instead of any sort of research or experts driving public policy decisions, it's the things that resonate with your average person's feelings as they scroll through their feed and generate engagement.

That would actually be a major improvement over what we have. Right now public policy decisions seem to get hashed out by nutjob activists on social media, not "average people."

Also, the "research[ers and] experts" need to own up to their own responsibility for this situation. Right now we live in a populist moment because they got caught up in their own ideology and groupthink, which created an opening for someone like Donald Trump. They should have seen the problems he used to build his support and come up with effective solutions for them.


What experts? You mean the overpaid consultants who dragged the Democrats into pathetic ineffectiveness and made them lose against an obviously retarded manchild?

> The end result is of course populism. Each election cycle gets us closer to the policy positions of the Republicans being "Immigrants are bad" and Democrats being "Billionaires are bad".

Except immigrants have nothing to do with how bad things are going, while billionaires (and what they represent) are effectively the architects of this situation. "Billionaires are bad" is an oversimplified but ultimately correct analysis of the issues of our time.

FDR basically saved the country from fascism with his "robber barons are bad" campaign. I deplore the fall into populism just as much as the next guy, but this is what the situation calls for. Social networks only play a minor part in all of this. Material conditions are degrading, and unrest will only grow until they start improving.

This country's governance has been subservient to capital, basically forever, and unchecked private power is now eating it from the inside. This is what must be fixed if this republic is to have any future, and the populist left is the only band of the political spectrum that at least acknowledges the issue.


Ah, but that same populist left is inconvenient to the valuation of my RSUs, so we're going to have to put a cork in it; maybe we can revisit it once I have enough[1] money.

---

[1] I will never have enough money.


[flagged]


I agree with you that my words are unpopular. Populism is popular.

Government and economics are complicated, so it's not that crazy to suggest that your average person doesn't understand them very well. The medical analog of economic populism is antivax and free-birth content. Super popular online, but it leads to bad outcomes.


> Those damn plebs just have no idea what's best for them.

Most people are not experts in a single field, much less multiple fields, and certainly not every field.

So yes, we need experts to play a substantial role in running things.

Perhaps even more importantly: it's not solely about what's best for every individual. You know what would be best for me? If the government gave me a free giant SUV that gets 4mpg fuel economy, and also let me drive as fast as I wanted while also subsidizing 90% of my fuel costs. Also it should drive itself so I can sleep while driving.

Sometimes we need to consider what's best for society and the planet, too.


France tried something clever pre-COVID.

You can read about it here: https://en.wikipedia.org/wiki/Citizens_Convention_for_Climat...

Randomly selected people could draft new laws on climate (at least, that's what they were told). They met with lobbyists, both pro-oil and pro-climate, for two weekends, and with experts on three other weekends: once in a conference style where fairly generic things were said, and twice in focus groups with more specific expertise, depending on the subject each focus group covered.

The experts were real experts though, with multiple publications and PhDs (or in some cases engineering degrees, especially during the conference weekend), and they tried to speak only on their subject matter.

Over around 8 weekends, the 150 randomly selected people made around 148 law proposals, helped by lawyers, and most experts agree that they were both good and reasonable. What's interesting is that most of the 150 people said that before really learning about the subject, they would never have made these kinds of proposals.

All that to say: experts don't have to run things, and imho, they should not. They should, however, have an advisory role to the randomly selected people drafting new laws.


I agree completely. I think the main difference is that it's important for average people to become educated on topics by experts. That's the part that is missing today.


Needing to upgrade a library everywhere isn’t necessarily a sign of inappropriate coupling.

For example, a library with a security vulnerability would need to be upgraded everywhere regardless of how well you’ve designed your system.

In that example the monolith is much easier to work with.


While you're right, I can only think of twice in my career where there was a "code red, all services must update now": Log4Shell and Spectre/Meltdown (which were a bit different anyway). I just don't think this comes up enough in practice to be worth optimizing for.


You have not been in the field very long then, I presume? There are multiple per year that require all hands on deck, depending on your tech stack. Just look at the recent npm supply chain attacks.


You presume very incorrectly to say the least.

The npm supply chain attacks were only an issue if you don't use lock files. In fact they were a great example of why you shouldn't blindly upgrade to the latest packages when they are available.


Fair enough, which is why I called out my assumption :).

I'm referring to the all-hands-on-deck nature of responding to security issues, not the best practice. For many, the npm issue was an all-hands-on-deck event.


Wait what? I've been wondering why people have been fussing over supply chain vulnerabilities, but I thought they mostly meant "we don't want to get unlucky and upgrade, merge the PR, test, and build the container before the malicious commit is pushed".

Who doesn't use lockfiles? Aren't they the default everywhere now? I really thought npm uses them by default.


We use pretty much the entire Node.js ecosystem, and only the very latest Next.js vulnerability was an all-hands-on-deck vulnerability. And that's over the past 7 years.


You solve a bunch of them by not using JavaScript in the backend, though.


To add to this conversation from our other thread: you solve a bunch of problems that are nearly as bad by not using microservices, yet you still do. And that is the same reason people use JavaScript despite the issues it introduces. It's not like you're the one person in the industry who has avoided using technologies that irrationally introduce horrible consequences.


I mean, I just participated in a Next.js incident that required it this week.

It has been rare over the years but I suspect it's getting less rare as supply chain attacks become more sophisticated (hiding their attack more carefully than at present and waiting longer to spring it).


Next.js was just bog-standard "we designed an insecure API and now everyone can do RCE" though.

Everyone has been able to exploit that for ages. It only became a problem when it was discovered and publicised.


A library which patches a security vulnerability should do so by bumping a patch version, maintaining backward compatibility. Taking a patch update to a library should mean no changes to your code, just rerun your tests and redeploy.

If libraries bump minor or major versions, they are imposing work on all the consuming services to accept the version, make compatibility changes, test and deploy.
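
For instance, a patch-level constraint under semantic versioning is meant to admit exactly that kind of drop-in fix and nothing more. A small illustration using Python's third-party "packaging" library (the version numbers are arbitrary):

    from packaging.specifiers import SpecifierSet
    from packaging.version import Version

    # "~=1.4.2" accepts 1.4.2 and any later 1.4.x patch release, nothing newer.
    compatible = SpecifierSet("~=1.4.2")

    print(Version("1.4.3") in compatible)  # True:  patch bump, drop-in security fix
    print(Version("1.5.0") in compatible)  # False: minor bump, consumers must review
    print(Version("2.0.0") in compatible)  # False: major bump, breaking changes expected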


This is pedantic, but no, it doesn't need to be updated everywhere. It should be updated as fast as possible, but there isn't a dependency chain there.


Example: log4j. That was an update fiasco everywhere.


1 line change and redeploy


Works great if you are the product owner. We ended up having to fire and replace about a dozen 3rd party vendors over this.

