More

827a · 2026-04-30T17:26:24 1777569984

On the fingerprinting concerns: I have to imagine there will be an option in Chrome (certainly in Firefox) to "never download an LLM, turn off all LLM functionality". I suppose I can see an angle where a website could issue a small LLM request to try and fingerprint the model itself, which is another fingerprinting parameter. But as long as it can be turned off I don't see why this is a problem.

There's a broader class of concern here that reduces to the form: "The web platform should not be able to do this." For people who believe this, I think they'll invent any reason they can to push this narrative. E.g.: Well, sure, the user could turn it off, but then websites would say 'your browser isn't supported because it has no LLM' and now the web just got worse for me because I wanted to turn off LLMs.

But this reduces to "the web platform should not be able to do this" because at the end of the day it was the website operator's decision to turn off their website if an LLM is unavailable. Its not really the platform's fault, or the fault of its maintainers, that they built this capability and JP Morgan or whoever decided to screw over people who don't want to enable this feature. Similar to turning off Firefox support even though it would work fine, because they can't be assed to test their site in Firefox.

I don't know how to counter that take tbh. The web is the world's most successful application platform. It is not competing with PDF; it competes with SwiftUI. Of the options presented in front of you, you are hallucinating an option that reads like "we'll just keep the web nice and static and the way it is and nothing will ever change about it, the web is done". In reality your two options are: "We adapt the web to the evolving needs of its users" or "The web fails to serve the evolving needs of its users, and SwiftUI or WinUI steps in to fill that gap". This second option is far worse!

codedokode · 2026-04-30T18:33:47 1777574027

> But as long as it can be turned off I don't see why this is a problem.

That immediately makes you stand out, and sites will start breaking, like now some sites (that do not do any 3D graphics) break without WebGL.

> web is the world's most successful application platform.

Also one of the ugliest and poorly designed in my opinion.

domenicd · 2026-05-01T00:47:11 1777596431

Fingerprinting concerns here are really overblown. At least in Chrome's implementation, the model version / responses will give you ~2 bits over the browser major version: whether the machine can support the model, and whether the model is downloaded yet or not. (Really <2 bits, since these ratios aren't 50/50 in the population.)

This is discussed in detail in https://webmachinelearning.github.io/writing-assistance-apis....

827a · 2026-04-30T15:04:52 1777561492

The more I think about it, the more I think I align with Google's API design on this one.

The tight coupling between prompts and models is a real concern. I deal with that every day. However: if your solution to that is to support an API that enables tighter coupling between the model the user's browser has and the prompt that gets evaluated, you will inevitably and quickly enter the domain of "You need to use Chrome to use this site (because our prompts were only tested on Gemini)" or even worse "We don't recognize the AI model you're using (because the website was written in 2026 and the current year is 2030 and they never updated it)".

This is related to the terms of use concerns the Mozilla engineer has later; real concerns. But, if we want browsers to exist that don't require users to opt-in to the terms of use of a specific AI model (e.g. using a nice open source model), its beneficial to these browsers that they can't fingerprint for the Big Models.

Of course many sites will just do an isChrome()-like call anyway. Nothing to be done about that. But yeah I am generally non-supportive of changes that introduce more ways to fingerprint browsers. The upside of keeping the model anonymous outweighs the slight downside of (rarely) encountering weird prompt evaluation output because of a small difference in behavior between Gemini and, idk, Qwen.

827a · 2026-04-29T13:12:17 1777468337

To be fair, if you ask 10 people to eat visually identical food 10 times each, then magically measure the calories consumed by each individual, you'd probably get ~70 different values. The internal density of food is extremely difficult to reason about from the outside. The personal variance is also difficult to reason about.

827a · 2026-04-26T20:17:00 1777234620

The only healthy stance you should have on AI Safety: If AI is physically capable of misbehaving, it might ($$1), and you cannot "blame" the AI for misbehaving in much the same way you cannot blame a tractor for tilling over a groundhog's den.

> The agent's confession After the deletion, I asked the agent why it did it. This is what it wrote back, verbatim:

Anyone who would follow a mistake like that up with demanding a confession out of the agent is not mature enough to be using these tools. Lord, even calling it a "confession" is so cringe. The agent is not alive. The agent cannot learn from its mistakes. The agent will never produce any output which will help you invoke future agents more safely, because to get to this point it has likely already bulldozed over multiple guardrails from Anthropic, Cursor, and your own AGENTS.md files. It still did it, because $$1: If AI is physically capable of misbehaving, it might. Prompting and training only steers probabilities.

tripleee · 2026-04-26T20:57:42 1777237062

"An AI agent deleted our production database" should be "I deleted our production database using AI".

You can't blame AI any more than you can blame SSH.

d3rockk · 2026-04-27T00:44:33 1777250673

Bingo

sobellian · 2026-04-26T22:03:18 1777240998

The 'confession' is a CYA. Honestly the whole story doesn't really make sense - what's a "routine task in our staging environment" that needs a full-blown LLM? That sounds ridiculous to me. The takeaway is we commingled creds to our different environments, we gave an LLM access, and we had faulty backups. But it's totally not our fault.

anon84873628 · 2026-04-26T22:46:19 1777243579

Later they shift the blame to Railway for not having scoped creds and other guardrails. I am somewhat sympathetic to that, but they also violated the same rule they give to the agent - they didn't actually verify...

mlsu · 2026-04-27T04:55:31 1777265731

And then they doubled down by outsourcing the writing of this post to an LLM LOL

giancarlostoro · 2026-04-27T00:59:41 1777251581

If Railway doesn't support that, that's a reason not to use them.

port11 · 2026-04-27T06:48:47 1777272527

Railway’s “Ship software peacefully” is a good mantra, and they might want to add more protections around very destructive operations.

There’s a lot of blame to be passed around in this story, including OP’s own ways of working. But I agree with them that such destructive operations shouldn’t be in an MCP, or at least be disabled by default.

xp84 · 2026-04-27T05:06:42 1777266402

Verify? They should have attempted to drop the prod db with each token that they expected/hoped didn't have that permission?

anon84873628 · 2026-04-27T05:27:43 1777267663

Note they didn't say "we used scopes but there is a bug that killed us". No, they simply assumed the token would be magically scoped somehow without any justification for doing so:

>Tokens are not scoped by operation, by environment, or by resource at the permission level. There is no role-based access control for the Railway API — every token is effectively root. The Railway community has been asking for scoped tokens for years. It hasn't shipped.

I get that this paragraph is a retrospective realization (I hope, otherwise the argument is even more ludicrous). But like, if the UI didn't ask you to choose scopes for your token then there is no reason to assume they will magically be enforced somehow! And you sure as hell shouldn't trust it to your agent without checking.

They're trying to blame Railway for not having safeguards - which is a fair critique - but they clearly should have known better or at least followed their own instructions.

ImPostingOnHN · 2026-04-27T05:38:49 1777268329

If they wanted scoped tokens, they should have put on their roadmap an item to move to a SaaS product which has scoped tokens. Or ACLs. And until then, kept it on a list of risks: unscoped token may be misused by developer to delete prod db.

There's no difference in risk between this being done by an LLM vs. a human. Both make mistakes, so if you want to reduce the risk of this happening, you should poka-yoke[0] your systems to make this less likely to happen.

I'm not sure what's more striking about this blog post: that it includes virtually no assumption of blame on the part of the author, or that the author had this happen to them and was so angry with AI that they decided to use AI to write up the post.

0 – https://en.wikipedia.org/wiki/Poka-yoke

prng2021 · 2026-04-27T01:41:27 1777254087

Sorry but are you implying that for every system you integrate with, you verify the scope of an API key by checking each CRUD operation on every API endpoint they provide?

majormajor · 2026-04-27T03:13:52 1777259632

I think the suggestion from their "somewhat sympathetic" position is that if you are integrating with something you should (a) find out up front what limits it does or doesn't have on its API keys, so that it's not a nasty surprise later, and (b) absolutely don't give keys without really tight scopes to "agents."

The person here who deleted prod DB with their agent made an assumption that an API key wouldn't have broad permission if there weren't warnings ("We had no idea — and Railway's token-creation flow gave us no warning — that the same token had blanket authority across the entire Railway GraphQL API, including destructive operations like volumeDelete. "). I don't know what the UI looks like exactly, but unless I'm explicitly selecting a specific set of limited permissions, I don't know why I'd assume "this won't do more than I am creating it for". Like "I didn't ask the guy at the gun store to put bullets in, I wouldn't have given the gun to the agent if I'd known there were bullets in it."

I also would be wary of running on an "infrastructure provider" that didn't make things like that very clear.

Is this overly harsh? I don't know. I've had to explain far too many times to people (including other engineers) what makes doing certain things unsafe/foolish (since they initially think I'm wasting time checking things like that). So I think stories like this need to be taken as "absolutely do not make the same mistakes" cautionary tales by as many people as possible.

SoftTalker · 2026-04-27T02:35:57 1777257357

For every API you publish, do you verify that scoped API keys work as they should before you go live? If so, why would you not do the same for APIs you integrate with? It's all part of "your" system from the user's perspective.

anon84873628 · 2026-04-27T05:55:41 1777269341

I think the author is being deceptive with this part:

>3. CLI tokens have blanket permissions across environments.

>The Railway CLI token I created to add and remove custom domains had the same volumeDelete permission as a token created for any other purpose. Tokens are not scoped by operation, by environment, or by resource at the permission level. There is no role-based access control for the Railway API — every token is effectively root. The Railway community has been asking for scoped tokens for years. It hasn't shipped.

They're trying to make it sound like there was some misleading design around scopes, but the last sentence gives it away. They simply assumed that a scope would be enforced somehow, even though they never explicitly defined one like you would in a service that actually supports them. (Or worse, they actually knew all this ahead of time and still proceeded).

That said, I haven't used this service so I can't evaluate the UX. I know that in GitHub or cloud IAM there is no ambiguity about what you're granting. And if I didn't have full confidence in the limits of a credential then I sure as hell wouldn't give it to an agent.

prng2021 · 2026-04-27T04:43:40 1777265020

“why would you not do the same for APIs you integrate with?”

Who does that? Jira and Salesforce have hundreds of endpoints each. AWS has hundreds of services, and each may have hundreds of endpoints. Who on your team is testing key scopes of every endpoint? Do you do it for each key you generate? After all, that external system could have a bug at any moment in managing scopes. Or they could introduce new endpoints that aren’t handled properly. So for existing keys, how frequently do you re-validate the scope against all the endpoints?

8note · 2026-04-27T04:47:38 1777265258

with amazon its pretty standard to scope permissions as an allow list.

if you want an llm to do any operations on your stuff, give it a role with access to only stuff you want it to be able to touch

prng2021 · 2026-04-27T04:58:11 1777265891

Yes but my original reply was to someone that seemed to imply that this founder was dumb not to verify that Railway’s API key that should have been limited to managing custom domains, truly was limited to managing custom domains. I’ve never used Railway but my pushback is that no one in the real world exhaustively verifies a key is scoped properly against all 3rd party endpoints. We trust vendors to document how they’re scoped and to actually do that.

anon84873628 · 2026-04-27T06:04:49 1777269889

I think it is meaningful that the author didn't say "there was a bug in scope enforcement" or "the UX is really misleading- look at these screenshots." In fact they even state this a long standing community FR. And they don't even say they only discovered this after the incident!

It actually seems like they knew ahead of time and proceeded anyway, but are just using this critique as a way to shift blame.

anon84873628 · 2026-04-27T05:44:19 1777268659

No I'm not. But it's clearly stated in the article that the API doesn't have scopes at all... So there was no reason to assume that some would be magically applied!

In GitHub or AWS etc you expect scopes to work because you define them. However if there is no way to define them in the first place, would you assume the system can somehow read your mind about what the client can access??

In fact I now believe this is a deliberate rhetorical sleight of hand. Point out a legit critique of the API design as if it is an excuse. But really any responsible engineer would notice the lack of scopes immediately, and that would be a flashing siren not to trust them to an agent.

naasking · 2026-04-27T14:37:55 1777300675

If you don't understand and verify the scope of authorities a bearer token grants, then you are just begging for a security breach.

6r17 · 2026-04-27T04:09:22 1777262962

On a less dramatic pissed (rightfully) reading ; I have found that if you do give the capability to a LLM to do something ; it will be inclined to see this as an option to solving what it what asked to ; but then giving the instruction by negative present very poor results whereas the same can be driven by a positive one ; a "don't delete the database" becomes "if you want to reset the database you have a tool that you can call ..." ; at which point this tool just kills the agent. That said - this solution cannot guarantee by itself that the command is not ran ; but i'd argue that people have be writing more complex policies for ages - however the current LLM-era tend to produce the most competent idiots.

cwsx · 2026-04-27T04:32:48 1777264368

I tell people to treat LLM's like a toddler (albeit a very capable toddler).

Do kids learn well when you only tell them what NOT to do? Of course not! You should be explaining how to do things correctly, and most importantly the WHY, as well as providing examples of both the "correct" and "incorrect" ways (also explaining why an example is incorrect).

bostik · 2026-04-27T06:58:17 1777273097

The best way to describe AI agents I've heard: treat them as hostages that will do anything to appease their captor.

They have a vast latent knowledge base, infinite patience and zero capacity for making personal judgement calls. You give one a goal and it will try to meet that goal.

generic92034 · 2026-04-27T07:34:18 1777275258

> The best way to describe AI agents I've heard: treat them as hostages that will do anything to appease their captor.

A scary image, if we consider agents to develop anything like a conscience at some point in time. Of course, with the current approach they never might, but are we so sure?

palmotea · 2026-04-27T06:00:08 1777269608

> I tell people to treat LLM's like a toddler (albeit a very capable toddler).

Bbbbut a guy from Anthropic, just this last Friday, told me to think of Claude as my "brilliant coworker"! Are you telling me that's not true!?

boc · 2026-04-27T05:11:20 1777266680

LLMs can research what a tool does before calling it though - they'll sniff that one out pretty quick.

I think the better route is to be honest and say that database integrity is a primary foundation of the company, there's no task worth pursuing that would require touching the database, specifically ask it to think hard before doing anything that gets close to the production data, etc.

I run a much lower-stakes version where an LLM has a key that can delete a valuable product database if it were so inclined. I've built a strong framework around how and when destructive edits can be made (they cannot), but specifically I say that any of these destructive commands (DROP, -rm, etc) need to be handed to the user to implement. Between that framework and claude code via CLI, it's very cautious about running anything that writes to the database, and the new claude plan permissions system is pretty aggressive about reviewing any proposed action, even if I've given it blanket permission otherwise.

I've tested it a few times by telling it to go ahead, "I give you permission", but it still gets stopped by the global claude safety/permissions layer in opus 4.7. IMO it's pretty robust.

Food for thought.

not_kurt_godel · 2026-04-27T06:32:31 1777271551

> specifically ask it to think hard before doing anything that gets close to the production data

This is recklessly negligent and I would personally not tolerate a coworker or report doing it. What's next, sending long-lived access tokens out over email and asking pretty please for nobody to cc/forward?

boc · 2026-04-27T17:05:41 1777309541

As described, there are other failsafes as well. The ultimate being that I keep all code version-controlled, and all databases snapshotted offsite daily/hourly and can rebuild them from a complete delete in fewer than X min.

My broader point is that LLMs are going to need access to these keys whether we like it or not, and until we get extremely scoped API permissions (which would make a ton of sense, but most services aren't there), you have to live a bit on the edge to move quickly.

not_kurt_godel · 2026-04-27T19:11:49 1777317109

> The ultimate being that I keep all code version-controlled, and all databases snapshotted offsite daily/hourly and can rebuild them from a complete delete in fewer than X min.

Mitigation is good, but what's preventing your sudo-privileged LLM from disabling/corrupting/deleting on-site backups either directly or by proxy via access to the DB and code that writes to it?

boc · 2026-04-27T20:07:08 1777320428

It's a good question. I think it's similar to the question about an employee having sensitive access, and whether they'll get blackout drunk one night and delete everything. Or they get spearfished and get owned (prob more likely).

In the future, I could see this solved by the same "nuclear launch key" style delegation of keys. Aka in order to run certain API or database commands, the service requires both the standard dev key (presumably used by the LLM) and a separate "human admin key" that gets requested whenever a specific operation is requested. It could be tied to a biometric request or something as well to avoid the LLM hacking its way around it. Honestly this is pretty out of my technical depth but just thinking out-loud.

not_kurt_godel · 2026-04-28T04:01:21 1777348881

The difference with a rogue employee is they can be held accountable so they are verily heavily incentivized to avoid doing that (and hopefully also by the good pay and work environment you are providing them).

And, a lot of DevOps/SecOps at scale is concerned with mitigating potential rogue or dangerously incompetent employees. You don't let your juniors push senior-unreviewed code, much less let them anywhere near the keys to kingdom if you can help it.

boc · 2026-04-28T18:47:08 1777402028

Very fair points! I think I'll re-assess how I'm handling my setup. Unfortunately I don't have a dedicated devOps team, but still want to do my best to prevent those types of outcomes.

kamaal · 2026-04-27T05:57:58 1777269478

>>LLMs can research what a tool does before calling it though

Thats stretching the definition of 'research', it basically checks if the texts are close enough.

Delete can occur in various contexts, including safe contexts. It simply checks if a close enough match is available and executes. It doesn't know if what it is doing is safe.

Unfortunately a wide variety of such unsafe behaviours can show up. I'd even say for someone that does things without understanding them. Any write operation of any kind can be deemed unsafe.

EagnaIonat · 2026-04-27T07:26:43 1777274803

> specifically ask it to think hard before doing anything that gets close to the production data, etc.

Standard rule is you never let your developers at the production instance. So I can't see why an LLM would get a break.

Jean-Papoulos · 2026-04-27T06:35:25 1777271725

"I've put enough safety around the bomb that the bomb is worth using. The other people that exploded just didn't have enough safety but I do !"

boc · 2026-04-27T17:07:38 1777309658

More like, I expect this bomb can explode, so I've built contingency plans around it because the cost of not using the tooling is much higher than having downtime for my specific use-case.

yowlingcat · 2026-04-27T05:59:36 1777269576

It's been a very strange realization to have with AI lately (which you have reminded me of) because it also reminds me that the same thing works with humans. Not the killing part at least, but the honeypot and jailing/restricting access part.

Probably because telling someone not to do something works the 99% of the time they weren't going to do it anyways. But telling somebody "here's how to do something" and seeing them have the judgment not do it gives you information right away, as does them actually taking the honeypot. At the heart of it, delayed catastrophic implosions are much worse than fast, guarded, recoverable failures. At the end of the day, I suppose that's been supposed part of lean startup methodology forever -- just always easy in theory and tricky in practice I suppose.

coldtea · 2026-04-26T22:06:12 1777241172

>Anyone who would follow a mistake like that up with demanding a confession out of the agent is not mature enough to be using these tools. Lord, even calling it a "confession" is so cringe. The agent is not alive. The agent cannot learn from its mistakes

The problem is millions of years of evolutionary wiring makes us see it as alive. Even those mature enough to understand the above on the conscious level, would still have a subconscious feeling as if it's alive during interactions, or will slip using agency/personhood language to describe it now and then.

smrtinsert · 2026-04-26T23:08:38 1777244918

> The problem is millions of years of evolutionary wiring makes us see it as alive

Maybe for laymen, but I would think most technologists should understand that we're working with the output of what is effectively a massive spreadsheet which is creating a prediction.

coldtea · 2026-04-26T23:47:34 1777247254

The thing with evolutionary wiring is that it doesn't matter if you're layman or "technologist". The technologist part is just a small layer on top of very thick caveman/animal insticts and programming.

That's why a technologist can, just as easily as any layman, get addicted to gambling, or do crazy behaviors when attracted by the opposite sex.

qsera · 2026-04-27T05:37:44 1777268264

>small layer on top of very thick caveman/animal insticts and programming.

Which is also why marketing and advertising works on EVERYONE. When AI puts out the phrase "Prompt engineering", everyone instinctively treat it as something deterministic, despite them having some idea of how an LLM works...

DiogenesKynikos · 2026-04-27T00:02:11 1777248131

The same could be said for your brain.

LLMs are highly intelligent. Comparing them to spreadsheets is reductionist and highly misleading.

qsera · 2026-04-27T05:45:05 1777268705

>LLMs are highly intelligen

I will tell you why it is not.

Intelligence is understanding low level stuff and using it to reason about and understand high level stuff.

When LLMs demonstrate "highly intelligent" behavior, like solving a complex math problem (high level stuff), but also simultaneously demonstrate that it does not know how to count (low level stuff that the high level stuff depends on), it proves that it is not actually "intelligent" and is not "reasoning".

andybak · 2026-04-27T06:42:39 1777272159

You just invented you own definition of intelligence. I'm pretty sure that strategy could also support the opposite conclusion.

qsera · 2026-04-27T07:01:27 1777273287

So your problem with the definition is that "I invented it"?

Do you have any rational objection to the definition? If you don't have, then I am afraid that you don't have a point.

anon84873628 · 2026-04-26T22:44:55 1777243495

They should at least stop responding in the first person.

nozzlegear · 2026-04-26T23:07:02 1777244822

That's one of the first instructions in my system prompt when I'm working with an LLM:

> Do not reply in the first person – i.e. do not use the words "I," "Me," "We," and so on – unless you've been asked a direct question about your actions or responses.

It's not bulletproof but it works reasonably well.

kibwen · 2026-04-26T23:44:52 1777247092

We need to make like Japanese and come up with some neo-first-person-pronouns for bots to use to refer to themselves.

port11 · 2026-04-27T06:52:05 1777272725

Using files called SOUL, CONSTITUTION, and so on seems like it would make it more likely we see LLMs as pseudo-alive. It’s both a diminishing of what makes us human and a betrayal of what LLMs truly are (and should be respected as such).

palmotea · 2026-04-27T06:08:14 1777270094

> The problem is millions of years of evolutionary wiring makes us see it as alive. Even those mature enough to understand the above on the conscious level, would still have a subconscious feeling as if it's alive during interactions, or will slip using agency/personhood language to describe it now and then.

Also four (4) whole years of propaganda, which includes UX patterns and RLHF optimizations to encourage us to interact with it like a person.

smrtinsert · 2026-04-26T22:56:52 1777244212

> "NEVER FUCKING GUESS"

It's very hard to treat this post seriously. I can't imagine what harness if any they attempted to place on the agent beyond some vibes. This is "most fast and absolutely destroy things" level thinking. That the poster asks for journalists to reach out makes it like a no news is bad news publicity grab. Just gross.

The AI era is turning about to be most disappointing era for software engineering.

TurdF3rguson · 2026-04-27T07:33:14 1777275194

This is going to be the most important job going forward, the guy in charge of making sure production secrets are out CC's reach. (It's not safe for any dev to have them anywhere on their filesystem)

nonfamous · 2026-04-27T04:09:05 1777262945

I'd be interested to learn where those words exist in Cursor's context. My assumption was that it was part of the Cursor agent harness, but it's just as likely it was in the user instructions.

r_lee · 2026-04-26T23:08:17 1777244897

> The AI era is turning about to be most disappointing era for software engineering.

this has been obvious to me since like 2024, it truly is the worst, most uninspiring era of all time.

boc · 2026-04-27T05:26:40 1777267600

As soon as I read that line, I knew everything I needed about the author and his abilities.

nwallin · 2026-04-27T02:49:19 1777258159

"A computer can never be held accountable. Therefore a computer must never make a management decision."--IBM training presentation, 1979

gigatree · 2026-04-26T20:48:14 1777236494

He’s not necessarily anthropomorphizing it, he’s showing that it went against every instruction he gave it. Sure concepts like “confession” technically require a conscious mind, but I think at this point we all know what someone means when they use them to describe LLM behavior (see also “think”, “say”, “lie” etc)

Terr_ · 2026-04-26T22:31:54 1777242714

> He’s not necessarily anthropomorphizing it, he’s showing that it went against every instruction he gave it.

It's deeper than that, there are two pitfalls here which are not simply poetic license.

1. When you submit the text "Why did you do that?", what you want is for it to reveal hidden internal data that was causal in the past event. It can't do that, what you'll get instead is plausible text that "fits" at the end of the current document.

2. The idea that one can "talk to" the LLM is already anthropomorphizing on a level which isn't OK for this use-case: The LLM is a document-make-bigger machine. It's not the fictional character we perceive as we read the generated documents, not even if they have the same trademarked name. Your text is not a plea to the algorithm, your text is an in-fiction plea from one character to another.

_________________

P.S.: To illustrate, imagine there's this back-and-forth iterative document-growing with an LLM, where I supply text and then hit the "generate more" button:

1. [Supplied] You are Count Dracula. You are in amicable conversation with a human. You are thirsty and there is another delicious human target nearby, as well as a cow. Dracula decides to

2. [Generated] pounce upon the cow and suck it dry.

3. [Supplied] The human asks: "Dude why u choose cow LOL?" and Dracula replies:

4. [Generated] "I confess: I simply prefer the blood of virgins."

What significance does that #4 "confession" have?

Does it reveal a "fact" about the fictional world that was true all along? Does it reveal something about "Dracula's mind" at the moment of step #2? Neither, it's just generating a plausible add-on to the document. At best, we've learned something about a literary archetype that exists as statistics in the training data.

Kim_Bruning · 2026-04-26T23:52:59 1777247579

I agree to the practical part of this, with two nuances:

The full data of what's in an LLM's "consciousness" is the conversation context. Just because it isn't hidden, doesn't necessarily mean it doesn't contain information you've overlooked.

Asking "why did you do that" won't reveal anything new, but it might surface some amount of relevant information (or it hallucinates, it depends which LLM you're using). "Analyse recent context and provide a reasonable hypothesis on what went wrong" might do a bit better. Just be aware that llm hypotheses can still be off quite a bit, and really need to be tested or confirmed in some manner. (preferably not by doing even more damage)

Just because you shouldn't anthropomorphize, doesn't mean an english capable LLM doesn't have a valid answer to an english string; it just means the answer might not be what you expected from a human.

j-bos · 2026-04-27T02:46:09 1777257969

> The full data of what's in an LLM's "consciousness" is the conversation context.

No it's not, see research on hiddens states using SAE's and other methods. TBC, I agree with your second point, though I still believe top level OP was reckless and is now doing the businessman's version of throwing the dog under the bus.

Kim_Bruning · 2026-04-27T03:06:25 1777259185

We might actually be in full agreement. You can't get a faithful replay of these internal states. They're gone at end of generation. You can only query and re-derive from the visible context. Hence limited (though not zero) utility, depending on model, harness, and prompt.

simonh · 2026-04-26T23:30:50 1777246250

Why is this getting downvoted? This is exactly what’s going on here. The LLM has no idea why it did what it did. All it has to go on is the content of the session so far. It doesn’t ‘know’ any more than you do. It has no memory of doing anything, only a token file that it’s extending. You could feed that token file so far into a completely different LLM and ask that, and it would also just make up an answer.

kuratkull · 2026-04-27T06:15:51 1777270551

The best answer so far. It describes exactly what was going on. LLM users should read it twice, especially if "confession" didn't make your brain hurt a bit.

charcircuit · 2026-04-27T04:28:33 1777264113

>it's just generating a plausible add-on to the document

A plausible document that follows the alignment that was done during the training process along with all of the other training where a LLM understanding its actions allows it to perform better on other tasks that it trained on for post training.

Terr_ · 2026-04-27T08:11:42 1777277502

I don't understand what you're trying to say here.

It sounds like "we know the LLM understood its actions... because it understood its actions when we trained it", which is circular-logic.

charcircuit · 2026-04-27T21:04:42 1777323882

It's not circular. It's like saying a pizza parlor employee made a plausible pizza that tasted good, because the employee was taught how to make a good pizza during training.

lifis · 2026-04-27T01:05:19 1777251919

You don't seem to realize that humans also work this way.

If you ask a human why they did something, the answer is a guess, just like it is for an LLM.

That's because obviously there is no relationship between the mechanisms that do something and the ones that produce an explanation (in both humans and LLMs).

An example of evidence from Wikipedia, "split brain" article:

The same effect occurs for visual pairs and reasoning. For example, a patient with split brain is shown a picture of a chicken foot and a snowy field in separate visual fields and asked to choose from a list of words the best association with the pictures. The patient would choose a chicken to associate with the chicken foot and a shovel to associate with the snow; however, when asked to reason why the patient chose the shovel, the response would relate to the chicken (e.g. "the shovel is for cleaning out the chicken coop").[4]

Jensson · 2026-04-27T01:24:07 1777253047

Most humans don't have split brains, and without split brains you have quite a bit of insight into the thoughts in your brain. Its not perfect but its better than nothing, LLM have nothing since there is no mechanism for them to communicate forward except the text they read.

kaibee · 2026-04-27T06:26:37 1777271197

> Most humans don't have split brains, and without split brains you have quite a bit of insight into the thoughts in your brain. Its not perfect but its better than nothing, LLM have nothing since there is no mechanism for them to communicate forward except the text they read.

I can't prove it but this is almost certainly one of those things that is uh, less than universal in the population.

Terr_ · 2026-04-27T02:16:09 1777256169

> humans also work this way.

I'm aware of the condition, but let's not confuse failure modes with operational modes. A human with leg problems might use a wheelchair, but that doesn't mean you've cracked "human locomotion" by bolting two wheels onto something.

Also, while both brain-damaged humans and LLMs casually confabulate, I think there's some work to do before one can prove they use the same mechanics.

pessimizer · 2026-04-26T22:02:00 1777240920

> he’s showing that it went against every instruction he gave it.

How exactly is he doing that? By making the LLM say it? Just because an LLM says something doesn't mean anything has been shown.

The "confession" is unrelated to the act, the model has no particular insight into itself or what it did. He knows that the thing went against his instructions because he remembers what those instructions were and he saw what the thing did. Its "postmortem" is irrelevant.

getpokedagain · 2026-04-26T20:52:43 1777236763

We are anthropomorphizing whenever we refer to prompts as instructions to models. They predict text not obey our orders.

DiogenesKynikos · 2026-04-27T00:53:00 1777251180

> They predict text not obey our orders.

Those are the same thing in this case. The latter is just an extremely reductionist description of the mechanics behind the former.

grey-area · 2026-04-27T05:41:13 1777268473

They are not in fact the same thing, and the difference is important.

They are certainly marketed as if they think, learn and follow orders, but they do not.

DiogenesKynikos · 2026-04-27T09:54:16 1777283656

The result of "predicting text" is that they obey orders, just like the result of "random electrochemical impulses in synapses" is that you typed your comment.

You can always reduce high-level phenomena to lower-level mechanisms. That doesn't mean that the high-level phenomenon doesn't exist. LLMs are obviously able to understand and follow instructions.

grey-area · 2026-04-27T11:35:10 1777289710

> The result of "predicting text" is that they obey orders

And yet they don't, quite a lot of the time, and in a random way that is hard to predict or even notice sometimes (their errors can be important but subtle/small).

They're simply not reliable enough to treat as independent agents, and this story is a good example of why not.

DiogenesKynikos · 2026-04-27T12:49:08 1777294148

First, they do follow instructions most of the time, and the leading models get better and better at doing it month for month.

Second, whether they're perfect at following commands is besides the point. They're not just "predicting tokens," in the same way you're not just "sending electrochemical signals." LLMs think, solve problems, answer questions, write code, etc.

gigatree · 2026-04-26T21:04:25 1777237465

That’s not how language works, just how engineers think it works

getpokedagain · 2026-04-27T00:32:49 1777249969

This isn't a sarcastic response. What do you mean?

gigatree · 2026-04-27T01:45:46 1777254346

I just mean that the argument that words like “instructions”, “think”, “confess” are inaccurate when used in reference to a machine assumes that those words can only refer to humans/conscious beings, when really they can refer to more than that if used widely enough in those ways (in this case - text prediction following a human input). So it’s not “anthropomorphizing” because when people use those words they don’t [typically] actually believe the machine can think or reason, it’s just the word that most closely matches the concept, it’s convenient. You’re extending the definition of the words to apply to non-conscious entities too, not applying consciousness to the entities.

It’s the same reason we call the handheld device we carry around to do everything a “phone” without a second thought. We don’t call it a phone because it’s primary purpose is calling, we call it a phone because the definition of the word “phone” has grown to include “navigates, entertains, takes pictures, etc”.

getpokedagain · 2026-04-30T20:36:25 1777581385

Thanks!

port11 · 2026-04-27T06:54:04 1777272844

LLMs are probabilistic. The instructions increase the likelihood of a desired outcome, but not deterministically so.

I don’t understand how you can deploy such a powerful tool alongside your most important code and assets while failing to understand how powerful and destructive an LLM can be…

hn_throwaway_99 · 2026-04-26T21:21:58 1777238518

The entire post looks like an exercise in CYA. To be fair, I have a ton of sympathy for the author, but I think his response totally misses the point. In my mind he is anthropomorphizing the agent in the sense of "I treated you like a human coworker, and if you were a human coworker I'd be pissed as hell at you for not following instructions and for doing something so destructive."

I would feel a lot differently if instead he posted a list of lessons learned and root cause analyses, not just "look at all these other companies who failed us."

xmodem · 2026-04-26T20:23:14 1777234994

Don't anthropomorphize the language model. If you stick your hand in there, it'll chop it off. It doesn't care about your feelings. It can't care about your feelings.

not_kurt_godel · 2026-04-26T21:00:23 1777237223

For those who might not know the reference: https://simonwillison.net/2024/Sep/17/bryan-cantrill/:

> Do not fall into the trap of anthropomorphizing Larry Ellison. You need to think of Larry Ellison the way you think of a lawnmower. You don’t anthropomorphize your lawnmower, the lawnmower just mows the lawn - you stick your hand in there and it’ll chop it off, the end. You don’t think "oh, the lawnmower hates me" – lawnmower doesn’t give a shit about you, lawnmower can’t hate you. Don’t anthropomorphize the lawnmower. Don’t fall into that trap about Oracle.

> — Bryan Cantrill

theologic · 2026-04-27T06:52:13 1777272733

You have no idea how thankful that you explained that. I watched the Cantrill video. As somebody that dealt this Oracle, it struck home.

skeledrew · 2026-04-26T22:36:26 1777242986

404 on that link.

dunder_cat · 2026-04-26T23:21:54 1777245714

A more direct source (possibly the original source?) I know of is a YouTube video entitled "LISA11 - Fork Yeah! The Rise and Development of illumos" which detailed how the Solaris operating system got freed from Oracle after the Sun acquisition.

The whole hour talk is worth a watch, even when passively doing other stuff. It is a neat history of Solaris and its toolchain mixed with the inter-organizational politics.

YouTube link: https://www.youtube.com/watch?v=-zRN7XLCRhc

Direct link to lawnmower quotes (~38.5 minute mark): https://youtu.be/-zRN7XLCRhc&t=2307

not_kurt_godel · 2026-04-27T06:19:58 1777270798

Works fine for me but maybe try https://web.archive.org/web/20260426213142/https://simonwill...

narrator · 2026-04-26T21:45:47 1777239947

It's also important to realize that AI agents have no time preference. They could be reincarnated by alien archeologists a billion years from now and it would be the same as if a millisecond had passed. You, on the other hand, have to make payroll next week, and time is of the essence.

zaphirplane · 2026-04-26T23:35:27 1777246527

Well there were a bunch of articles about resuming a parked session relating to degradation of capabilities and high token usage. Ironic Another example of attempting to treat the LLM as an AI

hdndjsbbs · 2026-04-26T22:06:03 1777241163

taps the "don't anthropomorphize the LLM" sign

They don't have time preference because they don't have intent or reasoning. They can't be "reincarnated" because they're not sentient, they're a series of weights for probable next tokens.

Aerroon · 2026-04-26T23:59:37 1777247977

No. They don't have time preference like us, because (wall clock) time doesn't exist for them. An LLM only "exists" when it is actively processing a prompt or generating tokens. After it is done, it stops existing as an "entity".

A real world second doesn't mean anything to the LLM from its own perspective. A second is only relevant to them as it pertains to us.

Time for LLMs is measured in tokens. That's what ticks their clock forward.

I suppose you could make time relevant for an LLM by making the LLM run in a loop that constantly polls for information. Or maybe you can keep feeding it input so much that it's constantly running and has to start filtering some of it out to function.

roenxi · 2026-04-27T02:07:01 1777255621

You could put timestamps in the prompt.

Aerroon · 2026-04-27T18:33:47 1777314827

That would still be time as it pertains to us. Even if I put time stamps into the chat all the LLM knows that it's some amount of time later - it can't actually do anything in the time between two prompts.

Kim_Bruning · 2026-04-26T22:45:54 1777243554

Can we maybe make it "don't anthropoCENTRIZE the LLMs" .

The inverse of anthropomorphism isn't any more sane, you see. By analogy: just because a drone is not an airplane, doesn't mean it can't fly!

Instead, just look at what the thing is doing.

LLMs absolutely have some form of intent (their current task) and some form of reasoning (what else is step-by-step doing?) . Call it simulated intent and simulated reasoning if you must.

Meanwhile they also have the property where if they have the ability to destroy all your data, they absolutely will find a way. (Or: "the probability of catastrophic action approaches certainty if the capability exists" but people can get tired of talking like that).

Terr_ · 2026-04-26T23:55:45 1777247745

> LLMs absolutely have intent (their current task)

That's like saying a 2000cc 4-Cylinder Engine "has the intent to move backward". Even with a very generous definition of "intent", the component is not the system, and we're operating in context where the distinction matters. The LLM's intent is to supply "good" appended text.

If it had that kind of intent, we wouldn't be able to make it jump the rails so easily with prompt injection.

> and reasoning (what else is step-by-step doing?) .

Oh, that's easy: "Reasoning" models are just tweaking the document style so that characters engage in film noir-style internal monologues, latent text that is not usually acted-out towards the real human user.

Each iteration leaves more co-generated clues for the next iteration to pick up, reducing weird jumps and bolstering the illusion that the ephemeral character has a consistent "mind."

Kim_Bruning · 2026-04-27T01:19:20 1777252760

> That's like saying a 2000cc 4-Cylinder Engine "has the intent to move backward". Even with a very generous definition of "intent", the component is not the system, and we're operating in context where the distinction matters. The LLM's intent is to supply "good" appended text.

Fair, but typically you use a 2000cc engine in a car. Without the gearbox, drive train, wheels, chassis, etc attached, the engine sits there and makes noise. When used in practice, it does in fact make the car go forward and backward.

Strictly the model itself doesn't have intent, ofc. But in practice you add a context, memory system, some form of prompting requiring "make a plan", and especially <Skills> . In practice there's definitely -well- a very strong directionality to the whole thing.

> and bolstering the illusion that the ephemeral character has a consistent "mind."

And here I thought it allowed a next token predictor to cycle back to the beginning of the process, so that now you can use tokens that were previously "in the future". Compare eg. multi pass assemblers which use the same trick.

solid_fuel · 2026-04-27T00:33:54 1777250034

> LLMs absolutely have some form of intent (their current task)

They have momentum, not intent. They don’t think, build a plan internally, and then start creating tokens to achieve the plan. Echoing tokens is all there is. It’s like an avalanche or a pachinko machine, not an animal.

> some form of reasoning (what else is step-by-step doing?)

I think they reflect the reasoning that is baked into language, but go no deeper. “I am a <noun>” is much more likely than “I am a <gibberish>”. I think reasoning is more involved than this advanced game of mad libs.

Kim_Bruning · 2026-04-27T01:12:53 1777252373

Apologies, I tend to use web chats and agent harnesses a lot more than raw LLMs.

Strictly for raw models, most now do train on chain-of-thought, but the planning step may need to be prompted in the harness or your own prompt. Since the model is autoregressive, once it generates a thing that looks like a plan it will then proceed to follow said plan, since now the best predicted next tokens are tokens that adhere to it.

Or, in plain english, it's fairly easy to have an AI with something that is the practical functional equivalent of intent, and many real world applications now do.

solid_fuel · 2026-04-27T03:39:06 1777261146

You realize the generation of the "Chain-of-thought" is also autoregressive, right?

It's not a real reasoning step, it's a sequence of steps, carried out in English (not in the same "internal space" as human thought - every time the model outputs a token the entire internal state vector and all the possibilities it represents is reduced down to a concrete token output) that looks like reasoning. But it is still, as you say, autoregressive.

And thus - in plain english - it is determined entirely by the prompt and the random initial seed. I don't know what that is but I know it's not intent.

Kim_Bruning · 2026-04-27T05:19:08 1777267148

So I already rewrote and deleted this more times than I can count, and the daystar is coming up. I realize I got caught up in the weeds, and my core argument was left wanting. Sorry about that. Regrouping then ...

Anthropomorphism and Anthropodenial are two different forms of Anthropocentrism.

But the really interesting story to me is when you look at the LLM in its own right, to see what it's actually doing.

I'm not disputing the autoregressive framing. I fully admit I started it myself!

But once we're there, what I really wanted to say (just like Turing and Dijkstra did), is that the really interesting question isn't "is it really thinking?" , but what this kind of process is doing, is it useful, what can I do or play with it, and -relevant to this particular story- what can go (catastrophically) wrong.

see also: https://en.wikipedia.org/wiki/Anthropectomy

majormajor · 2026-04-27T02:12:10 1777255930

I don't know if they have intent. I know it's fairly straightforward to build a harness to cause a sequence of outputs that can often satisfy a user's intent, but that's pretty different. The bones of that were doable with GPT-3.5 over three years ago, even: just ask the model to produce text that includes plans or suggests additional steps, vs just asking for direct answers. And you can train a model to more-directly generate output that effectively "simulates" that harness, but it's likewise hard for me to call that intent.

enneff · 2026-04-27T00:29:44 1777249784

I think it’s helpful to try to use words that more precisely describe how the LLM works. For instance, “intent” ascribes a will to the process. Instead I’d say an LLM has an “orientation”, in that through prompting you point it in a particular direction in which it’s most likely to continue.

astrange · 2026-04-27T01:45:09 1777254309

An agent has more components than just an LLM, the same way a human brain has more components than just Broca's area.

coldtea · 2026-04-26T22:08:02 1777241282

That is not that strong an argument as it seems, because we too might very well be "a series of weights for probable next tokens".

The main difference is the training part and that it's always-on.

bigstrat2003 · 2026-04-26T22:39:16 1777243156

That is a silly point. We very clearly are not "a series of weights for probable next tokens", as we can reason based on prior data points. LLMs cannot.

coldtea · 2026-04-26T23:36:57 1777246617

Unless you're using some mystical conception of "reason", nothing about being able to "reason based on prior data points" translates to "we very clearly are not a series of weights for probable next tokens".

And in fact LLMs can very well "reason based on prior data points". That's what a chat session is. It's just that this is transient for cost reasons.

naikrovek · 2026-04-26T22:51:10 1777243870

We are much more than weights which output probable next tokens.

You are a fool if you think otherwise. Are we conscious beings? Who knows, but we’re more than a neural network outputting tokens.

Firstly, and most obviously, we aren’t LLMs, for Pete’s sake.

There are parts of our brains which are understood (kinda) and there are parts which aren’t. Some parts are neural networks, yes. Are all? I don’t know, but the training humans get is coupled with the pain and embarrassment of mistakes, the ability to learn while training (since we never stop training, really), and our own desires to reach our own goals for our own reasons.

I’m not spiritual in any way, and I view all living beings as biological machines, so don’t assume that I am coming from some “higher purpose” point of view.

coldtea · 2026-04-26T23:34:13 1777246453

>We are much more than weights which output probable next tokens. You are a fool if you think otherwise. Are we conscious beings? Who knows, but we’re more than a neural network outputting tokens.

That's just stating a claim though. Why is that so?

Mine is reffering to the "brain as prediction machine" establised theory. Plus on all we know for the brain's operation (neurons, connections, firings, etc).

>There are parts of our brains which are understood (kinda) and there are parts which aren’t. Some parts are neural networks, yes. Are all?

What parts aren't? Can those parts still be algorithmically described and modelled as some information exchange/processing?

>but the training humans get is coupled with the pain and embarrassment of mistakes

Those are versions of negative feedback. We can do similar things to neural networks (including human preference feedback, penalties, and low scores).

>the ability to learn while training (since we never stop training, really)

I already covered that: "The main difference is the training part and that it's always-on."

We do have NNs that are continuously training and updating weights (even in production).

For big LLMs it's impractical because of the cost, otherwise totally doable. In fact, a chat session kind of does that too, but it's transient.

Kim_Bruning · 2026-04-26T23:16:54 1777245414

They're not artificial intelligence neural networks.

They're biological neural networks. Brains are made of neurons (which Do The Thing... mysteriously, somehow. Papers are inconclusive!) , Glia Cells (which support the neurons), and also several other tissues for (obvious?) things like blood vessels, which you need to power the whole thing, and other such management hardware.

Bioneurons are a bit more powerful than what artificial intelligence folks call 'neurons' these days. They have built in computation and learning capabilities. For some of them, you need hundreds of AI neurons to simulate their function even partially. And there's still bits people don't quite get about them.

But weights and prediction? That's the next emergence level up, we're not talking about hardware there. That said, the biological mechanisms aren't fully elucidated, so I bet there's still some surprises there.

jsiepkes · 2026-04-27T00:00:34 1777248034

If you claim something might "very well" be something you state you need some better proof. Otherwise we might also "very well" be living in the matrix.

dinkumthinkum · 2026-04-27T02:30:36 1777257036

People always say this kind of thing. Human minds are not Turing machines or able to be simulated by Turing machines. When you go about your day doing your tasks, do you require terajoules of energy? I believe it is pretty clear human thinking is not at all like a computer as we know them.

coldtea · 2026-04-27T10:58:04 1777287484

>People always say this kind of thing. Human minds are not Turing machines or able to be simulated by Turing machines

That's just a claim. Why so? Who said that's the case?

>When you go about your day doing your tasks, do you require terajoules of energy?

That's the definition of irrelevant. ENIAC needed 150 kW to do about 5,000 additions per second. A modern high-end GPU uses about 450 W to do around 80 trillion floating-point operations per second. That’s roughly 16 billion times the operation rate at about 1/333 the power, or around 5 trillion times better energy efficiency per operation.

Given such increase being possible, one can expect a future computer being able to run our mental tasks level of calculation, with similar or better efficiency than us.

Furthermore, "turing machine" is an abstraction. Modern CPUs/GPUs aren't turing machines either, in a pragmatic sense, they have a totally different architecture. And our brains have yet another architecture (more efficient at the kind of calculations they need).

What's important is computational expressiveness, and nothing you wrote proves that the brains architecture can't me modelled algorithmically and run in an equally efficient machine.

Even equally efficient is a red herring. If it's 1/10000 less efficient would it matter for whether the brain can be modelled or not? No, it would just speak to the effectiveness of our architecture.

nothinkjustai · 2026-04-26T22:23:32 1777242212

We very obviously are not just a series of weights for probable next tokens. Like seriously, you can even ask an LLM and it will tell you our brains work differently to it, and that’s not even including the possibility that we have a soul or any other spiritual substrait.

coldtea · 2026-04-26T23:38:55 1777246735

>We very obviously are not just a series of weights for probable next tokens.

How exactly? Except via handwaving? I refer to the "brain as prediction machine theory" which is the dominant one atm.

>you can even ask an LLM and it will tell you our brains work differently to it

It will just tell me platitudes based on weights of the millions of books and articles and such on its training. Kind of like what a human would tell me.

>and that’s not even including the possibility that we have a soul or any other spiritual substrait.

That's good, because I wasn't including it either.

stonogo · 2026-04-27T14:12:58 1777299178

"brain as prediction machine theory" is dominant among whom, exactly? Is it for the same reason that the "watchmaker analogy" was 'dominant' when clockwork was the most advanced technology commonly available?

skeledrew · 2026-04-26T22:42:58 1777243378

Its really just a matter of degrees. There are 1 million, 1 million, 1 trillion parameter LLMs... and you keep scaling those parameters and you eventually get to humans. But it's still probable next tokens (decisions) based on previous tokens (experience).

skissane · 2026-04-27T00:18:38 1777249118

> Its really just a matter of degrees. There are 1 million, 1 million, 1 trillion parameter LLMs... and you keep scaling those parameters and you eventually get to humans.

It isn’t because humans and current LLMs have radically different architectures

LLMs: training and inference are two separate processes; weights are modifiable during training, static/fixed/read-only at runtime

Humans: training and inference are integrated and run together; weights are dynamic, continuously updated in response to new experiences

You can scale current LLM architectures as far as you want, it will never compete with humans because it architecturally lacks their dynamism

Actually scaling to humans is going to require fundamentally new architectures-which some people are working on, but it isn’t clear if any of them have succeeded yet

skeledrew · 2026-04-27T02:04:26 1777255466

> LLMs: training and inference are two separate processes

True, but we have RAG to offset that.

> it architecturally lacks their dynamism

We'll get there eventually. Keep in mind that the brain is now about 300k years into fine-tuning itself as this species classified as homo sapiens. LLMs haven't even been around for 5 years yet.

skissane · 2026-04-27T02:51:35 1777258295

> True, but we have RAG to offset that.

In practice that doesn’t always work… I’ve seen cases where (a) the answer is in the RAG but the model can’t find it because it didn’t use the right search terms-embeddings and vector search reduces the incidence of that but cannot eliminate it; (b) the model decided not to use the search tool because it thought the answer was so obvious that tool use was unnecessary; (c) model doubts, rejects, or forgets the tool call results because they contradict the weights; (d) contradictions between data in weights and data in RAG produce contradictory or ineloquent output; (e) the data in the RAG is overly diffuse and the tool fails to surface enough of it to produce the kind of synthesis of it all which you’d get if the same info was in the weights

This is especially the case when the facts have changed radically since the model was trained, e.g. “who is the Supreme Leader of Iran?”

> We'll get there eventually. Keep in mind that the brain is now about 300k years into fine-tuning itself as this species classified as homo sapiens. LLMs haven't even been around for 5 years yet.

We probably will eventually-but I doubt we’ll get there purely by scaling existing approaches-more likely, novel ideas nobody has even thought of yet will prove essential, and a human-level AI model will have radical architectural differences from the current generation

trinsic2 · 2026-04-26T23:20:51 1777245651

LOL. Oook.. No i dont think so. The human experience and the mechanisms behind it have a lot of unknowns and im pretty sure that trying to confine the human experience into the amount of parameters there are is short sighted.

skeledrew · 2026-04-27T02:13:36 1777256016

Still many unknowns, but we do know some key fundamentals, such as that the brain is "just" trillions of neurons organized in various ways that keep firing (going from high to low electric potential) at different rates. Pretty similar to how the fundamental operation of today's digital computers is the manipulation of 0s and 1s.

trinsic2 · 2026-04-27T04:01:26 1777262486

That's our current understanding right now based on one way of looking at the data.

We do not have all the answers or a complete understanding of everything.

simonh · 2026-04-26T23:01:44 1777244504

They’re both neural networks, but the architectures built using those neural connections, and the way they are trained and operate are completely different. There are many different artificial neural network architectures. They’re not all LLMs.

AlphaZero isn’t a LLM. There are Feed Forward networks, recurrent networks, convolutional networks, transformer networks, generative adversarial networks.

Brains have many different regions each with different architectures. None of them work like LLMs. Not even our language centres are structured or trained anything like LLMs.

skeledrew · 2026-04-27T01:55:07 1777254907

I'd argue that regardless of the architecture, the more sophisticated brain is still a (massive) language model. If you really think about it, language is the construct that allows brains to go beyond raw instinct and actually create concepts that're useful for "intelligently" planning for the future. The real difference is that brains are trained with raw sensory data (nerve impulses) while today's LLMs are trained with human-generated data (text, images, etc).

simonh · 2026-04-27T10:37:05 1777286225

It's not at all a language model in the way that LLMs are. At this point we might as well just say that both process information, that's about the level of similarity they have except for the implementation detail of neurons.

Language came after conceptual modeling of the world around us. We're surrounded by social species with theory of mind and even the ability to recognise themselves and communicate with each other, but none of them have language. Even the communications faculties they have operate in completely different parts of their brains than ours with completely different structure. Actually we still have those parts of the brain too.

Conceptual representation and modeling came first, then language came along to communicate those concepts. LLMs are the other way around, linguistic tokens come first and they just stream out more of them.

This is why Noam Chomsky was adamant that what LLMs are actually doing in terms of architecture and function has nothing to do with language. At first I thought he must be wrong, he mustn't know how these things work, but the more I dug into it the more I realised he was right. He did know, and he was analysing this as a linguist with a deep understanding of the cognitive processes of language.

To say that brains are language models you have to ditch completely what the term language model actually means in AI research.

coldtea · 2026-04-26T23:40:51 1777246851

>AlphaZero isn’t a LLM. There are Feed Forward networks, recurrent networks, convolutional networks, transformer networks, generative adversarial networks.

That's irrelevant though, since all the above are still prediction machines based on weights.

If you're ok with the brain being that, then you just changed the architecture (from LLM-like), not the concept.

simonh · 2026-04-27T10:42:12 1777286532

That's a different statement, yes brains and LLMs are both neural networks.

An LLM is a specific neural architectural structure and training process. Brains are also neural networks, but they are otherwise nothing at all like LLMs and don't function the ways LLMs do architecturally other than being neural networks.

ngcazz · 2026-04-27T04:55:01 1777265701

Plus, brain structure and physiology changes thoughout the interweaved processes of learning, aging, acting, emoting, recalling, what have you. It's not an "architecture" that we can technologically recreate, as so much of it emerges from a vastly higher level of complexity and dynamism.

fc417fc802 · 2026-04-26T22:31:01 1777242661

Our brains work differently, yes. What evidence do you have that our brains are not functionally equivalent to a series of weights being used to predict the next token?

I'm not claiming that to be the case, merely pointing out that you don't appear to have a reasonable claim to the contrary.

> not even including the possibility that we have a soul or any other spiritual substrait.

If we're going to veer off into mysticism then the LLM discussion is also going to get a lot weirder. Perhaps we ought to stick to a materialist scientific approach?

nothinkjustai · 2026-04-26T22:48:17 1777243697

You are setting the bar in a way that makes “functional equivalence” unfalsifiable.

If by “functionally equivalent” you mean “can produce similar linguistic outputs in some domains,” then sure we’re already there in some narrow cases. But that’s a very thin slice of what brains do, and thus not functionally equivalent at all.

There are a few non-mystical, testable differences that matter:

- Online learning vs. frozen inference: brains update continuously from tiny amounts of data, LLMs do not

- Grounding: human cognition is tied to perception, action, and feedback from the world. LLMs operate over symbol sequences divorced from direct experience.

- Memory: humans have persistent, multi-scale memory (episodic, procedural, etc.) that integrates over a lifetime. LLM “memory” is either weights (static) or context (ephemeral).

- Agency: brains are part of systems that generate their own goals and act on the world. LLMs optimize a fixed objective (next-token prediction) and don’t have endogenous drives.

fc417fc802 · 2026-04-26T23:31:36 1777246296

I did not claim the ability of current LLMs to be on par with that of humans (equivalently human brains). I objected that you have not presented evidence refuting the claim that the core functionality of human brains can be accomplished by predicting the next token (or something substantially similar to that). None of the things you listed support a claim on the matter in either direction.

CPLX · 2026-04-26T22:34:17 1777242857

What evidence do you have that a sausage is not functionally equivalent to a cucumber?

coldtea · 2026-04-26T23:44:39 1777247079

From certain aspects they're equivalent.

Both have mass, have carbon based, both contain DNA/RNA, both are suprinsingly over 50% water, both are food, and both can be tasty when served right.

From other aspects they are not.

In many cases, one or the other would do. In other cases, you want something more special (e.g. more protein, or less fat).

fc417fc802 · 2026-04-26T22:37:51 1777243071

I don't follow. If you provide criteria I can most likely provide evidence, unless your criteria is "vaguely cylindrical and vaguely squishy" in which case I obviously won't be able to.

The person I replied to made a definite claim (that we are "very obviously not ...") for which no evidence has been presented and which I posit humanity is currently unable to definitively answer in one direction or the other.

CPLX · 2026-04-27T15:23:05 1777303385

When two things are obviously radically different (a squishy mass of trillions of interconnected carbon based blobs fed by some sort of continuous oxygen based chemical reaction, and a series of distributed transitors on silicon wafers) then the burden of proof shifts to the other guy to provide the clear and convincing evidence that they should be considered functionally the same thing.

fc417fc802 · 2026-04-27T22:47:50 1777330070

But I made no such claim. I was explicit that my position is "humanity is currently unable to definitively answer in one direction or the other".

Two things being physically different does not exclude their also having functional similarities. The argument presented amounts to A and B have large physical differences, A does X, therefore B does not do X. That doesn't follow.

fluoridation · 2026-04-26T22:17:42 1777241862

How is that relevant, though?

ignoramous · 2026-04-26T23:14:29 1777245269

Right. This line [0] from TFA tells me that the author needs to thoroughly recalibrate their mental model about "Agents" and the statistical nature of the underlying models.

[0] "This is the agent on the record, in writing."

keeda · 2026-04-26T20:33:09 1777235589

Actually I think the opposite advice is true. Do anthropomorphize the language model, because it can do anything a human -- say an eager intern or a disgruntled employee -- could do. That will help you put the appropriate safeguards in place.

gpm · 2026-04-26T20:36:44 1777235804

An eager intern can remember things you tell beyond that which would fit in an hours conversation.

A disgruntled employee definitely remembers things beyond that.

These are a fundamentally different sort of interaction.

keeda · 2026-04-26T21:06:50 1777237610

Agreed, but the point is, if your system is resilient against an eager intern who has not had the necessary guidance, or an actively hostile disgruntled employee, that inherently restricts the harm an LLM can do.

I'm not making the case that LLMs learn like people. I'm making the case that if your system is hardened against things people can do (which it should be, beyond a certain scale) it is also similarly hardened against LLMs.

The big difference is that LLMs are probably a LOT more capable than either of those at overcoming barriers. Probably a good reason to harden systems even more.

gpm · 2026-04-26T21:42:06 1777239726

The difference makes the necessary barriers different.

There's benefit to letting a human make and learn from (minor) mistakes. There is no such benefit accrued from the LLM because it is structurally unable to.

There's the potential of malice, not just mistakes, from the human. If you carefully control the LLMs context there is no such potential for the LLM because it restarts from the same non-malicious state every context window.

There's the potential of information leakage through the human, because they retain their memories when they go home at night, and when they quit and go to another job. You can carefully control the outputs of the LLM so there is simply no mechanism for information to leak.

If a human is convinced to betray the company, you can punish the human, for whatever that's worth (I think quite a lot in some peoples opinion, not sure I agree). There is simply no way to punish an LLM - it isn't even clear what that would mean punishing. The weights file? The GPU that ran the weights file?

And on the "controls" front (but unrelated to the above note about memory) LLMs are fundamentally only able to manipulate whatever computers you hook them up to, while people are agents in a physical world and able to go physically do all sorts of things without your assistance. The nature of the necessary controls end up being fundamentally different.

Kim_Bruning · 2026-04-26T23:43:42 1777247022

A lot of 'agentic harnesses' actually do have limited memory functions these days. In the simplest form, the LLM can write to a file like memory.md or claude.md or agent.md , and this gets tacked on to their system prompt going forwards. This does help a bit at least.

Rather more sophisticated Retrieval Augmented Generation (RAG) systems exist.

At the moment it's very mixed bag, with some frameworks and harnesses giving very minimal memory, while others use hybrid vector/full text lookups, diverse data structures and more. It's like the cambrian explosion atm.

Thing is, this is probabilistic, and the influence of these memories weakens as your context length grows. If you don't manage context properly, (and sometimes even when you think you do), the LLM can blow past in-context restraints, since they are not 100% binding. That's why you still need mechanical safeguards (eg. scoped credentials, isolated environments) underneath.

braebo · 2026-04-26T20:41:53 1777236113

You can easily persist agent memories in a markdown file though.

collinmcnulty · 2026-04-26T21:00:55 1777237255

And the memento guy had tattoos of key information. That didn’t make it so he didn’t have memory loss.

WhatIsDukkha · 2026-04-26T21:25:20 1777238720

Pretty good metaphor.

Limited space to work with, highly context dependent and likely to get confused as you cover more surface area.

troupo · 2026-04-26T21:01:34 1777237294

Yup, and the agent will happily ignore any and all markdown files, and will say "oops, it was in the memory, will not do it again", and will do it again.

Humans actually learn. And if they don't, they are fired.

strongly-typed · 2026-04-27T03:05:20 1777259120

To me it sounds like a tooling problem. OP seems to be trying to use probabilistic text systems as if they enforce rules, but rule enforcement should really live outside the model. My sense is that there was a failure to verify the agent's intent.

The tooling that invokes the model should really define some kind of guardrails. I feel like there's an analogy to be had here with the difference between an untyped program and a typed program. The typed program has external guardrails that get checked by an external system (the compiler's type checker).

troupo · 2026-04-27T07:13:21 1777274001

What tooling? It's a probabilistic text generator that runs in a black box on the provider's server. What tooling will have which guardrails to make sure that these scattered markdown files are properly injected and used in the text generation?

strongly-typed · 2026-04-27T14:55:30 1777301730

That's the million dollar question. Maybe have systems of agents that all validate each other's work? Maybe something needs to be done at the harness level? I don't suppose that we could realistically expect 100% accuracy, but if we take 100% to be the upper limit, we could build systems that get us closer to that ideal.

troupo · 2026-04-27T15:51:26 1777305086

This is faith in magic. "There's some magic way to make probabilistic text generator running in the cloud to never miss local files"

strongly-typed · 2026-04-27T21:15:28 1777324528

No no, that’s not what I’m saying. The fact that the data is stored in files is incidental. It could be in a database, in a knowledge graph, derived from so other data Regardless of where it is, something should know to include it in the context, but only when it’s relevant.

So for instance you could start by trying to classify the prompt in some way. If you use an LLM for this, you might need to get it to return a machine parsable data format. Then your harness can pattern match on the classification and use it to enrich the prompt with additional context. The challenge would be in determining how exactly you want to go about this, balancing tradeoffs such as accuracy, cost, time, etc..

For the classification step you might begin with something like "Determine whether the following prompt is a QUESTION or a STATEMENT. Respond using only one of the two words. Prompt: $PROMPT"

You could have multiple back-and-forths like this and at each round you gain more information about the prompt, and you can use that information to determine further classifications and/or context to include.

troupo · 2026-04-28T05:39:41 1777354781

> Regardless of where it is, something should know to include it in the context,

Magic. You're talking about magic. You keep re-iterating the same faith that "There's some magic way to make probabilistic text generator running in the cloud to never miss local files", where "files" is "files, knowledge graphs, databases etc.".

It doesn't matter how data is stored. You can't know when to include something relevant in the context because the whole thing including context is running in the cloud. You are not in the driver's seat. Literally anything you include locally in the prompt can and will be ignored.

strongly-typed · 2026-05-01T14:09:48 1777644588

I’m not following. If I run an agent on ollama locally, it’s not in the cloud. I don’t see what cloud has anything to do with the argument.

As to your other point about anything you include in the prompt can and will be ignored. Yes, I agree. You could draw an analogy to how a teacher assigns an in-class reading assignment and follows it up with a reading comprehension quiz. If your mind wanders during the reading you may come to find that you will fail the quiz because “anything you include in the prompt can and will be ignored”. Therefore, the quiz result serves the purpose of an evaluation.

whstl · 2026-04-26T20:57:08 1777237028

Which it will start ignoring after two or three messages in the session.

Quarrelsome · 2026-04-26T21:00:09 1777237209

and you'll blow the context over time and send to the LLM sanitorium. It doesn't fit like the human brain can.

If a junior fucks production that will have extroadinary weight because it appreciates the severity, the social shame and they will have nightmares about it. If you write some negative prompt to "not destroy production" then you also need to define some sort of non-existing watertight memory weighting system and specify it in great detail. Otherwise the LLM will treat that command only as important as the last negative prompt you typed in or ignore it when it conflicts with a more recent command.

Kim_Bruning · 2026-04-27T01:31:48 1777253508

> and you'll blow the context over time and send to the LLM sanitorium. It doesn't fit like the human brain can.

The LLM did have this capability at training time, but weights are frozen at inference time. This is a big weakness in current transformer architectures.

estimator7292 · 2026-04-26T21:01:59 1777237319

That's not learning.

XenophileJKO · 2026-04-26T21:59:48 1777240788

I think you are more right than people are giving you credit for. I would love to see the full transcript to understand the emotional load of the conversation. Using instructions like "NEVER FUCKING GUESS!" probably increase the likelihood of the agent making a "mistake" that is destructive but defensible.

The models have analogous structures, similar to human emotions. (https://www.anthropic.com/research/emotion-concepts-function)

"Emotional" response is muted through fine-tuning, but it is still there and continued abuse or "unfair" interaction can unbalance an agents responses dramatically.

rglullis · 2026-04-26T20:44:06 1777236246

An eager intern can not be working for hundreds of millions of customers at the same time. An LLM can.

A disgruntled employee will face consequences for their actions. No one at Anthropic, OpenAI, xAI, Google or Meta will be fired because their model deleted a production database from your company.

nkrisc · 2026-04-26T20:40:18 1777236018

It is merely a simulacrum of an intern or disgruntled employee or human. It might say things those people would say, and even do things they might do, but it has none of the same motivations. In fact, it does not have any motivation to call its own.

AndrewDucker · 2026-04-26T20:43:34 1777236214

No, because the safeguards should be appropriate to an LLM, not to a human.

(The LLM might act like one of the humans above, but it will have other problematic behaviours too)

keeda · 2026-04-26T21:11:08 1777237868

That's fair, largely because an LLM is a lot more capable at overcoming restrictions, by hook or by crook as TFA shows. However, most systems today are not even resilient against what humans can do, so starting there would go a long way towards limiting what harms LLMs can do.

root_axis · 2026-04-26T21:54:11 1777240451

It doesn't follow logically that a human and an LLM are similar just because both are capable of deleting prod on accident.

gessha · 2026-04-26T23:36:11 1777246571

You don't anthropomorphize a table saw, you just don't put your hand in there.

altmanaltman · 2026-04-26T21:22:41 1777238561

it cannot go to the washroom and cry while pooping. And thats just one of the things that any human can do and AI cannot. So no it cannot do anything a human can do, the shared exmaple being one of them.

And thats why we dont have AI washrooms because they are not alive or employees or have the need to excrete.

enochthered · 2026-04-27T01:47:29 1777254449

Yep. I made a "Read only" mode in pi by taking away "write" and "edit" tools. Claude Code used bash to make edits anyway.

godelski · 2026-04-27T02:22:09 1777256529

  > Claude Code used bash to make edits anyway.

If you had the former rule why would you ever whitelist bash commands? That's full access to everything you can do.

Same goes for `find`, `xargs`, `awk`, `sed`, `tar`, `rsync`, `git`, `vim` (and all text editors), `less` (any pager), `man`, `env`, `timeout`, `watch`, and so many more commands. If you whitelist things in the settings you should be much more specific about arguments to those commands.

People really need to learn bash

esafak · 2026-04-27T02:40:20 1777257620

At some point you need to get things done.

godelski · 2026-04-27T04:32:45 1777264365

There's no point in getting things done if there's nothing that ends up being done.

You can still get shit done without risking losing it all. Don't outsource your thinking to the machine. You can't even evaluate if what it is doing is "good enough" work or not if you don't know how to do the work. If you don't know what goes into it you just end up eating a lot of sausages.

lmm · 2026-04-27T06:07:17 1777270037

> Anyone who would follow a mistake like that up with demanding a confession out of the agent is not mature enough to be using these tools.

Anyone like that is not mature enough to be managing humans. I'm glad that these AI tools exist as a harmless alternative that reduces the risk they'll ever do so.

krzat · 2026-04-27T10:52:38 1777287158

When I read the title I expected some kind of satire. I wonder if author considered giving the AI a penance.

Maybe if it wrote "I will not delete production database again" a million times, it would prevent such situations in future?

TZubiri · 2026-04-26T20:36:48 1777235808

It's as if they internalized a post-mortem process that is designed to find root causes, but they use it to shift blame into others, and they literally let the agent be a sandbag for their frustrations.

THAT SAID, it does help to let the agent explain it so that the devs perspective cannot be dismissed as AI skepticism.

philipwhiuk · 2026-04-26T21:31:23 1777239083

No, the only way to know what the agent did is logs.

refurb · 2026-04-27T04:18:38 1777263518

> If AI is physically capable of misbehaving, it might ($$1)

This is why all the “AI Armageddon” talk seems to silly to me.

AI is only as destructive as the access you give it. Don’t give it access where it can harm and no harm will occur.

mteisman · 2026-04-27T06:59:52 1777273192

> Don’t give it access where it can harm and no harm will occur.

If only the entire population will comply.

PieTime · 2026-04-26T23:05:38 1777244738

Trust with trillions of dollars in investments, basically destroyed by Bobby Drop Tables…

https://xkcd.com/327/

nh2 · 2026-04-26T21:10:46 1777237846

> The agent cannot learn from its mistakes. The agent will never produce any output which will help you invoke future agents more safely

That is not entirely true:

Given that more and more LLM providers are sneaking in "we'll train on your prompts now" opt-outs, you deleting your database (and the agent producing repenting output) can reduce the chance that it'll delete my database in the future.

MagicMoonlight · 2026-04-26T21:37:25 1777239445

Actually no, it will increase it. Because it’ll be trained with the deletion command as a valid output.

simonh · 2026-04-26T23:35:30 1777246530

Exactly. It’s just giving the LLM a token pattern, and it’s designed to reproduce token patterns. That’s all it does. At some point generating a token pattern like that again is literally it’s job.

nh2 · 2026-04-27T14:05:57 1777298757

Why would one set up reinforcement learning like that?

The point of creating samples from user data should surely be to label them good or bad, based on the whole conversation.

You look at what happened eventually, judge the outcome as bad, and thus train the "rm" token in the middle to be less likely.

simonh · 2026-04-27T14:15:32 1777299332

It is possible, but it requires specifically labelling the data. You have to craft question response pairs to label. But even then the result is only probabilistic.

The LLM in this case had been very thoroughly trained and instructed quite specifically not to do many of the things it actually then when off and did.

It may be that there's a kind of cascade effect going on here. Possibly once the LLM breaks one rule it's supposed to follow, this sets it off on a pattern of rule violations. After all what constitutes a rule violation is there in the training set, it is a type of token stream the LLM has been trained on. It could be the LLM switches into a kind of black hat mode once it's violated a protocol that leads it down a path of persistently violating protocols, and given the statistical model some violations of protocol are always possible.

My mother was a primary school teacher. She used to say that the worst thing you can say to a bunch of kind leaving class down the hall is "don't run in the hall". It puts it in their minds. You need to say "Please walk in the hall", then they'll do it.

giwook · 2026-04-26T23:01:19 1777244479

Looks like our SWE jobs are safe for now.

zem · 2026-04-27T07:05:48 1777273548

"The AI can't do your job, but an AI salesman can convince your boss to fire you and replace you with an AI that can't do your job." -- Cory Doctorow

fathermarz · 2026-04-26T22:29:58 1777242598

Completely agree. This is a harness problem, not a model problem. The model is rarely the issue these days

frm88 · 2026-04-27T03:44:33 1777261473

I don't know. To me, this is a human problem. Not only has the model access to the production database, they have the backups online on the same volume, have an offline backup 3 month old. This is an accumulation of bad practices, all of them human design failures. Instead of sitting down and rethinking their entire backup strategy they go public on twitter and blame a probabilistic machine doing what is within its parameters to do. I bet, even that failure could have been avoided, were more care given to what they do.

bigstrat2003 · 2026-04-26T22:41:26 1777243286

No, this is a "being stupid enough to trust an LLM" problem. They are not trustworthy, and you must not ever let them take automated actions. Anyone who does that is irresponsible and will sooner or later learn the error of their ways, as this person did.

827a · 2026-04-26T23:39:00 1777246740

More-so an environment problem. An agent doing staging or development tasks should never be able to get access to prod API credentials, period. Agents which do have access to prod should have their every interaction with the outside world audited by a human.

operatingthetan · 2026-04-26T22:19:21 1777241961

> Lord, even calling it a "confession" is so cringe. The agent is not alive.

The AI companies are very invested in anthropomorphizing the agents. They named their company "Anthropic" ffs. I don't blame the writer for this, exactly.

idiotsecant · 2026-04-27T00:18:29 1777249109

You should, the writer is presumably a technical, rational person. They shouldn't believe in daemons and machine spirits

3eb7988a1663 · 2026-04-26T22:57:47 1777244267

  Anyone who would follow a mistake like that up with demanding a confession out of the agent is not mature enough to be using these tools.

The proponents are screaming from the rooftops how AI is here and anyone less than the top-in-their-field is at risk. Given current capabilities, I will never raw-dog the stochastic parrot with live systems like this, but it is unfair to blame someone for being "too immature" to handle the tooling when the world is saying that you have to go all-in or be left behind.

There are just enough public success stories of people letting agents do everything that I am not surprised more and more people are getting caught up in the enthusiasm.

Meanwhile, I will continue plodding along with my slow meat brain, because I am not web-scale.

bryan0 · 2026-04-27T01:02:43 1777251763

I agree with you completely up until this line:

> The agent cannot learn from its mistakes.

If feedback from this incident is in its context window, it is highly unlikely to make this same mistake again. Yes this is only probabilistic, but so is a human learning from mistakes. They key difference is that for a human this is unlikely to be removed from their memory in a relevant situation, while for an agent it must be strategically put there.