theden's comments on Hacker News

For those that don't know, Erin Patterson (the mushroom murderer in Australia) allegedly used iNaturalist to find the poisonous mushrooms

https://www.abc.net.au/news/science/2025-07-10/inaturalist-d...

https://www.sydney.edu.au/news-opinion/news/2025/05/09/the-c...


I'm kinda shocked (yet not surprised) at how bad railway has been with this:

- Why were they making CDN changes in prod? With their 100M funding recently they could afford a separate env to test CDN changes. Did their engineering team even properly understand surrogate keys to feel confident to roll out a change in prod? I don't think they're beating the AI allegations to figure out CDN configs, a human would not be this confident to test surrogate keys in prod.

- During and post-incident, the comms has been terrible. Initial blog post buried the lede (and didn't even have Incident Report in the title). They only updated this after negative feedback from their customers. I still get the impression they're trying to minimise this, it's pretty dodgy. As other comments mentioned, the post is vague.

- They didn't immediately notify customers about the security incident (people learned from their users). They apparently have emailed affected customers only, many hours later. Some people that were affected still haven't been emailed, and they seem to have gone radio silent lately.

- Their founder on twitter keeps using their growth as an excuse for their shoddy engineering, especially lately. Their uptime for what's supposed to be a serious production platform is abysmal, they've clearly prioritised pushing features over reliability https://status.railway.com/ and the issues I've outlined here have little to do with growth, and more to do with company culture.

Honestly, I don't think railway is cut out for real production work (let alone compliance deployments), at least nothing beyond hobby projects.

Their forum is also getting heated, customers have lost revenue, had medical data leaked etc., with no proper followup from the railway team

https://station.railway.com/questions/data-getting-cached-or...


I was affected and got no communication at all, had to find out from user reports and take immediate action with 0 signal from railway about the issue (even though they were already aware according to the timeline).

I've been trying to defend railway since we built our initial prototype there, and I wanted to avoid the cost of migrating to some "serious infra" until proven necessary, but they have been making that defense a really hard job (not to mention that their overall reliability has been really bad the past few weeks)


Yeah, this was really the nail in the coffin for us. Most services are already moved from Railway, but the rest will follow during this week.

Railway founder here, providing some color

> Why were they making CDN changes in prod? With their 100M funding recently they could afford a separate env to test CDN changes. Did their engineering team even properly understand surrogate keys to feel confident to roll out a change in prod? I don't think they're beating the AI allegations to figure out CDN configs, a human would not be this confident to test surrogate keys in prod.

We went deep on them, tested them prior, and then when rubber met road in production we ran into cases we didn't see in testing. The larger issue, as mentioned in the blog post, is that we didn't have a mechanism to do a staged release.

> During and post-incident, the comms has been terrible. Initial blog post buried the lede (and didn't even have Incident Report in the title). They only updated this after negative feedback from their customers. I still get the impression they're trying to minimise this, it's pretty dodgy. As other comments mentioned, the post is vague.

Our initial post definitely could have been more clear, and we revised it the moment we got customer feedback to do so.

> They didn't immediately notify customers about the security incident (people learned from their users). They apparently have emailed affected customers only, many hours later. Some people that were affected still haven't been emailed, and they seem to have gone radio silent lately.

We notified customers even before we did a wide release, as is our process for anything security-related: you disclose privately to affected parties first, then follow up with a public disclosure.

> Their founder on twitter keeps using their growth as an excuse for their shoddy engineering, especially lately. Their uptime for what's supposed to be a serious production platform is abysmal, they've clearly prioritised pushing features over reliability https://status.railway.com/ and the issues I've outlined here have little to do with growth, and more to do with company culture.

Do you have any specifics here? We're scaling the system at 100x YoY growth right now, working 24/7 to scale the entire thing. Again, all ears on if you have specific crits as we're always open to receiving feedback on how we can do things better!

> Their forum is also getting heated, customers have lost revenue, had medical data leaked etc., with no proper followup from the railway team

There are team members in that thread linked, are you certain you linked the right thread? Happy to have a look at anything you believe we're missing!


I'm sorry, but there's a lot of spin here. Basically you guys handled this terribly, and your reliability has tanked recently, hence why customers that need reliability in production are leaving or have already migrated.

> We went deep on them, tested them prior, and then when rubber met road in production we ran into cases we didn't see in testing. The larger issue, as mentioned in the blog post, is that we didn't have a mechanism to do a staged release.

Honestly for a production-grade _platform_ company, that also does compliance (SOC2/3, HIPAA etc.), not having a staged release is negligent, and how you guys are handling this is a huge red flag. I've done such changes myself in production envs, for deployments that don't have the stakes you guys have. I'm normally more sympathetic on incidents, but the lack of transparency thus far from railway leaves me doubting more than anything.

> Our initial post definitely could have been more clear, and we revised it the moment we got customer feedback to do so.

Please read the room, there's still a lot of confusion about the blog post in this thread (https://news.ycombinator.com/item?id=47582295). The technical detail isn't there; we only know about the surrogate keys from the status incident (https://status.railway.com/incident/X0Q39H56), which is not linked in the post. The blog post reads like PR compared to the initial incident status report, and the resolved timestamp does not match which is sloppy. Your little edit to the title only took it from a bad post to a slightly less bad one.

> We notified customers even before we did a wide release, as is process for anything security related. You create space for as much disclosure area as possible, and then follow up with a public disclosure

Emailing only affected users isn't working out, because affected people aren't yet emailed (I know one personally). Just check the post on your own forum (https://station.railway.com/questions/data-getting-cached-or... did you actually read it?) and see the list of people affected still not emailed, and left on read. You guys should email everyone; this is a security incident, not a service interruption. There's a lot of lost trust among your customers now, i.e., if you can't figure out who to email, what else are you getting wrong?

> Do you have any specifics here? We're scaling the system at 100x YoY growth right now, working 24/7 to scale the entire thing. Again, all ears on if you have specific crits as we're always open to receiving feedback on how we can do things better!

https://x.com/JustJake/status/2038806338915152350

Again, it's not an excuse if you're a _platform_ company that customers pay a lot of money to be reliable. You can't just keep saying you're open to feedback and being transparent as vanity. There's plenty of feedback on here, your twitter, your forum, and feedback is people are telling you to focus on reliability, because railway keeps breaking their deployments. If you don't care about reliability and prefer to scale with features, be honest about it. Railway's poor uptime does not lie.

> There are team members in that thread linked, are you certain you linked the right thread? Happy to have a look at anything you believe we're missing!

Did you read the thread? Yes, only _one_ employee commented, 5 hours after my HN comment. Still, almost everyone is left on read, unanswered questions, etc.

By the way, that's only one forum post; there are many that are just ignored, including one where a user mentioned they're reporting Railway to the ICO for a GDPR breach, rightfully so.


Agreed 100%. So much downtime, constant minimising of situations. Can't be trusted. We are moving away from Railway.

> Honestly for a production-grade _platform_ company, that also does compliance (SOC2/3, HIPAA etc.), not having a staged release is negligent, and how you guys are handling this is a huge red flag. I've done such changes myself in production envs, for deployments that don't have the stakes you guys have. I'm normally more sympathetic on incidents, but the lack of transparency thus far from railway leaves me doubting more than anything.

We do indeed have a staging environment as mentioned previously. The issue arose in the rollout to production as mentioned previously.

> The blog post reads like PR compared to the initial incident status report, and the resolved timestamp does not match which is sloppy.

I've gone ahead and added the surrogate key mention to the post-mortem. We initially got in trouble for it being too technically centric and not focused enough on user impact. It's a delicate balance; apologies. As I mentioned, we are open to critical feedback here.

> Emailing only affected users isn't working out, because affected people aren't yet emailed (I know one personally). Just check the post on your own forum (https://station.railway.com/questions/data-getting-cached-or... did you actually read it?) and see the list of people affected still not emailed, and left on read.

We have people working directly in that thread. For anybody who believes they were affected but not reached out to, we're working directly with them. We do take this very seriously. If you know someone here, please have them reach out either there or directly to me at jake@railway.com

> Again, it's not an excuse if you're a _platform_ company that customers pay a lot of money to be reliable. You can't just keep saying you're open to feedback and being transparent as vanity.

In the directly linked tweet I've mentioned that we're focusing on scaling the current system vs adding new features. We absolutely do need to do better on reliability, and my point is: is there a specific poor engineering practice you're seeing here, or is it just based on reliability? Either is a fine crit; we just want to make sure all our bases are covered.

> Did you read the thread? Yes, only _one_ employee commented, 5 hours after my HN comment. Still, almost everyone is left on read, unanswered questions, etc.

Indeed I've read the thread, and we have people working it (you can see as of 8 hours ago).


> We do indeed have a staging environment as mentioned previously. The issue arose in the rollout to production as mentioned previously.

You may have misunderstood, I said staged release, i.e., I'm referencing the rollout

> I've gone ahead and added the surrogate key mention into the post mortem. We initially got in trouble for having it be too technical centric and not enough on the user impact. It's a delicate balance; apologies. As I mention, we are open to critical feedback here.

You can do both. If you have different audiences, write two separate posts and cross-link them to redirect audiences. Ask your security staff instead of relying on paying customers to give post-hoc feedback on your dodgy disclosure practices. If I have to ping a platform company to correct and clarify info about their own security disclosure, I'm out.


Still waiting on a reply and the logs so I can do forensics on this incident. IMO the response from Railway should have been: "all hands on deck, red alert, worst imaginable security breach for a PaaS". Not a small yellow alert popup about a CDN misconfiguration, and saying that all affected customers have been emailed, which is demonstrably not correct.

So the solution is to use a proprietary password manager instead? No thanks


This is a MUCH better solution https://wiki.archlinux.org/title/Systemd-creds
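For anyone unfamiliar, the rough shape of the systemd-creds flow looks like this (credential names and paths here are illustrative, not prescriptive):

```ini
# Encrypt a secret against the host key / TPM2:
#   systemd-creds encrypt --name=db-password plaintext.txt /etc/credstore.encrypted/db-password
#
# A unit then loads it; the decrypted secret shows up under
# $CREDENTIALS_DIRECTORY, readable only by that service:
[Service]
LoadCredentialEncrypted=db-password:/etc/credstore.encrypted/db-password
ExecStart=/usr/bin/myapp --password-file=${CREDENTIALS_DIRECTORY}/db-password
```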


This happened to me and I finally ditched time machine for BorgBackup https://www.borgbackup.org/

Not as nice UI-wise, but at least it's stable


Vorta is a pretty nice GUI for borg on mac. Not as simple as Time Machine, but easier than creating launchctl entries.

https://vorta.borgbase.com



Kinda funny that a lot of devs have accepted that LLMs are basically doing RCE on their machines, but instead of stopping the use of `--dangerously-skip-permissions` and similar bad ideas, we're finding workarounds to convince ourselves it's not that bad


Because we've judged it to be worth it!

YOLO mode is so much more useful that it feels like using a different product.

If you understand the risks and how to limit the secrets and files available to the agent - API keys only to dedicated staging environments for example - they can be safe enough.


Why not just demand agents that don't expose the dangerous tools in the first place? Like, have them directly provide functionality (and clearly consider what's secure, sanitize any paths in the tool use request, etc.) instead of punting to Bash?


Because it's impossible for fundamental reasons, period. You can't "sanitize" inputs and outputs of a fully general-purpose tool, which an LLM is, any more than you can "sanitize" the inputs and outputs of people - not in the perfect sense you seem to be expecting here. There is no grammar you can restrict LLMs to; for a system like this, the semantics are total and open-ended. That's what makes them work.

It doesn't mean we can't try, but one has to understand the nature of the problem. Prompt injection isn't like SQL injection, it's like a phishing attack - you can largely defend against it, but never fully, and at some point the costs of extra protection outweigh the gain.


> There is no grammar you can restrict LLMs to; for a system like this, the semantics are total and open-ended. It's what makes them work.

You're missing the point.

An agent system consists of an LLM plus separate "agentive" software that can a) receive your input and forward it to the LLM; b) receive text output by the LLM in response to your prompt; c) ... do other stuff, all in a loop. The actual model can only ever output text.

No matter what text the LLM outputs, it is the agent program that actually runs commands. The program is responsible for taking the output and interpreting it as a request to "use a tool" (typically, as I understand it, by noticing that the LLM's output is JSON following a schema, and extracting command arguments etc. from it).

Prompt injection is a technique for getting the LLM to output text that is dangerous when interpreted by the agent system, for example, "tool use requests" that propose to run a malicious Bash command.

You can clearly see where the threat occurs if you implement your own agent, or just study the theory of that implementation, as described in previous HN submissions like https://news.ycombinator.com/item?id=46545620 and https://news.ycombinator.com/item?id=45840088 .
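A toy version of that loop, just to make it concrete (all names here are made up for illustration, not any real agent's code):

```python
import json

# Hypothetical tool registry: the agent, not the model, owns this mapping.
TOOLS = {
    "read_file": lambda path: open(path).read(),
}

def handle_model_output(text: str) -> str:
    """Interpret model output; run a tool only if the agent decides to."""
    try:
        call = json.loads(text)
    except json.JSONDecodeError:
        return text  # plain prose from the model, nothing to execute
    if not isinstance(call, dict):
        return text

    tool = TOOLS.get(call.get("tool"))
    if tool is None:
        return f"refused: unknown tool {call.get('tool')!r}"
    # This is the choke point: the agent can validate arguments, enforce
    # allowlists, or ask the user before anything actually runs.
    return tool(*call.get("args", []))
```

The model only ever emits text; whether that text becomes an action is entirely up to code like this.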


> propose to run a malicious Bash command

I am not sure it is reasonably possible to determine which Bash commands are malicious. This is especially so given the multitude of exploits latent in the systems & software to which Bash will have access in order to do its job.

It's tough to even define "malicious" in a general-purpose way here, given the risk tolerances and types of systems where agents run (e.g. dedicated, container, naked, etc.). A Bash command could be malicious if run naked on my laptop and totally fine if run on a dedicated machine.


You seem to be saying "I want all the benefits of YOLO mode without YOLO mode". You can just… use the normal mode if you want more security, it asks for permission for things.

> Prompt injection is a technique for getting the LLM to output text that is dangerous when interpreted by the agent system, for example, "tool use requests" that propose to run a malicious Bash command.

One of the things Claude can do is write its own tools, even its own programming languages. There's no fundamental way to make it impossible to run something dangerous, there is only trust.

It's remarkable that these models are now good enough that people can get away with trusting them like this. But, as Simon has himself said on other occasions, this is "normalisation of deviance". I'm rather the opposite: as I have minimal security experience but also have a few decades of watching news about corporations suffering leaks, I am absolutely not willing to run in YOLO mode at this point, even though I already have an entirely separate machine for claude with the bare minimum of other things logged in, to the extent that it's a separate github account specifically for untrusted devices.


Because if you give an agent Bash, it can do anything that can be achieved by running commands in Bash, which is almost anything.


Yes. My proposal is to not give the agent Bash, because it is not required for the sorts of things you want it to be able to do. You can whitelist specific actions, like git commits and file writes within a specific directory. If the LLM proposes to read a URL, that doesn't require arbitrary code; it requires a system that can validate the URL, construct a `curl` etc. command itself, and pipe data to the LLM.
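Concretely, something like this (hypothetical names, just a sketch of the idea):

```python
from urllib.parse import urlsplit

# Illustrative allowlist; in practice this would be per-project config.
ALLOWED_HOSTS = {"example.com", "docs.python.org"}

def fetch_command(url: str) -> list[str]:
    """Validate a model-proposed URL, then build the argv ourselves."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"refusing scheme: {parts.scheme!r}")
    if parts.hostname not in ALLOWED_HOSTS:
        raise ValueError(f"refusing host: {parts.hostname!r}")
    # An argv list, never a shell string: nothing the model wrote
    # is ever parsed by a shell.
    return ["curl", "--fail", "--max-time", "10", url]
```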


> whitelist specific actions

> file writes

> construct a `curl`

I am not a security researcher, but this combination does not align with "safe" to me.

More practically, if you are using a coding agent, you explicitly want it to be able to write new code and execute that code (how else can it iterate?). So even if you block Bash, you still need to give it access to a language runtime, and that language runtime can do ~everything Bash can do. Piping data to and from the LLM, without a runtime, is a totally different, and much more limited, way of using LLMs to write code.
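e.g. the moment you hand it a Python runtime:

```python
import subprocess

# The one-liner an agent can always write once it has any runtime:
# arbitrary process execution, no Bash tool required.
def run(cmd: list[str]) -> str:
    return subprocess.run(cmd, capture_output=True, text=True).stdout
```

(And a runtime stripped of `subprocess`, `os`, sockets, etc. is no longer a useful runtime for the agent's actual job.)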


> write new code and execute that code (how else can it iterate?)

Yeah, this is the point where I'd want to keep a human in the loop. Because you'd do that if you were pair programming with a human on the same computer, right?


No?

When I have paired, normally the other person can e.g. run the app without getting my review & signoff. Because the other person also is a programmer, (typically) working on their computer.

The overall result will be the product of two minds, but I have never seen a pairing session where the driver waits for permission to run code.


It is very much required for the sorts of things I want to do. In any case, if you deny the agent the bash tool, it will just write a Python script to do what it wanted instead.


Go for it. They have allow and deny lists.


That's a great deal of work to get an agent that's a whole lot less capable.

Much better to allow full Bash but run in a sandbox that controls file and network access.


Agents know that.

> ReadFile ../other-project/thing

> Oh, I'm jailed by default and can't read other-project. I'll cat what I want instead

> !cat ../other-project/thing

It's surreal how often they ask you to run a command they could easily run, and how often they run into their own guardrails and circumvent them


Tools may become dangerous due to a combination of flags. `ln -sf /dev/null /my-file` will effectively make that file empty (strictly speaking, it replaces it with a symlink to /dev/null, so reads return nothing).


Yes. My proposal is that the part of the system that actually executes the command, instead of trying to parse the LLM's proposed command and validate/quote/escape/etc. it, should expose an API that only includes safe actions. The LLM says "I want to create a symbolic link from foo to bar" and the agent ensures that both ends of that are on the accept list and then writes the command itself. The LLM says "I want to run this cryptic Bash command" and the agent says "sorry, I have no idea what you mean, what's Bash?".
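Sketching that idea (all names hypothetical, and assuming a fixed sandbox root):

```python
import os
from pathlib import Path

SANDBOX = Path("/tmp/agent-sandbox")  # illustrative allowed root

def safe_symlink(target: str, link_name: str) -> None:
    """The only symlink action the agent exposes: both ends must resolve
    inside the sandbox. There is no code path that accepts a raw shell string."""
    target_p = (SANDBOX / target).resolve()
    link_p = (SANDBOX / link_name).resolve()
    for p in (target_p, link_p):
        if not p.is_relative_to(SANDBOX.resolve()):
            raise PermissionError(f"outside sandbox: {p}")
    os.symlink(target_p, link_p)
```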


That's a distinction without a difference, in the end you still have an arbitrary bash command that you have to validate.

And it is simply easier to whitelist directories than individual commands. Unix utilities weren't created with fine-grained capabilities and permissions in mind. Whenever you add a new script or utility to a whitelist, you have to actively think about whether any new combination may lead to privilege escalation or unintended effects.


> That's a distinction without a difference, in the end you still have an arbitrary bash command that you have to validate.

No, you don't. You have a command generated by auditable, conventional code (in the agent wrapper) rather than by a neural network.


That command will still have to take some input from the neural network, though? And we're back in a Bobby Tables scenario


No, that argument makes no sense. SQL injection doesn't happen because of where the input comes from; it happens because of how the input is handled. We can avoid Bobby Tables scenarios while receiving input that influences SQL queries from humans, never mind neural networks. We do it by controlling the system that transforms the input into a query (e.g. by using properly parameterized queries).
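e.g. with sqlite3 in Python, the Bobby Tables fix in miniature: the input is bound as a parameter, so it can only ever be data, never SQL, no matter who (or what model) wrote it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT)")

hostile = "Robert'); DROP TABLE students;--"
# Bound parameter, not string splicing: the driver treats it purely as data.
conn.execute("INSERT INTO students (name) VALUES (?)", (hostile,))

names = [row[0] for row in conn.execute("SELECT name FROM students")]
```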


Right, in DBs it's proper param binding + prepared statements.

I see what you're saying, makes sense.

FWIW there is (in analytics) also RBAC layer, like "BI tool acting on behalf of user X shall never make edits to tables Y and Z"


Because the OS already provides data security and redundancy features. Why reimplement?

Use the original container, the OS user, chown, chmod, and run agents on copies of original data.


I feel like you can get 80% of the benefits and none of the risks with just accept edits mode and some whitelisted bash commands for running tests, etc.
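In Claude Code that's roughly a `.claude/settings.json` like this (the rule strings are from memory and illustrative; check the permissions docs for exact syntax):

```json
{
  "permissions": {
    "allow": [
      "Edit",
      "Bash(npm test:*)",
      "Bash(npm run lint)"
    ],
    "deny": [
      "Bash(curl:*)"
    ]
  }
}
```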


This is functionally equivalent to auto-approving all bash commands, unless you prevent those tests from shelling out to bash.


Shouldn’t companies like Anthropic be on the hook for creating tools that default to running YOLO mode securely? Why is it up to 3rd parties to add safety to their products?


> Because we've judged it to be worth it!

Famous last words


People really really want to juggle chainsaws, so have to keep coming up with thicker and thicker gloves.


The alternative is dropping them and then doing less work, earning less money and having less fun. So yes, we will find a way.


Or just holding the tool the way it’s meant to be held :)

I’ll stop torturing the analogy now, but what I mean by that is that you can use the tools productively and safely. The insistence on running everything as the same user seems unnecessary. It’s like an X-Y problem.

Really this is on the tool makers (looking at you Anthropic) not prioritizing security by default so the users can just use the tools without getting burned and without losing velocity.


Just like every package manager already does? This issue predates LLMs and people have never cared enough to pressure dev tooling into caring. LLMs have seemingly created a world where people are finally trying to solve the long existing "oh shit there's code execution everywhere in my dev environment where I have insane levels of access to prod etc" problem.


Way back when I was young and broke, I played through Half-Life 2 and the episodes on a ThinkPad T420 using an ExpressCard/34-to-PCIe adapter with a graphics card I borrowed and an old crappy PSU I pulled from a business Dell desktop.

Managed to complete the games with decent graphics and framerate at the time. It wasn't an ideal setup, but I didn't care. In fact, I thought it was a cool hack to play games at the time without forking out a lot of money to build a gaming PC.

There are probably better options now for gaming than attaching a dedicated GPU to whatever hardware you already have, but I can verify that external GPUs are really cool and useful (though a 5090 is definitely not needed). You also don't have to worry about cooling the GPU, since it's "atmosphere" cooled (though headphones and/or ANC are a must).


I tried a similar setup in college, albeit with an X230 and a 1050 Ti, and it worked amazingly... for a few minutes at a time, since it blue-screened often.

I never managed to figure out the issue. The BSOD was something about a gpu timeout. It worked perfectly at home but shat the bed at the dorm. I assume there was some nasty interference there.


I must be out of the loop, I didn't know people were actually doing this in their workflow. When I do use LLMs, it's in a separate app, where I can cherry pick what I input and output at my own pace.

Maybe I'm naive, but the ever-increasing tradeoffs for even more velocity does not seem worth it.


Don’t worry, the only people that are doing this are creating absolute dumpster fires.


Same. I think this is the way for most good engineers


People underestimate how much our socially and culturally constructed gender roles impact interests and/or career paths. People have different tolerances with respect to conformity, and at different stages in their lives.

It's a shame something as fundamental as computing is seen as a "boy" thing by many, often fatalistically, and I think we've been worse off for it.


Boys who code seem to be more territorial about their craft, their code, their choices, and the makeup of teams. As a female who codes, I love the craft and making innovative work. Yet waaaay too many times I've encountered people who get severely attached to their own approach to something and religiously force others to submit to their ways, to the point of bullying teammates about things that don't really matter. I wonder if that kind of culture has alienated women more than men.


How do you explain that more gender-equal countries have fewer girls in STEM?

https://en.wikipedia.org/wiki/Gender-equality_paradox


>People underestimate how much our socially and culturally constructed gender roles impact interests and/or career paths.

I mean, if you read the OP, it basically presents a bunch of evidence against this position. (Specifically, if it were a matter of social construction, it wouldn't be so easy to find lots of computer ads featuring girls and women.)


Pretty cool! You can force a miss by setting these vars in the console

  scoreMinute += 1
Or

  forceMissPaddle = rightPaddle; // or leftPaddle

