It seems baffling that Microsoft is getting heat for this. They didn't cause the issue, a third party vendor's software did. Even if you were trying to make an argument of "if we had more diversity it wouldn't be as bad", shouldn't you be focusing on the EDR vendors rather than the OS vendor?


There is a frustrating amount of nuance being lost in this discussion, and as usual it's devolving into tribalism.

However, I'll say that, while this clearly is not Microsoft's fault, the realization of just how much critical infrastructure is running on Windows -- let alone Windows that's connected to the internet and has automatic updates enabled -- was sobering.

Are kernel mode drivers maybe a bad idea? Yes! Should Windows be able to automatically roll back if a kernel mode driver fails? Yes! Should CrowdStrike have tested before pushing out the update? Yes! Should hospitals, police stations, airlines, etc be testing any and all updates that come their way prior to releasing them onto the rest of their fleets? Yes!

And so on and so on. The failures here are legion. I can at least say for myself that all of that (and more) is what I'm grappling with today, rather than the direct, root cause and content of the issue itself.

I say this entirely with as little tribalism or bias as I can muster: Windows should not be the foundation upon which critical infrastructure is built. It is a bad OS. It is a rickety, insecure mess; this observation (note that I did not say opinion) has only ever increased with time.

The NT kernel, I hear, is great. I generally trust those who say this, as they tend to be demonstrably much smarter than I am, especially with regards to kernel design, which is something I know next to nothing about.

But Windows? Nah man. Maybe there's a good kernel running the show, but the OS as a whole is a shitshow. Yes, this exact same problem could have (and has) happened on Linux; the OS is indeed not the culprit. But the understanding that this one, single, no good very bad OS is responsible for this much critical infrastructure going from an abstract worry to a real, concrete horror show has, I think, hit a lot of people, leading to a poorly-pivoted change of subject.


Totally agree here. There should be a mechanism to go back to ‘last known good’ regardless of kernel level issues. Innovations like Fedora Silverblue with ostree and greenboot tech should be adopted by Windows.


>There should be a mechanism to go back to ‘last known good’ regardless of kernel level issues. Innovations like Fedora Silverblue with ostree and greenboot tech should be adopted by Windows.

Can you explain how they work? AFAIK the issue is that they pushed a bad config file, and that's the thing that caused the crash, not a new driver. Are those systems going to roll back every file ever to try to recover themselves?


> Are those systems going to roll back every file ever to try to recover themselves?

Isn't that totally feasible with things like Btrfs snapshots?


It is, assuming your disk driver or related drivers aren't the ones failing.

Then again, nothing is preventing an update loop here.
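
For what it's worth, the "last known good" idea with a guard against the update loop can be sketched roughly like this. This is only a toy model of a greenboot-style boot counter, not how ostree, greenboot or Windows actually implement it; `BootState`, `boot_once` and `apply_update` are made-up names, and it assumes the bad file is part of the deployment being rolled back, which apparently was not the case for the CrowdStrike config push.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class BootState:
    """Hypothetical bootloader state; deployments are ordered newest first."""
    deployments: List[str]
    max_attempts: int = 3
    failed_boots: int = 0
    blocked: List[str] = field(default_factory=list)


def boot_once(state: BootState, healthy: Callable[[str], bool]) -> str:
    """Attempt one boot and return the deployment that is current afterwards."""
    current = state.deployments[0]
    if healthy(current):
        state.failed_boots = 0
        return current
    state.failed_boots += 1
    if state.failed_boots >= state.max_attempts and len(state.deployments) > 1:
        # Roll back to the previous deployment and remember the bad one,
        # so the updater cannot immediately reinstall it (the update loop).
        state.blocked.append(state.deployments.pop(0))
        state.failed_boots = 0
    return state.deployments[0]


def apply_update(state: BootState, new_deployment: str) -> bool:
    """Install an update unless it was already rolled back once."""
    if new_deployment in state.blocked:
        return False
    state.deployments.insert(0, new_deployment)
    return True
```

The counter has to live somewhere that is evaluated before any third-party driver loads, otherwise a crashing boot-start driver takes the counter down with it.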


>There is a frustrating amount of nuance being lost in this discussion, and as usual it's devolving into tribalism.

It's ironic that you're saying this given the points you're making below. Let's go through them:

>Are kernel mode drivers maybe a bad idea? Yes!

You can't have an EDR product that isn't kernel mode. Otherwise it's trivial for malware to evade (eg. by being kernel mode itself).

>Should Windows be able to automatically roll back if a kernel mode driver fails? Yes!

see: https://news.ycombinator.com/item?id=41019743

>Should hospitals, police stations, airlines, etc be testing any and all updates that come their way prior to releasing them onto the rest of their fleets? Yes!

Getting companies to test updates is already like pulling teeth. Besides, crowdstrike said the update they pushed was "designed to target newly observed, malicious named pipes being used by common C2 frameworks in cyberattacks". Is this something you really want to sit on for testing, which might take weeks or months?

>Windows should not be the foundation upon which critical infrastructure is built. It is a bad OS. It is a rickety, insecure mess; this observation (note that I did not say opinion) has only ever increased with time.

>The NT kernel, I hear, is great. I generally trust those who say this, as they tend to be demonstrably much smarter than I am, especially with regards to kernel design, which is something I know next to nothing about.

>But Windows? Nah man. Maybe there's a good kernel running the show, but the OS as a whole is a shitshow.

What specific security issues do you think windows/NT kernel has? Moreover, how is windows being a "shitshow" relevant to the question of resiliency or dependence? Don't get me wrong, windows spying on you or using dark patterns to get you to use Edge or whatever isn't great, but it's a weird thing to bring up in a discussion about how airports run on windows, and reeks of "don't let a disaster go to waste" on the part of the anti-windows tribe.


Mac doesn't allow 3rd party kernel drivers and on Linux they use eBPF. Is their product useless there?


>Is their product useless there?

Probably? For instance I doubt an EDR product can detect malware being executed on iOS/Android, because all the apps there are heavily sandboxed and provide no mechanism to do invasive monitoring of everything that's being run.

>Linux they use ebpf

According to wikipedia it's been ported to windows and on linux you can still load kernel modules which are crashable.


> and on Linux

On Linux, they apparently had done the same thing and caused a bunch of Linux systems to crash[1].

[1]: https://www.neowin.net/news/crowdstrike-broke-debian-and-roc...


> It's ironic that you're saying this given the points you're making below.

The nuance I was referring to was with regards to the actual facts of the situation, such as who is responsible, and that many, many people are just using it as an excuse to dunk on Windows. As I said, Windows/Microsoft are not at fault in this precise situation.

However, is it possible that I can state that a thing is bad without it being "tribalism?"

I'm invoking tribalism and setting myself apart from it in an attempt to make it very clear that my criticism of Windows is not simply tribalism. Stating that Boeing airplanes are prone to critical faults due to bad engineering is a fact, not tribalism; that is still true even if I personally dislike Boeing airplanes/the company. Perhaps I'm even critical of them precisely because of the bad engineering I've observed! Weird how that works.

The same can be said for Windows.

> Getting companies to test updates is already like pulling teeth. Besides, crowdstrike said the update they pushed was "designed to target newly observed, malicious named pipes being used by common C2 frameworks in cyberattacks". Is this something you really want to sit on for testing, which might take weeks or months?

gestures broadly I mean...given the current situation, obviously I'm going to answer with an emphatic "yes!" You know we have these things called computers, right? They're really good at automating stuff. Like testing.

I know that "getting companies to test updates is like pulling teeth," but that doesn't mean it shouldn't be done. Companies do all sorts of stupid and negligent bullshit, are happy to spend money in the name of shifting blame, but are cheap as hell with regards to actually avoiding problems. That's not a good thing, and it should change. Is that really such a controversial statement? Apologies for potentially engaging in strawmanning, but in the event that your response is something along the lines of "they should test, but it's not realistic to expect that," yeah, I agree, but perhaps this precise event is the kick in the pants those who are against testing need to stop being cheap morons. And in case you're not clear on who I'm referring to: the decision makers at the top, not the engineers carrying out their irresponsible and negligent agendas.

I was in the ER quite literally the day before this hit with a slash to my popliteal artery (long story, freak accident), and I shudder to think how it would have gone a day later -- I honestly could have bled out and died. The fact that so many places running absolutely critical infrastructure aren't routinely testing every change they push out to their devices is insane. Utterly, bat shit insane.

> What specific security issues do you think windows/NT kernel has?

Sorry, not taking the bait on this one. Windows is a piece of shit, and it's absolutely self-evident to anyone even kind of exposed to the alternatives. Expanding on this to you is a waste of time, as either you've never used Windows before (highly unlikely) or you're unwilling to see what I'm talking about for whatever reason.

> Moreover, how is windows being a "shitshow" relevant to the question of resiliency or dependence?

Do I really have to explain to you why a shitshow of an OS isn't resilient?

I've tried to make it extremely clear that 1) I don't blame Windows or Microsoft in this incident, but 2) this incident revealed just how much critical infrastructure relies on an OS that has no business being used as such. That's not "don't let a disaster go to waste," it's one disaster revealing a situation that is ripe for many, many more. I'm not "anti-Windows," I'm "anti-Windows-as-a-server" and "anti-horrible-system-administration-practices."


>The nuance I was referring to was with regards to the actual facts of the situation, such as who is responsible, and that many, many people are just using it as an excuse to dunk on Windows.

Right, and I'm pointing out the irony one level down, with some of your takes (ie. not the facts of the situation, but the suggestions that you're making).

>gestures broadly I mean...given the current situation, obviously I'm going to answer with an emphatic "yes!" You know we have these things called computers, right? They're really good at automating stuff. Like testing.

I don't think anyone thinks testing wouldn't have prevented this disaster, nor that testing is bad. The question is whether holding back updates is actually better overall in practice. Remember the Equifax hack? Turned out it was caused by them using a vulnerable version of Apache Struts, which they didn't update for months/years. Now, should they also theoretically have been doing engineering best practices and having a testing pipeline that would allow them to update library versions with minimal fuss? Yes, but in practice that's not something that can be done. The same applies to EDR updates. Should end users' IT departments have test suites so that they can test and release updates within hours of them being released? Yes. Is that a realistic option that actually exists? No.

>> Moreover, how is windows being a "shitshow" relevant to the question of resiliency or dependence?

>Sorry, not taking the bait on this one. Windows is a piece of shit, and it's absolutely self-evident to anyone even kind of exposed to the alternatives. Expanding on this to you is a waste of time, as either you've never used Windows before (highly unlikely) or you're unwilling to see what I'm talking about for whatever reason.

Clearly you don't have a context window exceeding one sentence, because the two sentences immediately following are critical to the understanding of that sentence. If you read those, you'd even see listed out common reasons why people think windows is bad.

>I've tried to make it extremely clear that 1) I don't blame Windows or Microsoft in this incident, but 2) this incident revealed just how much critical infrastructure relies on an OS that has no business being used as such. That's not "don't let a disaster go to waste," it's one disaster revealing a situation that is ripe for many, many more. I'm not "anti-Windows," I'm "anti-Windows-as-a-server" and "anti-horrible-system-administration-practices."

Sounds like you're already convinced that windows is bad, and the only new thing you got out of this is that a lot of important systems run on windows?


But as you've outlined multiple times, the flaws of the OS are not the problem here. Changing OS, the same poor decisions can and have been made. Windows can be locked down or configured to a very stable level, and Linux can be configured to a shitshow.

I don't like Windows, but embedded edition has been the most common operating system you encounter in the physical world for decades, and broadly it works so well you don't know. I don't see a good argument that something on the Windows end needs fixing here.


Windows specifically does not allow automated rollback in case of boot drivers because:

a) that generally does not work, because the driver is required to boot into recovery environment

b) even if it did, for security like this, downgrade attacks are a consideration

and most importantly

c) in this case, Windows does not have a full backup of previous external configuration because this was not a driver update


I'm curious what people think, but while obviously CrowdStrike caused the breakage, does the Operating System not have some responsibility in not allowing such outages to happen? Especially if it's an enterprise product?

Ideas:

1. Microsoft themselves could potentially enforce a gradual rollout on updates (did the update go through windows updates?)

2. Have better automatic recovery options, could windows have detected the bad driver and reverted it automatically?

3. just generally be more resilient, why should one bad driver take everything down?
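
To make idea 1 a bit more concrete, here is a rough sketch of what a gradual rollout gate could look like. It is purely illustrative: the function names, the stage percentages and the crash-rate threshold are all assumptions of mine, and nothing here reflects how Windows Update or CrowdStrike actually ship anything.

```python
import hashlib


def rollout_bucket(machine_id: str, update_id: str) -> int:
    """Map a machine to a stable bucket in [0, 100) for a given update."""
    digest = hashlib.sha256(f"{update_id}:{machine_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % 100


def should_install(machine_id: str, update_id: str, percent_enabled: int) -> bool:
    """A machine installs the update once its bucket falls below the enabled percentage."""
    return rollout_bucket(machine_id, update_id) < percent_enabled


def next_stage(percent_enabled: int, crash_rate: float,
               max_crash_rate: float = 0.001) -> int:
    """Widen the rollout one stage, or halt it (0) if early machines are crashing."""
    if crash_rate > max_crash_rate:
        return 0
    for stage in (1, 10, 50, 100):  # hypothetical stage sizes
        if stage > percent_enabled:
            return stage
    return 100
```

Because the bucket is derived from a hash, a machine that received the update at 10% is still included at 50%, and the first cohort only has to report crashes before the rest of the fleet is touched.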


>1. Microsoft themselves could potentially enforce a gradual rollout on updates (did the update go through windows updates?)

No. The crash was caused by a "configuration update", not through code pushed through windows update.

https://www.crowdstrike.com/blog/falcon-update-for-windows-h...

>2. Have better automatic recovery options, could windows have detected the bad driver and reverted it automatically?

see: https://news.ycombinator.com/item?id=41019743

>3. just generally be more resilient, why should one bad driver take everything down?

It's kernel mode. With great power comes great responsibility. Blaming the OS is like blaming linux that you can sudo rm -rf /.


I'd expect the HN crowd to not give Microsoft beef here or want them to lock down the OS more. At least, that would be consistent with usual comments about letting users control their OS. Companies chose (with some regulatory pressure) to install CrowdStrike at the kernel level after all.


I didn't talk about locking down, but just generally building more resilience/recovery into the OS.

Also, from the little I saw, CrowdStrike got around certifying every driver update by pushing updates through "update" files that are read and executed by the driver. That seems like a huge hack to bypass certification and likely shouldn't be allowed?


> does the Operating System not have some responsibility in not allowing such outages to happen?

No. External drivers are acting as part of the operating system, their responsibility is to follow the rules for kernel drivers. Don't use-after-free or dereference otherwise bad memory (as appears to be the case here[1]), make sure you only access pageable memory at the correct IRQL, etc etc.

The kernel's job is to provide services and to make sure the other components are running smoothly. The problem is, by the time that something bad like a bad dereference has occurred, other issues may start to arise. And unloading a driver or something may cause data loss and may not actually fix the underlying problem (especially if you pin the error on the wrong driver[2]).

If third party software does not follow the contract, there's... really not much they can or really should do. In user mode, an access violation is given to the program when you access bad memory. This usually results in a process crash, which while annoying, may be fine. User mode programs can't[3] bring down the operating system.

In kernel mode, there's no way for the OS to know that you're not going to start overwriting the disk accidentally so they made what they believe to be the safest choice--stop[4].

In any case, there's really no way for 1 to happen (since driver updates can be done externally to Microsoft), 2 is nebulous, and 3 is potentially dangerous.

---

[1]: https://learn.microsoft.com/en-us/windows-hardware/drivers/d...

[2]: For example, in the case of stack corruption.

[3]: Technically they can in a couple of limited cases (but this really isn't the point). The first being killing CSRSS or another process with the "critical" kernel flag, and another by using NtShutdownSystem from the NT API.

[4]: Other operating systems have taken a different philosophy. Notably Linux can be configured to allow the machine to run after a driver or other external event causes a kernel oops.


> does the Operating System not have some responsibility in not allowing such outages to happen?

No. The OS is supposed to guarantee that userspace programs can't crash the system like this, but CrowdStrike is an invasive kernelspace driver, not a userspace program.


Phone OS's are counter examples. Their app isolation is so strong they don't need anti virus software. Effectively all a virus gets to access to without explicit user permission is itself and the user. In particular it doesn't get to screw with the OS itself, nor with the features the OS reserves for the user like installing and uninstalling, and turning permissions on and off.

Granted, just giving the virus access to the user creates problems. While the virus can't directly change its permissions, it can socially manipulate the user into doing it for them. But even so it's fixable in the sense it doesn't take a wipe and install and the user can always just uninstall the virus.

So yes, the OS can prevent this problem entirely by implementing strong access controls. The problem isn't that it can't be done. The problem is that Windows doesn't do it (although they appear to be moving in that direction). Neither do any of the desktop Linuxes I'm familiar with, nor does macOS.

I think it's important to distinguish between the desktop and the kernel, because Android and ChromeOS are built on Linux and they enforce an access control system that is secure by default. I'm sure a secure OS could be built on the WinNT kernel too, but Microsoft only gives you one and it's insecure by default.


They have tried some experiments in this area, like CLR drivers and an altogether different kernel.

Nothing stuck for a variety of reasons...

Reverting security to an insecure state is called a downgrade attack. Can't allow that.

What could be done is a better Safe Mode with Networking that would allow for secure remoting over an AD enterprise configuration... And the machines configured to automatically enter that mode. Still a bit of a potential security issue.


What responsibility does Linux bear if you write your own kernel module, install it, and it bricks your machine until you boot into a different OS?

There are limits to the sensible responsibility of the operating system vendor / maintainer. They stop somewhere south of "mutating core configuration of the OS itself," because if they don't, the owner doesn't control their own computer.


Is it baffling? Only windows computers were affected, so this provides ammunition for people who are already concerned about the prevalence of windows.

I think this is more politically motivated than actually trying to address the real problem.

However, I do think Microsoft deserves some blame, since if Windows was more secure by default there would be less need for 3rd party anti-malware software that can fail so catastrophically. But CrowdStrike certainly is more to blame for this particular disaster.


>Only windows computers were affected

Because only Crowdstrike's Windows release was broken by them and they didn't fuck it up on the other OS. How is this Windows' fault? It's not like that tool was binary cross platform compatible for all operating systems, like Electron VS Code, in order to put the blame on the OS. It's basically a completely different tool tailored to each OS kernel, under the same brand name.

> if Windows was more secure by default there would be less need for 3rd party anti-malware software that can fail so catastrophically

Which Windows security vulnerability needed the use of CrowdStrike's tool? How do you measure this security threshold at which you say an OS is so secure that you don't need a third party EDR vendor anymore? You see, here's the problem. That threshold does not exist, since security is a continuous arms race, and EDR vendors are incentivized to scaremonger you into thinking Windows or whatever OS you use is so insecure and your employees so incompetent and untrustworthy with what they do on their computers that your multi-billion-dollar company won't be able to survive if you don't buy their tool.

And that's partly true to some extent. You can have the most secure OS in the world, but if Bob from accounting runs a script he found online as root then all that OS security was for nothing since the OS can't magically detect that Bob is an idiot and should not run that command as sudo even though Bob has the right credentials for it. That's why companies hire these digital watchmen, to watch over their systems for suspicious behavior since neither Windows, nor MacOS, nor Linux can do this out of the box. The question remains who watches the watchmen? You're now trusting an EDR vendor with the keys to your kingdom instead of Bob.


> Because only Crowdstrike's Windows release was broken by them and they didn't fuck it up on the other OS. How is this Windows' fault?

First of all, whether it is actually Windows' fault or not isn't the point. What matters is the perception by the general public and policy makers. And maybe it could have just as easily happened on another OS, but the reason why is somewhat technical, so people who have an agenda can spin it as being MS's fault.

Secondly, it is possible, although I don't think terribly probable, that the bug was influenced by some quirk of the windows platform, although it is impossible to know without more details of the exact nature of the bug.


>> Because only Crowdstrike's Windows release was broken by them and they didn't fuck it up on the other OS. How is this Windows' fault?

>First of all, whether it is actually Windows' fault or not isn't the point. What matters is the perception by the general public and policy makers

Way to move the goalposts from "How is this Windows' fault" to "the perception by the general public and policy makers". I don't think anyone is disputing that some people are blaming Microsoft for this. In fact the whole impetus for this comment chain is me pointing out how the public perception of Microsoft being at fault doesn't match up with logic.


I'm sorry if my original comment wasn't clear enough. My original intent was not to say that it was Microsoft's fault. Rather that it isn't surprising that some people with limited knowledge and/or understanding of the relevant details might blame Microsoft, and that people with an axe to grind on Microsoft would take advantage of that potential misunderstanding.


>However, I do think Microsoft deserves some blame, since if Windows was more secure by default there would be less need for 3rd party anti-malware software that can fail so catastrophically.

There's actually a first-party product: https://learn.microsoft.com/en-us/defender-xdr/microsoft-365...


Microsoft are signing kernel drivers for over-the-air updates pushed by vendors.

Whilst telling their corporate customers that using Microsoft InTune will allow them to be in full control of their configuration management.


>Microsoft are signing kernel drivers for over-the-air updates pushed by vendors.

The crash was caused by a configuration update pushed by crowdstrike, not a new driver.

>Whilst telling their corporate customers that using Microsoft InTune will allow them to be in full control of their configuration management.

How is this relevant? This wasn't caused by some sysadmin that goofed a config change using intune, it's something pushed by crowdstrike itself.


> The crash was caused by a configuration update pushed by crowdstrike, not a new driver.

Microsoft didn’t do enough due diligence on the behaviour of CrowdStrike updates. It shouldn’t allow out-of-band updates.

Third-parties should not be signing code that allows third-parties to reach into corporate services and push files.

Microsoft InTune is the mechanism that all configuration updates should use.

It appears to me that Microsoft makes it harder for third-parties to update an Xbox game, than the configuration of a kernel driver.


>Microsoft didn’t do enough due diligence on the behaviour of CrowdStrike updates. It shouldn’t allow out-of-band updates.

You want Microsoft to be doing code reviews of third party software? That might have prevented this disaster but would get them in hot water for other reasons (eg. anti-competition accusations). Not even Apple gatekeeps that hard.

>Third-parties should not be signing code that allows third-parties to reach into corporate services and push files.

So dropbox should be banned as well? It's also a third party service that pushes files onto computers.

>Microsoft InTune is the mechanism that all configuration updates should use.

Okay, so crowdstrike pushes its virus definition updates via intune instead. The bad file still lands on computers, the kernel mode driver still crashes before the computer can boot and the computer is bricked. How is this any better?


We’re talking about kernel drivers here.

I think they warrant some extra attention by Microsoft.


Why are you arguing about the responsibility of MS or CS when the owner of these enterprise computers is clearly the one responsible for what's on them? The companies affected are the ones who chose to install the software and run it with the inherent risks.

> Microsoft InTune is the mechanism that all configuration updates should use.

That would make it one of the most locked down operating systems in the world, killing millions of applications overnight. Enterprise IT already have the ability to control installs, updates and patches, why move that burden to Microsoft?


Basically more iOS, less PC.


So normies in governments are finally seeing the issues, it’s for the wrong reason indeed, but is it though? It could have been an MS trip up and then it may have been worse even…


Microsoft personnel are usually more careful with existential risks... usually.


Part of Microsoft's problem is that right before the CrowdStrike outage, there was an outage of some large Microsoft services, making the two look connected. It didn't help that a lot of Microsoft support consultants also ran CrowdStrike laptops, getting affected right when customers were.

But this is quite typical. Microsoft generally doesn't direct blame to anyone (anyone non-malicious at least) but in this case I feel like they really should.


I go the other way, it's absolutely their fault. Their historical business model of monopolizing the enterprise while outsourcing security is at the root of all this. The same thing could happen with Crowdstrike on other platforms, but it's only on Windows where these suites are considered essential.


>but it's only on Windows where these suites are considered essential.

They're "considered essential" for all corporate machines for compliance and CYA reasons. If you're the CISO/CTO and a hack occurred on your watch, do you really want to be the one telling the board that you didn't need EDR on your linux machines because linux is secure?


It's a culture that has grown up around a dominant platform that had little or no security built in. Where I work, Windows is heavily locked down with all sorts of security junk on it, but iOS gets by with minimal MDM.


Besides cloud outages, there's also been quite a few security disasters recently, one with the Russians all up in Uncle Sam's business.

There's no reason for MS to have such a lock on governments, when their security and reliability is worst in the industry among BigTech. Even "Blue Hat" could do better.


Isn't it the case that kernel drivers are signed and approved by Microsoft?


I'm not aware of Microsoft signing off on vendors' kernel driver code changes.

Driver signing is not a Quality Assurance measure. It's an identity measure.


More importantly, the code signing is a security measure. You can in theory vet the signature database by hand preventing the execution of unapproved drivers.

Microsoft normally puts it in revocation mode only, with bad drivers with unfixed holes being blocked.
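
The difference between the two modes is roughly the following. This is a toy policy check with made-up names, not the real Windows code integrity API, and the signature verification itself is assumed to have happened elsewhere.

```python
import hashlib
from typing import Optional, Set


def driver_hash(image: bytes) -> str:
    """Identify a driver image by its SHA-256 digest."""
    return hashlib.sha256(image).hexdigest()


def may_load(image: bytes, signature_valid: bool, revoked: Set[str],
             allowlist: Optional[Set[str]] = None) -> bool:
    """Decide whether a driver may load.

    allowlist=None models revocation-only mode: any validly signed driver
    loads unless it is on the revocation list. With an allowlist, only
    drivers that were vetted by hand may load.
    """
    if not signature_valid:
        return False
    digest = driver_hash(image)
    if digest in revoked:
        return False
    if allowlist is not None:
        return digest in allowlist
    return True
```

Either way the check only looks at the driver binary, so a data file the driver reads later never passes through it.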


I think it was essentially a data/config file that was corrupt, not a new kernel driver. Nothing really to be signed and approved by Microsoft.


Would the disaster have been prevented if there was no code signing? Or are you arguing that Microsoft should scrutinize code even harder?


Microsoft isn’t scrutinizing this code, at all, right? It’s not like they’re doing code reviews of every vendor that does code signing in Windows.


Not code reviews but the driver is tested

https://en.wikipedia.org/wiki/WHQL_Testing



Really? Is this the future of developers to just blindly trust a statistical text generator?

It's cool that you know how to ask a question on ChatGPT. But don't share it as if it's the truth on an obscure, hallucinable factoid.


>Is this the future of developers to just blindly trust a statistical text generator?

Unfortunately, yes.


Why do you claim I'm doing it blindly?


As far as I can tell you just asked it a question then posted its response as if you expected it to be fact.

If that isn't blind trust, I don't know what is.


I added links to the answer; they say that drivers must be signed and tested by Microsoft, and WHQL is part of that. The ability to do it any other way was deprecated in 2021.


You are confusing notarized (what Apple does) and signed (what both do). The latter just allows confirming legitimacy.


I guarantee you Apple will take zero responsibility for a failing or issue with third party software, notwithstanding the purported difference between notarized and signed.


Drivers yes, virus signature files or configuration files pushed by third parties, no.


Regulators who may not understand the difference between an OS and a driver.



