How should corporate IT do it? You have 100,000 devices to manage. How do you ha...

danans · on July 19, 2024

> Swap Chromebooks for PCs and you still have the problem-- how do you handle centralized management of that "fleet"?

Simplicity (and hence low cost) of fleet management, OS boot-verification, no third-party kernel updates, and A/B partitions for OS updates are among the major selling points of Chromebooks.

It's a big reason they have become so ubiquitous in primary education, where there is such a limited budget that there's no way they could hire a security engineer.
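For anyone unfamiliar with the A/B partition point above, here is a minimal sketch of the general idea. It is not ChromeOS code: the `Slot` and `Device` classes and the `boots_ok` flag are hypothetical stand-ins I made up for illustration. The update is written to the inactive slot, and the device only switches over if the new slot checks out.

```python
from dataclasses import dataclass, field


@dataclass
class Slot:
    """One of the two OS partitions (hypothetical model, not ChromeOS code)."""
    name: str
    version: str
    bootable: bool = True


@dataclass
class Device:
    slots: dict = field(default_factory=lambda: {
        "A": Slot("A", "1.0"),
        "B": Slot("B", "1.0"),
    })
    active: str = "A"

    def inactive(self) -> str:
        """Name of the slot the device is not currently running from."""
        return "B" if self.active == "A" else "A"

    def apply_update(self, version: str, boots_ok: bool) -> str:
        """Write the update to the inactive slot, then switch if it is bootable.

        boots_ok is an assumed stand-in for boot verification plus a
        successful first boot. Returns the version the device ends up running.
        """
        target = self.inactive()
        # The running slot is never modified, so it remains as a fallback.
        self.slots[target] = Slot(target, version, bootable=boots_ok)
        if self.slots[target].bootable:
            self.active = target  # switch only after the new slot checks out
        return self.slots[self.active].version
```

The point of the design is that a bad image costs you nothing but the inactive slot: the device keeps running the last known-good version.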

EvanAnderson · on July 19, 2024

The OP was deriding monoculture. My point was that pushing out only Chromebooks is still perpetuating a monoculture. You're just shifting your risk over to Google instead of Crowdstrike / Microsoft.

re: Chromebooks themselves - The execution is really, really good. The need for legacy software compatibility limits their corporate penetration. I've done enough "power washes" to know that they're not foolproof, though.

danans · on July 20, 2024

I agree that monoculture is an issue that makes events like this more probable, regardless of OS.

That said, letting a third party add or update a kernel driver ignores (even if out of business necessity) best practices for OS architecture.

EvanAnderson · on July 20, 2024

ChromeOS is just Linux, isn't it? It's going to suffer from the same problem as NT re: a buggy kernel mode driver tanking the entire OS.

Google gets a pass because its customers are okay with devices with limited general-purpose ability. Google is big enough that the market molds product offerings to the ChromeOS limitations. I think MSFT suffers from trying to please everybody, whereas Google is okay with gaining market share by usurping the market norms over a period of years.

danans · on July 22, 2024

> ChromeOS is just Linux, isn't it? It's going to suffer from the same problem as NT re: a buggy kernel mode driver tanking the entire OS.

ChromeOS is not just Linux. It uses the Linux kernel and several subsystems (while eschewing others), but it also has a security and update model that prevents third parties (or even the user themselves) from updating kernel space code and the OS's user space code, so basically any code that ships with the OS.

Therefore, the particular way that the Crowdstrike failure happened can't happen on ChromeOS.

However, Google themselves could push a breaking change to ChromeOS. That, however, would be no different than Apple or Microsoft doing the same with their OSes.

EvanAnderson · on July 22, 2024

> ChromeOS is not just Linux.

I am familiar with Google's walled garden w/ ChromeOS. I didn't mean to give the impression that I was not.

It's "just Linux" in the sense that it has the same Boolean kernel mode/user mode separation that NT has. ChromeOS doesn't take advantage of the other processor protection rings, for example. A bad kernel driver can crash ChromeOS just as easily as NT can be crashed.

Hopefully Google just doesn't push bad kernel drivers. Crowdstrike can't, of course, because of the walled garden. That also means you can't add a kernel driver for useful hardware, either. That limits the usefulness of ChromeOS devices for general purpose tasks.

danans · on July 22, 2024

> That also means you can't add a kernel driver for useful hardware, either. That limits the usefulness of ChromeOS devices for general purpose tasks.

Its target market isn't niche hardware but rather the plethora of use cases that use bog-standard hardware, much like many of the use cases that CS broke a few days ago.

EvanAnderson · on July 22, 2024

Yes. I said that in a post up-thread. Google is making the market mold itself to their offering, rather than being like Microsoft and molding their offering to the market. Google is content to grow their market share that way.

Stranger43 · on July 19, 2024

If Crowdstrike's QA department is all that stands between you and days of no operations, then you chose to live with the near certainty that you will have days rather than hours of unplanned company-wide downtime.

And if you cannot actually abandon someone like Microsoft that consistently screws up their QA, then it's basically dishonest for you to claim that reliability is even a concern for your desktop platform.

And that's essentially what I'm saying when I accuse modern enterprise client device teams of being stuck in the '90s: those risks were totally acceptable back when the stakes were low and outages only impacted non-time-critical back-office clerical work. But what we saw today was that those high-risk, cost-optimized systems got deployed into roles where the risk/consequence profile is entirely different.

So what you do is keep the low-impact data entry clerks and spreadsheet wranglers on the Windows platform, but treat the customer-facing workers dealing with time-sensitive tasks to something a bit less risky.

It might not be as easy as just deploying the same old platform designed back in the '90s to everyone, but once you leave the Microsoft ecosystem, dual sourcing based on open standards becomes totally feasible, at costs that might not be prohibitive, as everything in the Unix-like ecosystem, including web browsers, has multiple independent implementations. So you basically just have to standardize on 2-4 platforms rather than one, which again isn't infeasible.

It's telling that an Azure region failed this news cycle without anyone noticing, because companies just don't tolerate for their backends the kind of risk people take with their wintel desktops, so most critical services hosted in Microsoft's Iowa datacenter had a second site on standby.

jimnotgym · on July 19, 2024

> And if you cannot actually abandon someone like Microsoft that consistently screws up their QA

The last outage I can remember due to an MS update was 7 or 8 years ago. Desktops got stuck on 'update 100% complete'. After a couple of minutes I pressed Ctrl+Alt+Del and it cleared. Before that... I don't remember. Btw, MS provides excellent tools to manage updates, and you can apply them on a rolling basis.
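To make "on a rolling basis" concrete, here is a rough sketch of a ring-based rollout with a health gate. This is not any Microsoft tool or API: the function name, the `is_healthy` callback, and the 5% threshold are all assumptions made up for illustration.

```python
def rolling_rollout(rings, is_healthy, max_failure_rate=0.05):
    """Apply an update ring by ring; halt if a ring's failure rate is too high.

    rings: list of lists of device names, smallest/least critical ring first.
    is_healthy: callable(device) -> bool, an assumed post-update health check.
    Returns (updated_devices, index_of_halting_ring_or_None).
    """
    updated = []
    for index, ring in enumerate(rings):
        failures = 0
        for device in ring:
            updated.append(device)  # the update is applied to this device
            if not is_healthy(device):
                failures += 1
        if ring and failures / len(ring) > max_failure_rate:
            return updated, index  # halt: later rings never get the update
    return updated, None
```

With a small pilot ring first, a bad update takes out the pilot machines and stops there instead of reaching the whole fleet.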

EvanAnderson · on July 19, 2024

> If Crowdstrike's QA department is all that stands between you and days of no operations ...

For companies of a certain large size, I guess. For all but the largest companies, though, there's no choice but to outsource software risks to software manufacturers. The idea that every company is going to shoulder the burden of maintaining their own software is ridiculous. Companies use off-the-shelf software because it makes good financial sense.

> And if you cannot actually abandon someone like Microsoft that consistently screws up their QA, then it's basically dishonest for you to claim that reliability is even a concern for your desktop platform.

When a company has significant software assets tied to a Microsoft platform, there's no alternative. A company is going to use the software that best suits their needs. Platform is a consideration; however, I've never seen it be the dominant consideration.

Today's issue isn't a Microsoft problem. The blame rests squarely on Crowdstrike and their inability to do QA. The culture of allowing "security software" to automatically update is bad, but Crowdstrike threw the lit match into that tinderbox by pushing out this update globally.

As another comment points out, Microsoft has good tools for rolling update releases for corporate environments. They're not perfect but they're not terrible either.

> It might not be as easy as just deploying the same old platform ...

When a company doesn't control their software platform they don't have this choice. Off-the-shelf software is going to dictate this.

In some fantasy world where every application is web-based and legacy code is all gone maybe that's a possibility. I have yet to work in that environment. Companies aren't maintaining the "wintel desktop" because they want to.

Stranger43 · on July 19, 2024

Blaming Crowdstrike's QA might feel good, but the problem is that no company in the history of the world has been good enough at QA for it not to be reckless to allow day-one patching of critical systems, or for that matter to allow single-vendor, single-design critical systems in the first place. And yet the cybersecurity guidelines required to maintain the pretense that Windows can be used securely all but demand that companies take that risk.

It's also fundamentally a problem of denial. Everyone knows there will not be a good solution to any issue around security and stability that does not require the assets tied up inside fragile, monopoly-operated ecosystems to eventually be either extracted or written off, but nobody wants to blaze new trails.

Claiming powerlessness is just lazy. Yes, it might take a decade to get out from under the yoke of an abusive vendor; we saw this with IBM. But as IBM is now a footnote in the history of computing, it's pretty clear that it can be done once people start realizing there is a systemic problem and not just a series of one-off mistakes.

And we know how to design reliable systems; it's just that doing so is completely incompatible with allowing any of America's Big IT Vendors to remain big and profitable, and that's scary to every institution involved in the current market.