
i've seen photos of the bsod from an affected machine, the error code is `PAGE_FAULT_IN_NONPAGED_AREA`. here's some helpful takeaways from this incident:

1) mistakes in kernel-level drivers can and will crash the entire os

2) do not write kernel-level drivers

3) do not write kernel-level drivers

4) do not write kernel-level drivers

5) if you really need a kernel-level driver, do not write it in a memory unsafe language




I've said this elsewhere but the enabling of instant auto-updates on software relied on by a mission critical system is a much bigger problem than kernel drivers.

Just imagine that there's a proprietary firewall that everyone uses on their production servers. No kernel-level drivers necessary. A broken update causes the firewall to blindly reject any kind of incoming or outgoing request.

Easier to roll back because the system didn't break? Not really, you can't even get into the system anymore without physical access. The chaos would be just as bad.

A firewall is an easy example, but it can be any kind of application. A broken update can effectively bring the system down.


There sure are a lot of mission-critical systems and companies hit by this. I am surprised that auto-updates are enabled. I read about some large companies/services in my country being affected, but also a few which are unaffected. Maybe they have hired a good IT provider.


I'm not surprised, seeing how this madness has even infected OSS/Linux.

https://github.com/canonical/microk8s/issues/1022

A k8s variety. By Canonical. Screams production, no one is using this for their gaming PC. Comes with.. auto-updates enabled through snap.

Yup, that once broke prod at a company I worked at.

Should our DevOps guy have prevented this? I guess so, though I don't blame him. It was a tiny company and he did a good job given his salary, much better than similar companies here. The blame goes to Canonical - if you make this the default it better come with a giant, unskippable warning sign during setup and on boot.


Snap auto update pissed me off so much I started Nix-ifying my entire workflow.

Declarative, immutable configurations for the win...


One thing to consider with security software, though, is that time is of the essence when it comes to getting protection against 0-day vulnerabilities.

Gotta think that the pendulum might swing into the other direction now and enterprises will value gradual, canary deployments over instant 100% coverage.
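A gradual rollout of the kind described above can be sketched as a stable hash of each host's ID mapped onto a percentage bucket. This is only an illustration: `fnv1a`, `in_rollout` and the host ID format are made-up names for this sketch, not anything a particular vendor is known to use.

```rust
/// FNV-1a (64-bit): a small, stable hash, so a given host always
/// lands in the same bucket across runs and machines.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
    for b in bytes {
        hash ^= u64::from(*b);
        hash = hash.wrapping_mul(0x100000001b3); // FNV prime
    }
    hash
}

/// Hypothetical helper: true if `host_id` falls into the first
/// `percent` buckets (out of 100) and should receive the update now.
fn in_rollout(host_id: &str, percent: u64) -> bool {
    fnv1a(host_id.as_bytes()) % 100 < percent
}
```

Raising `percent` from 1 to 10 to 100 only ever adds hosts, so a machine that already got the update in the canary wave stays in every later wave.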


I'm not a Windows programmer so the exact meaning of PAGE_FAULT_IN_NONPAGED_AREA is not clear to me. I am familiar with UNIX style terminology here.

Is this just a regular "dereferencing a bad pointer", what would be a "segmentation violation" (SEGV) on UNIX, a pointer that falls outside the mapped virtual address space?

As this is in ring 0 and potentially has direct access to raw, non-virtual physical addressing, is there a distinction between "paged memory" (virtual address space) and "nonpaged memory" (physical address) with this error?

Is it possible to have a page fault failure in a paged area (PAGE_FAULT_IN_PAGED_AREA?), or would that be non-fatal and would be like "minor page fault" (writing to a shared page, COW) or "major page fault" (having to hit disk/swap to bring the page into physical memory)?

Are there other PAGE_FAULT_ errors on Windows?

Searching for this is difficult, as all the results are for random spammy user-centric tech sites with "how do I solve PAGE_FAULT_IN_PAGED_AREA blue screen?" content, not for a programmer audience.




Basically all AV either runs as root or uses a kernel driver. I guess the former is preferable


Rust's memory safety does not prevent category errors like using nonpaged memory for things supposed to be paged and vice versa


this all-or-nothing mindset is reductive and defeatist; harm reduction is valuable. sure, rust won’t magically make your kernel driver bug free, but it will reduce the surface area for bugs, which will likely make it more stable.


Yes, I fully agree.

Unfortunately, we have had decades of first Haskell pseudo-fans, a sidequest of generic "static typing (don't look at how weak the type system is)" pseudo-fans, and now Rust aficionados who do act like it's all-or-nothing and types will magically fix things, including category and logic errors.

At some point tiredness and reactivity set in.


Other takeaways:

- do not put critical infrastructure online

- do not push updates that work around the update schedule

- do not push such updates to all machines at once

- do not skip testing and QA, relevant to the number and kind of the machines affected

Even one of these would have massively improved the situation, even with a kernel-level driver written in an unsafe language.
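As a rough sketch of "do not push such updates to all machines at once": each wave reports how many hosts were updated and how many crashed, and the rollout stops at the first wave that exceeds a crash threshold. `WaveReport`, `may_continue` and `run_rollout` are hypothetical names for this sketch, and the threshold is an arbitrary assumption.

```rust
/// Outcome of one rollout wave (hypothetical type for this sketch).
struct WaveReport {
    hosts_updated: u32,
    hosts_crashed: u32,
}

/// Decide whether the next, larger wave may start.
fn may_continue(report: &WaveReport, max_crash_ratio: f64) -> bool {
    if report.hosts_updated == 0 {
        return false; // no data yet: do not widen the rollout
    }
    let ratio = f64::from(report.hosts_crashed) / f64::from(report.hosts_updated);
    ratio <= max_crash_ratio
}

/// Walk through the waves in order and return how many hosts were
/// touched before the rollout finished or was halted.
fn run_rollout(waves: &[WaveReport], max_crash_ratio: f64) -> u32 {
    let mut total = 0;
    for wave in waves {
        total += wave.hosts_updated;
        if !may_continue(wave, max_crash_ratio) {
            break; // halt: later waves never receive the update
        }
    }
    total
}
```

With waves of 10, 100 and 10,000 hosts and a bad second wave, only 110 machines are ever exposed to the broken update.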


A memory-safe language does not prevent crashes.

In case of potential UB (and then memory corruption), you get a guaranteed crash.

Wait, crash? :wink:


did you have a crowdstroke while writing this reply?


The problem is that some viruses may run in the kernel mode, so an AV has to do the same, or it will be powerless against such viruses.


If a virus got that far, you're already in trouble. What stops them from attacking the anti-virus?


If you think AV cannot stop viruses in the same privilege level, then that is more reason for AV to run in the kernel mode. Because by your logic, an AV in user mode cannot stop a virus in user mode.


>5) if you really need a kernel-level driver, do not write it in a memory unsafe language

I C what you're doing... >_>


pointing out the obvious? why are you upset i’m stating mixing hot oil and water will make a mess?


0) don't load a new driver into your working kernel.


an audio driver once blue screen of death'd my windows whenever i started Discord.

i'm surprised i'm not hearing a stronger call for microkernels yet


5) Well, how many of those kernel-level drivers we rely upon ARE written in a memory unsafe language??? Like 99%?

And we are not crashing and dying every day?

Sure, Rust is the way to go. it just took Rust 18 years to mature to that level.

Also, quite frankly, if your unwrap() makes your program terminate because of an out-of-bounds array access, isn't that exactly the same thing? (program terminates)

But IMHO if we are hopping along a minefield at this moment every second of every day, well... If this is the worst case scenario, yeah it's not that bad after all.


> Well how much of those kernel-level drivers we rely upon ARE written in a memory unsafe language ??? Like 99% ? And we are not crashing and dying every day?

we shouldn't discount the consequences of memory safety vulnerabilities just because flights haven't physically been grounded.

> Also, quite frankly, if your unwrap() makes your program terminate because an array out of bounds isn't that exactly the same thing ? (program terminates)

this is a strawman, if you were writing a kernel-level driver in rust you'd configure the linter to deny code which can cause panics.

here's a subset:

- https://rust-lang.github.io/rust-clippy/master/index.html#/u...

- https://rust-lang.github.io/rust-clippy/master/index.html#in...
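A minimal sketch of what denying panicking code can look like. The links above are truncated, so the lint names used here (`clippy::unwrap_used`, `clippy::indexing_slicing`, `clippy::panic`) are an assumption about the kind of lint meant; they do exist in Clippy. `read_u32_le` is a made-up helper for illustration.

```rust
use std::convert::TryFrom;

// With these lints denied, `cargo clippy` rejects `.unwrap()`, `buf[i]`
// and `panic!` inside this function, so every failure has to be
// returned to the caller instead of aborting.
#[deny(clippy::unwrap_used, clippy::indexing_slicing, clippy::panic)]
fn read_u32_le(buf: &[u8], offset: usize) -> Option<u32> {
    // checked_add avoids overflow on a hostile offset
    let end = offset.checked_add(4)?;
    // `get` returns None instead of panicking when out of bounds
    let bytes = <[u8; 4]>::try_from(buf.get(offset..end)?).ok()?;
    Some(u32::from_le_bytes(bytes))
}
```

A truncated or malformed input then surfaces as `None` that the caller must handle, rather than as a terminated program.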


Not a helpful takeaway, I've yet to see a Java kernel driver.


Nobody is telling you to use Java. Although, if you want to revive Singularity that would be pretty neat.


And I never said that anyone is telling me to use Java. It was an example.

Because of the nature of AV software, its code would be drowning in "unsafe" memory accesses no matter the language we chose. This is AV, it's always trying to read the memory that is not AV's, from its very design.

This is a story about bad software management processes, not programming languages.


Reading memory from another process can be done through memory-safe APIs.

To give an example from the linux userspace world: https://docs.rust-embedded.org/rust-sysfs-gpio/nix/sys/uio/f...


be the change you wish to see



