I find it very impressive that the work actually made it into the Windows kernel. AFAIK the Linux kernel doesn't have formally verified parsers for any of the wire formats it deals with, so this seems like an improvement on the actually-in-use-outside-the-lab state of the art.
This is not new for MS. Windows 98 and XP would get a lot of BSODs because of third-party drivers doing illegal things. MS has been investing heavily in formal verification since the early 2000s because of that. My master's adviser (and many more professors working in formal verification) would get most of their funding from MS. One early successful example I can think of is SLAM[1].
Linux will probably get some Rust parsers in the future, which are IMHO even better because they are easy to write so you're sure they are robust - whereas formally verifying a parser may or may not be done depending on the coder's available time and willingness to do so.
Yep, if the translation is done by hand then that's a potential problem. However, a lot of modern proof tools have proven-correct translators (based on the CompCert work).
I don't think you understand what "proven correct" means. If you formally prove a spec correct using modern proof tools then it is guaranteed to have zero bugs. Nobody has found even a single bug in the proven correct CompCert C compiler for example. However people find bugs in gcc all the time.
It means that a program has been verified against a model, and properties of that model are held true.
What it does not mean is that the program is bug free. That would, of course, violate the halting problem, or its generalization, Rice's Theorem - non-trivial programs cannot have arbitrary properties proven about them.
There's also Gödel's Incompleteness Theorem, obviously, as well as the fact that a proof is only as good as its model.
Not so. Consider the algorithm that reports "Yes" for all inputs for all program properties. For a large number of programs and properties, this algorithm will report the correct information.
Not true for non-Turing-complete languages, which is why all proof tools require a proof of termination. Some do it automatically (essentially ruling out Turing-complete languages), others require a formal proof of termination from the user.
F* (one system on display in this paper) can extract to C/assembly suitable for inclusion in a hand-written C source tree, all the while giving a wider and more interesting set of guarantees, with the added benefit that you won't have to depend on a Rust toolchain to build your project (bootstrapping F* is considerably easier, and you only have to do it when you check in your verified bits).
funny enough, much like rust, one of its most widely-deployed applications lives in firefox; see project-everest.github.io
If I understand this properly, it takes a specification and generates a formally proven parser that lets you cast the bytes to a struct or run code to do something with the data.
All automagic, can’t see how it could get much simpler than that.
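If that reading is right, the generated code has the shape "validate first, then hand back a typed view". Here's a rough sketch of that shape, written in Rust purely for illustration (the struct, field names, and constraints are all invented; the actual tool emits C from a 3D spec):

```rust
// Invented example header: two big-endian 16-bit fields.
struct Header {
    length: u16,
    flags: u16,
}

/// Accepts the input only if every (made-up) spec constraint holds,
/// returning the parsed header plus the payload bytes it covers.
fn parse_header(input: &[u8]) -> Option<(Header, &[u8])> {
    if input.len() < 4 {
        return None; // too short to contain the fixed fields
    }
    let length = u16::from_be_bytes([input[0], input[1]]);
    let flags = u16::from_be_bytes([input[2], input[3]]);
    // Example spec-level constraint: the declared length must fit in the buffer.
    if usize::from(length) > input.len() - 4 {
        return None;
    }
    Some((Header { length, flags }, &input[4..4 + usize::from(length)]))
}
```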
Haven't you heard? The only bugs are those caused by memory safety, and only switching to Rust can save us from the insanity of managing our own memory.
Windows' kernel is way ahead of Linux in many ways. It's a bit of a delusion that so many people think it hasn't advanced since Windows XP. Perhaps wilful denial.
It's going to take me a while to fully understand this paper. I wonder what the route to incorporating something like this in the Linux kernel would look like?
Generating readable C code was an important goal from early on. Parts of the Linux kernel would have to be translated to Low*, the subset of F* that can be extracted to C.
I guess there would be a new build step that runs when code is checked in, to not burden everyone with the whole new toolchain (F*, Z3, KaRaMeL, maybe OCaml and more). While the generated C code can be read and understood by C programmers, changes to that part should be done in Low*, and that may require changes to proofs in F* or to the extraction tool (KaRaMeL). That's a pretty big investment for a project like Linux, and I guess any work in that direction will happen on forks for a while, before there is enough confidence that such changes are sustainable.
A good first candidate may be HACL* (https://hacl-star.github.io), a cryptographic library that is already used in several projects.
The end result is generated C code. So you could simply add the generated C code to the kernel like you would any other C code, and keep the proof chain separate from the kernel. My guess is that this is what Microsoft is doing with the kernel code.
My question was more that it would have to replace existing code, so you'd need to advocate for it and come up with some way of incorporating it in the build. I'm not all that familiar with the development culture of the kernel, but I imagine that large changes like this would have to be introduced slowly.
> Software artifacts. The application of our toolchain to Windows includes proprietary source code and is not publicly available. However, the 3D toolchain, including the EverParse libraries, the F* programming language, Z3 SMT solver, and the KaRaMeL C code generator are all open source and developed publicly on GitHub. Documentation, code samples, and links to our latest tool releases are available from https://project-everest.github.io/everparse.
The thing I'd like to understand that the paper didn't touch on: how do we know the example TCP RFC English Language description isn't logically flawed?
If the F* code adheres to the RFC, does the verification prove that the F* matches the 3D of the RFC, or does it find errors in the RFC itself?
On a side note, rather than debating rewriting the Linux kernel in Rust so that it is memory safe(r), why not start working on formally hardened kernel code? Or does this only apply to logical propositions, e.g. the RFC?
> how do we know the example TCP RFC English Language description isn't logically flawed?
We don't, in the general case. In the specific case of TCP, the RFC has had many errata over the years[1] (many of which are logical errors).
This is a recurring problem in formal verification: you verify with respect to a particular description of the system or protocol, which itself may be either ambiguous or fail to reflect actual implementations. You can tighten that by verifying with respect to a formal definition, which can eliminate the ambiguity, but that only pushes the trust as far as the proof generator and verifier themselves. You can then have multiple independent proof systems arrive at the same conclusion to strengthen your belief in the result, but at that point you're doing differential testing, not formal verification.
Yes. Looks like it's back to the 1960s, 70s, 80s, ... You found a parser that you've formally verified won't break. Great, I've heard that before.
The far bigger part of the problem is the spec - writing a formal spec is (for practical purposes) a similar process to writing a program, and similarly bug-prone. It's nice to have a clearly defined spec language, but that doesn't stop bugs/faults/problems and (by Gödel) can't stop a determined fool (fools are just too ingenious)!
People routinely say that "a formally verified program is only as good as its specification!" but many fewer people actually go on to demonstrate that they can break said programs, assuming that this is simply "a matter of engineering" which is not in fact the case.
I'm also not sure what Gödel has to do with any of this, as most specification languages are not Turing-complete (unless this is some oblique statement about the metalogic in which case... again, show me your proof of false :)).
I see four issues in its issue tracker with the "bug" label. Of these, one seems not to be reproducible on real hardware, one of them looks like a feature request, one of them is a bug in deliberately non-verified code (therefore doesn't say anything about formal verification), and the remaining one involves a hardware-level timeout due to bootloader code taking too long. If you believe this is the error rate of any non formally verified software, that these are exploitable from a security perspective, or that these are indicative of some serious failure in the specification process, you have a very different understanding of software than I do. i.e., I would imagine when people talk about "software is only as good as its specification" they are imagining some dramatic flaw that undermines the guarantees of the system, rather than "I can't use the last 4 KiB in the address space."
None of the "issues" in the first few pages look serious (most of them seem to be about portability to new architectures), so why don't we save some time and have you link to the specification bug you believe undermined the security guarantees of the system? So far you have only asserted such a bug exists.
Why are you talking about security guarantees? A bug is a bug; the point is to show that formal specifications don't prevent bugs. In some cases a warning text not showing up due to a bug, or the wrong byte being sent to the serial port, can have more severe consequences than any CVE-assignable issue.
Many people have found bugs in unverified parsing routines in the Linux kernel. All I'm asking is for someone to find a single one in one of Microsoft's new parsers. Surely, if specification bugs are just as commonplace as other kinds of programming issues, this shouldn't be too big of a request.
I'm mostly just tired of people pretending that formal verification produces buggy software at a rate that is in any way comparable to "ordinary" programming. Empirically, it doesn't, and people who claim it does have scant evidence to back up those claims. Are there specification bugs? Sure. Do they happen at the same rate that bugs in comparable unverified code (let alone C code) occur? Not even close.
Yeah, but you haven’t found bugs in the Linux kernel’s TCP parser. Your argument basically boils down to “X is true. If you think I’m wrong, then prove it by <extremely difficult task>”
(It turns out that verification does reduce bugs, but the refusal of folks here to audit parsers doesn’t prove anything either way)
However you can use formal methods to verify general properties about the specification, so you don't have to fully trust that the specification authors didn't make any mistakes at all. Apparently anyway; I've never done it.
That's what I was getting at with the second sentence in my post: we can verify that the code matches the spec, but not that the spec itself isn't flawed.
So this verification is what CPU designers do: they create RTL (in an HDL such as VHDL or Verilog) and then a circuit (gates), and then Magma or Cadence or Synopsys tools formally verify that the logic in the behavioral model matches the implementation model.
However, if the microarchitect had a brain fart, the RTL is bugged to begin with.
I was hoping this could go backwards one step further and give the microarchitect/RFC-editor a swift kick in the pants. :)
Non-proven correct code doesn’t even have specs you can test the code against. So comparing non-proven code without a formal spec with code that has a formal spec and an end-to-end proof chain from spec to machine code, and then saying they are equally likely to be buggy is frankly ridiculous.
Advantage over what? The idea that RISC is somehow simpler went the way of the dinosaur 30 years ago. The ISA does not define performance; that is the microarchitecture, which is where all of the engineering effort is spent. You could have the classic ISA we all learned about in comp-sci 101 classes: "Load/Store/Add/Compare/Jump", and it would be simple but dog-ass slow. Not to mention open to all sorts of attacks and exploits (buffer overflows, stack smashing), meta attacks (side-channel), or Gödel's incompleteness. The uarch is where the action happens.
We were talking about correctness. RISCs by definition have simpler architectures than CISCs - that's what makes them RISCs. And simpler architectures leave less room for confusion/bugs/faults in the architecture, irrespective of how they are implemented.
> how do we know the example TCP RFC English Language description isn't logically flawed?
They'll surely have needed to translate the English specification into a formal specification. That act of translation isn't itself a formal process, so it's subject to errors, and also subject to dispute in case of ambiguities or underspecification or contradictions in the English text.
You can never formally prove that an informal natural language specification corresponds to a formal specification written in a formal language (although you could try to take another abstraction step by writing a formal grammar and formal semantics for the natural language and then claiming that the interpretation of the natural language text, under that interpretation, agrees with your formal specification... I don't believe this is common or likely to yield useful results unless the natural language text was written in a very controlled way with this kind of application in mind).
> On a side note, rather than debating rewriting the Linux kernel in Rust so that it is memory safe(r), why not start working on formally hardened kernel code? Or does this only apply to logical propositions, e.g. the RFC?
Rust's type system can actually be viewed as a formal proof system, where the type checker is verifying proofs about behaviors of the program. But the proofs in question are somewhat weak compared to other formal methods techniques, which can potentially be used to prove stronger claims about program behavior.
The Rust ones aren't trivial, though; they rule out quite a few kinds of errors even without having a formal specification for the program's behavior beyond that given by the type annotations. And there are some other features inspired by formal methods, like the fact that match patterns must specify a behavior for every possible value of the expression:
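something like this made-up sketch (the enum and its variants are invented here, not taken from the paper):

```rust
enum Frame {
    Data { len: u16 },
    Ack,
    Reset,
}

fn describe(frame: Frame) -> &'static str {
    // The compiler rejects this match if any variant is left unhandled, so
    // adding a new Frame variant later forces every match like this to be
    // revisited before the program compiles again.
    match frame {
        Frame::Data { len: 0 } => "empty data frame",
        Frame::Data { .. } => "data frame",
        Frame::Ack => "ack",
        Frame::Reset => "reset",
    }
}
```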
Writing formally verified code with a richer specification tool is harder and slower going. Microsoft is paying the equivalent of an academic department (in MSR) to do some of that work. I would hope we would soon see similar stuff in the free software world from a regular academic department at a university... maybe inspired by some of this work from MSR!
Remember this is a set of parsers. So its goal is to accept valid-according-to-spec data, reject all other data, and do so in a safe way.
In that sense, the spec can only be logically flawed in a few ways
For example:
1. it specifies impossible constraints on data it accepts (e.g. field1 < field2 && field2 < field1)
2. misses dependent constraints necessary to safely prove other things (field2 - field1 > 0 without specifying that field2 > field1)
etc
Assuming you specify the assertions in the code, it will find these errors because it will not be able to prove the arithmetic is sound.
To the degree it is possible for the spec to be logically flawed, but not cause underflow/overflow/etc, it will not detect it.
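As a toy illustration of point 2 above (field names invented, and written in Rust rather than the verified toolchain): computing field2 - field1 on unsigned integers can underflow unless the spec also states field2 >= field1, and that missing dependent constraint is exactly the proof obligation the tool will refuse to discharge.

```rust
// Toy version of the missing-dependent-constraint problem. In the verified
// setting this is a proof obligation; here checked_sub just makes the
// underflow case explicit at runtime.
fn payload_len(field1: u16, field2: u16) -> Option<u16> {
    // Returns None whenever field2 < field1, i.e. exactly when the spec
    // forgot to require field2 >= field1.
    field2.checked_sub(field1)
}
```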
The work of turning a technical document into something automated provers can work on is often revealing when it comes to logical flaws which is itself a valuable product.
However, even when this appears to be successful and the proof tool verifies your design has the properties you wanted, there can be hidden assumptions in the proof which do not match the human understanding of the technical document.
TLS 1.3 was proved this way before the RFC was signed off. We know TLS 1.3 does what it says on the tin, assuming the constituent elements work as specified (e.g. AES is a working block cipher, SHA256 is a working cryptographic hash function, Elliptic Curve Diffie Hellman is a working key agreement protocol). However, it turns out they proved something slightly different from what humans would have understood from the document.
The result is the Selfie attack. NB This applies specifically to Pre-shared Key authentication, stuff like your web browser isn't affected. Here's how it goes: Alice and Bob are using a PSK to communicate securely, but their Cat would like to manipulate them. Based on their understanding of the pre-errata RFC document, Alice and Bob agreed a single Pre-shared Key, let's call that ABA, which the Cat doesn't know, and they use TLS 1.3.
0. Bob feeds the cat before leaving for work
1. Alice wakes up, notices the Cat's food bowl is empty and sends an encrypted message to Bob, Encrypt(ABA, "Did you feed the cat?")
2. The Cat intercepts Alice's encrypted message and sends it back to her. The Cat can't read the message, nor tamper with it, but just sends it unaltered and trusts that will have the desired effect.
3. Alice receives an authentic message encrypted with the ABA key, which she presumes is from Bob, it says "Did you feed the cat?"
4. Alice sends an encrypted reply, Encrypt(ABA, "No")
5. The Cat intercepts this message too and sends it back to Alice again
6. Alice receives the reply, which she presumes is from Bob, it says "No".
7. Alice feeds the cat again. The cat has now been fed twice. Attack successful.
How did this attack sneak past the proof? Well, the proof says Alice and Bob should agree two keys, an Alice->Bob key (let's call this AB1) and a Bob->Alice key (call it BA1). When Alice receives her own message encrypted with AB1, she can see she sent it, not Bob, and so she detects the Cat's attack. So the proof is fine.
The person writing the proof assumed this is what was meant in the human readable document describing TLS 1.3, but the other humans working from this document assumed it would be fine for Alice and Bob to agree a single key ABA. It's certainly not highlighted in the original document that this is a concern and you must agree two keys.
[ There are also lots of other things you could do, if you're careful and know about this attack, but for some reason can't just agree twice as many keys. However the correct basic design to explain in the document is to just make separate keys for each direction ]
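To make that concrete, here's a toy sketch (in Rust, with invented names, and a stand-in hash instead of a real KDF - do not use DefaultHasher for anything security-related) of why per-direction keys defeat the reflection: the Cat's bounced message is keyed for the Alice->Bob direction, and Alice only accepts traffic keyed for Bob->Alice.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in "KDF": NOT cryptographically secure, purely illustrative.
fn derive_key(psk: &[u8], direction: &str) -> u64 {
    let mut h = DefaultHasher::new();
    psk.hash(&mut h);
    direction.hash(&mut h);
    h.finish()
}

fn main() {
    let psk = b"single-shared-secret-ABA";
    let ab1 = derive_key(psk, "Alice->Bob"); // key Alice sends under
    let ba1 = derive_key(psk, "Bob->Alice"); // key Alice accepts messages under
    // A reflected message arrives keyed with ab1, not ba1, so Alice rejects it
    // instead of mistaking it for a reply from Bob.
    assert_ne!(ab1, ba1);
}
```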
That's why you need to prove the whole chain end to end. From spec to C code. Hand-translating a proof to working C code will most likely introduce bugs. There are now proof tools that can prove the complete chain from spec to machine code (!)