The question that needs to be answered is not, "But the guarantees that we expect are much stronger than a correct protocol, handshake, or compiler optimization. We need the entire system to be reliable, and we need proof effort to be minimal for widespread adoption to be feasible." (Pretend that's a question, ok?) The question that needs answering is, "What is the easiest way to provide (strong) guarantees that the code is reliable?"
If you wanted to write a filesystem, how much work would you have to put into it before you could consider it "reliable" using traditional techniques? If the answer is 10x what it would take to simply write the code, then the author's complaint that "MIT’s FSCQ file system, developed over about 1.5 years using Coq, the complete system was 10x more code than a similar unverified filesystem. Imagine that — 2000 lines of implementation become 20,000 lines of proof!" doesn't sound so bad, does it? And 10x is not necessarily a bad estimate---unit testing alone would account for 2x, at a rough guess, and not only does unit testing fail to cover all possible executions, it also has a reputation in this neck of the woods of focusing on the happy path.
"An Empirical Study on the Correctness of Formally Verified Distributed Systems" has been discussed here before (https://news.ycombinator.com/item?id=14441364) but I'd like to mention one quote from it:
"Finding 9: No protocol bugs were found in verified systems, but 12 bugs were reported in the corresponding components of unverified systems. This result suggests that recent verification methods developed for distributed systems improve the reliability of components that, despite decades of study, are still not implemented correctly in real-world deployed systems. In fact, all unverified systems analyzed had protocol bugs in the one-year span of bug reports we analyzed."
(And speaking as a guy who has some network protocol experience, some of the bugs that were found in the "shim" layers were...likely caused by inexperience: no, TCP does not preserve write boundaries on the recv end, and yes, UDP will duplicate packets under some error conditions.)
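To make the TCP point concrete: recv() hands you whatever bytes happen to be available, not one "message" per send(), so any shim layer has to do its own framing. Here's a minimal sketch in C, assuming POSIX sockets and a made-up convention of a 4-byte big-endian length prefix (nothing from the paper, just an illustration):

    /* Minimal sketch: length-prefixed framing over a TCP byte stream.
     * The 4-byte big-endian prefix is just an example convention. */
    #include <arpa/inet.h>   /* ntohl */
    #include <stdint.h>
    #include <stdlib.h>
    #include <sys/socket.h>
    #include <sys/types.h>

    /* Keep calling recv() until exactly len bytes have arrived, because
     * recv() may return any prefix of the bytes in flight. */
    static int recv_exact(int fd, void *buf, size_t len) {
        char *p = buf;
        while (len > 0) {
            ssize_t n = recv(fd, p, len, 0);
            if (n <= 0)            /* 0 = peer closed, <0 = error */
                return -1;
            p += n;
            len -= (size_t)n;
        }
        return 0;
    }

    /* Read one length-prefixed message; caller frees the result. */
    char *recv_message(int fd, uint32_t *out_len) {
        uint32_t netlen;
        if (recv_exact(fd, &netlen, sizeof netlen) != 0)
            return NULL;
        uint32_t len = ntohl(netlen);
        char *payload = malloc(len ? len : 1);
        if (payload == NULL)
            return NULL;
        if (recv_exact(fd, payload, len) != 0) {
            free(payload);
            return NULL;
        }
        *out_len = len;
        return payload;
    }

The UDP duplication issue is the same story one layer up: if it matters, you stamp messages with sequence numbers and drop the ones you've already seen.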
I will admit that many of the author's specific complaints are completely true. This quote should be in big letters in the introduction to any formal verification work: "On top of all that, if any spec or implementation changes even slightly, all the specs, impls, and proofs that depend on it will change as well. This makes even small changes a huge pain." And Coq is something of a pain in the rump. (I encourage anyone with a few spare hours to work through Software Foundations (https://softwarefoundations.cis.upenn.edu/); it's a lot of fun but at the same time very frustrating. [That should be "many, many spare hours". --ed.]) The least damning thing you can say about Coq is that it's not very well engineered. On the other hand, SMT-backed tools like SPARK and Frama-C are brilliant when they succeed in finding proofs, but when they fail... well, they light up a "?" on the dashboard and expect the experienced driver to know what the problem is. Further, they're rather limited in the programming features you can use with them---Dafny and the Java verifiers target garbage-collected languages, which alleviates some of the pain.
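For anyone who hasn't watched that dashboard light up: the SPARK/Frama-C style is "write a contract, let the SMT solver discharge it." A toy sketch in Frama-C's ACSL (the function is made up, but the contract style is the real thing):

    /* Toy ACSL contract of the kind Frama-C's WP plugin hands to an SMT
     * solver. The precondition is needed because ACSL arithmetic is over
     * mathematical integers, while the C addition below must not overflow. */
    #include <limits.h>

    /*@ requires INT_MIN <= a + b <= INT_MAX;
        assigns \nothing;
        ensures \result == a + b;
    */
    int add(int a, int b) {
        return a + b;
    }

When the solver finds the proof, great; when it doesn't (usually the moment loops and the invariants they need enter the picture), you get exactly the unhelpful "?" described above.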
But still, to address the author's fundamental issue,
"I went into the class believing that formal verification is the future — the only solution to a world of software ridden with bugs and security issues. But after recent events and a semester of trying to apply formal methods, I’m a serious skeptic. In this post, I’ll discuss why I think formal verification has a long way to go — and why it just doesn’t work right now."
Ya' got a better option?

I don't think any clear-cut answer exists. For some applications (like your FS example), formal proofs are a complete no-brainer. In other cases, you are better served by lots of tests. In yet other cases, even tests are overkill, and you will do better matching the shape of your code to the specification as closely as possible. Not to mention the cases where validating the spec itself is the real issue, and that is where all your attention will go.

It's not even an either-or question. You can have all of those situations appear in different places in the same code-base.