That's a good question. Speaking for myself... I'd settle for some comibined metric of "fewer security bugs" and "fewer unintentional behaviors" (per spec). You could argue about UX and such, but most of the real damage that's caused by defective software is basically about security bugs. The "fewer unintentional behaviors" bit is triciker, but math and precision (about what you want) can help with that. Usually a (mathematical) spec is shorter and more concise than the program implementing it.
The problem is that once you get to defining your specs as programs themselves, they can have bugs, and the closer those specs get to maths, the less the people who actually define the specs in the first place - usually neither programmers nor mathematicians - can understand them, reducing the likelihood that anyone will notice the deviations between the spec and what users actually want.
Additionally, frankly, "reliable software" isn't something that's needed at least 99 times out of 100. Software that more-or-less gets some specific task done is. Those tasks can change from day to day, and generally require a few tries before the person who gave the task is even sure what they want, so RAD is more of a necessity than spec-based development.
Basically, we now have this entire class of information workers, and very few tools to give them that are any better than an Excel spreadsheet for the most part. I don't see the article as trying to give traditional programmers new tools, it's trying to give information workers new tools, and probably traditional programmers will find some of those useful.
> I don't see the article as trying to give traditional programmers new tools, it's trying to give information workers new tools, and probably traditional programmers will find some of those useful.
That puts it much more clearly than I managed. Thanks :)
Eventually though, software has to solve a real problem for a human being, and real problems (and humans) tend to be messy. A spec should model the problem and its solution, and will inevitably be sort of messy as a consequence. I also doubt that any spec survives implementation. Solutions must be adapted as the understanding of a problem and its appropriate solutions grow, and it is almost impossible to gain that understanding without experimentation (partial implementation).
I think the distinction is between agile (we'll start with a vague spec, and flush it out as we go) and waterfall (we'll start with the perfect spec before we start coding). Obviously, there are many points in between.
Nice concise mathematical specs are rare in practice, not because programmers are incapable of expressing such specs (and most probably aren't), but most programming problems are not expressible rigorously via math; they might not even be amenable to much analysis (so called "wicked" problems are common in programming).
I'll take a guess that something like pure functions lend themselves to formal, concise specs. Pure functions are, in the end, just input and output, not something complex like a program that outputs some complex bitmap every 30 milliseconds. Haskell has found pure functions to be very practical to have as 'distinguished citizens'; there are enough problems which can to a great degree be expressed with pure code.
Someone might say "but Haskell itself doesn't solve practical programming problems", or whatever. The point isn't Haskell or pure functional programming per se; it's just an example of things that easily lend themselves to formal specs. Whether more complicated, perhaps effectful programs lend themselves to formal specs (I think ATS might be able to express such specifications) is more hard to answer.
HN was written in a language where functional style is the norm, and yet HN couldn't have been designed via spec. I say this as someone who has thoroughly studied HN's codebase. If you spend time examining it, it becomes evident that most of the features were extemporaneous. There is little which was thought out in advance. The recipe for HN was: do the most important thing each day; repeat for 2000 days. Such a recipe is fundamentally anti-spec, because by day 5 your spec is outdated.
Spec xor exploratory programming. The codebase is the spec.
What's meant by reliability?