I (the author) appreciate the feedback. I believe that many of your criticisms are addressed in the actual paper. First of all I completely agree that my sample size is too small for a conclusive proof. I mention in the paper that I hope that others will try and replicate this experiment on other pieces of software. I do think it's appropriate when conducting an experiment to publish a conclusion, not that the experiment will constitute proof (or an established scientific theory), but as a conclusion to the study that others can try to confirm or refute.
I also mention in the paper that it would be beneficial to conduct this experiment using different type systems for the reasons that you stated above.
The argument against static typing that I was testing didn't mention any particular type system nor any particular dynamically typed language, it was a general argument that stated that unit testing obviated static typing. Because the argument was so general and absolute I felt that any static type system that could be shown to expose bugs that were not caught by unit testing would be enough to refute the argument. I was not trying to prove that any type system would catch bugs not found by any unit tested software. The paper also points out that I'm trying to see whether unit testing obviates static typing in practice, in theory you could implement a poor mans type checker as unit tests, but my experiment was focused on whether in practice unit testing obviates static typing.
Finally I believe that my conclusion in the paper was at least a bit more modest than that of the blog post. The lack of apparent modesty in the blog post was caused more by a lack of ability on my part to accurately summarize than an inflated sense of accomplishment and self importance.
Thanks for the response! I appreciate the effort you went to here, this was no small task you set yourself to.
I appreciate the clarification. I think now I see better where your emphasis was: the purpose of the paper was to refute an argument, and of course the level of burden of proof is different and far less in that case. I think this misunderstanding on my part is what caused me to call the conclusions 'trivial' -- too strong and dismissive language on my part anyway.
The irony is, you were attempting to do to the unit-testing-is-sufficient argument what I was attempting to do to what I assumed yours was: provide one counter-example to falsify a broad and generalized thesis.
That said, I think I would have liked to have seen your original unit-testing-is-sufficient argument punched up and qualified into something a little more reasonable and real-world. As you stated the argument, it seems like a straw man to me. It seems one could reduce your version of the argument to something like: "Dynamic languages with unit test coverage will always catch errors that statically-typed environments will." And of course this is far too broad and unqualified a statement, and that is precisely why all you needed was one counter-factual to refute it. You didn't even need a handful of Python programs, or 9 or 20 or 100 errors to prove your point. You only needed one, as you stated above. This is why the burden of proof for your thesis was so small, but also why, in my opinion, even with that reduced scope and more modest conclusion, we haven't really learned much.
As someone who has spent most of my career in statically-typed environments and the last 6 years or so mostly in dynamic environments, and also as someone who has made something like the argument you were attempting to refute, I have to say I would definitely never have made such a brittle and unqualified statement as the one you refuted in your paper. To put it more directly, I think I'm probably a poster-child for the kind of developer you were aiming your thesis at, and I don't feel that my perspective was adequately or reasonably represented. More importantly, having looked at the examples given in your paper, I may have learned a bit about the kinds of errors that Haskell can catch automatically that some coders might miss in a dynamic environment, but not much useful to me in my everyday work context.
I think a more reasonable version of the argument, but more qualified and therefore requiring a far larger sampling of code to prove or refute, would be something like: "Programs written in a dynamic language with adequate or near-100 percent unit test coverage, are no more prone to defects than programs written in a statically-typed language with a comparable level of unit test coverage."
I agree this is a very important conversation to have, and again kudos to the work you put in here. Obviously people have strong opinions both directions, and the discussion, however heated at various moments, is an important one, so thanks for this!
I also mention in the paper that it would be beneficial to conduct this experiment using different type systems for the reasons that you stated above.
The argument against static typing that I was testing didn't mention any particular type system nor any particular dynamically typed language, it was a general argument that stated that unit testing obviated static typing. Because the argument was so general and absolute I felt that any static type system that could be shown to expose bugs that were not caught by unit testing would be enough to refute the argument. I was not trying to prove that any type system would catch bugs not found by any unit tested software. The paper also points out that I'm trying to see whether unit testing obviates static typing in practice, in theory you could implement a poor mans type checker as unit tests, but my experiment was focused on whether in practice unit testing obviates static typing.
Finally I believe that my conclusion in the paper was at least a bit more modest than that of the blog post. The lack of apparent modesty in the blog post was caused more by a lack of ability on my part to accurately summarize than an inflated sense of accomplishment and self importance.