
The article doesn't mention it explicitly, but this is a nice example of how using Bayes' theorem lets you avoid computing the hard normalization term over the input space. In the article, this is the P(w) term of

P(c|w) = P(c)P(w|c)/P(w),

where c is a correction, and w is the original word.
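
Concretely, since P(w) does not depend on the candidate c, it is the same constant for every correction and drops out of the maximization:

argmax_c P(c|w) = argmax_c P(c)P(w|c),

so the denominator never needs to be computed at all.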

The author does implicitly talk about this when he explains that P(c|w) conflates the two factors, but it's also not that hard to see that getting a handle on P(w) -- the probability distribution over misspellings -- is harder than getting a handle on P(c) -- the probability distribution over actual words, which can be estimated from any large corpus -- and Bayes lets us get rid of the former during optimization.
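
To make the payoff concrete, here's a minimal sketch in the spirit of the article's Python. Everything in it (the tiny corpus, candidates(), the toy p_w_given_c error model) is a stand-in of my own, not the article's code; the only point is that the final max scores each candidate by P(c) * P(w|c) and never touches P(w).

    from collections import Counter

    # Hypothetical language model P(c): word frequencies from a toy corpus.
    WORD_COUNTS = Counter("the quick brown fox and the lazy dog".split())
    TOTAL = sum(WORD_COUNTS.values())

    def p_c(c):
        # P(c): relative frequency of candidate word c in the corpus.
        return WORD_COUNTS[c] / TOTAL

    def p_w_given_c(w, c):
        # Toy error model P(w|c); a real one would score edit operations.
        return 1.0 if w == c else 0.05

    def candidates(w):
        # Stand-in candidate generator; the article uses edit distance.
        return [c for c in WORD_COUNTS if abs(len(c) - len(w)) <= 1] or [w]

    def correct(w):
        # argmax_c P(c) * P(w|c): P(w) is identical for every candidate,
        # so the Bayes denominator is dropped rather than computed.
        return max(candidates(w), key=lambda c: p_c(c) * p_w_given_c(w, c))

    print(correct("teh"))  # -> "the", the most frequent plausible word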


