One of the authors here: we wrote this during the Pragmatic Programmer's writing month in 2010 and continued in 2011. Then I got caught up writing my PhD thesis, and now I have a new job (as an NLP engineer, but in Java ;)).
So, the book is basically frozen. We hope to have more time in the future to continue the writing...
Nice endeavor, but it ended up like most endeavors: unfinished. :)
It was the first (and so far the only) NLP book I have read. I'm interested in both NLP and Haskell, so it was a good fit. Thanks!
A few points of criticism. For the frequency list you should use multisets, not dictionaries; there are a few multiset packages on Hackage. Suffix arrays are poorly explained, and monads very poorly. The tagging section gave the impression that it could be explained more simply.
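To illustrate the multiset point, here is a minimal sketch that assumes only `Data.Map.Strict` from `containers`. The `Bag` type and its functions are hypothetical names defined here, not the API of any Hackage multiset package. The advantage of bag semantics is that union adds counts, so frequency lists built from separate chunks of a corpus combine directly.

```haskell
import qualified Data.Map.Strict as M
import Data.List (sortBy)
import Data.Ord (Down (..), comparing)

-- Hypothetical minimal bag (multiset): each distinct element maps to its count.
newtype Bag a = Bag (M.Map a Int)
  deriving (Show, Eq)

-- Build a bag from a token list; repeated tokens increase the count.
fromListB :: Ord a => [a] -> Bag a
fromListB xs = Bag (M.fromListWith (+) [(x, 1) | x <- xs])

-- Number of occurrences of an element (0 if absent).
occur :: Ord a => a -> Bag a -> Int
occur x (Bag m) = M.findWithDefault 0 x m

-- Bag union adds counts, so frequency lists of two corpus chunks merge directly.
unionB :: Ord a => Bag a -> Bag a -> Bag a
unionB (Bag a) (Bag b) = Bag (M.unionWith (+) a b)

-- Frequency list, most frequent first (sortBy is stable, so ties keep ascending key order).
toFreqList :: Bag a -> [(a, Int)]
toFreqList (Bag m) = sortBy (comparing (Down . snd)) (M.toList m)

main :: IO ()
main = do
  let b = unionB (fromListB (words "the cat sat on the mat"))
                 (fromListB (words "the dog sat"))
  print (occur "the" b)          -- prints 3
  print (take 2 (toFreqList b))  -- prints [("the",3),("sat",2)]
```

A package such as `multiset` on Hackage offers this kind of type ready-made; check its documentation for the exact function names rather than relying on the ones above.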
Many topics are announced but never covered. In fact, it is not really a book yet; it reads more like an article. Perhaps you could reframe it that way? But OK, hopefully you will find time to continue it as a book.
In the meantime, could you recommend another NLP book to continue with?
I'm sorry about that! All the other sections were well written or OK, and I appreciate what I picked up from the book. I just wanted to point out some places that need reworking in case you continue.
Working in industry myself, I know how hard, nearly impossible, it is to find time for anything beyond work and family. And a decent book takes roughly as much effort as finishing a PhD. Perhaps that was my own frustration over the projects I had to abandon coming out. :(
Take a look at Coursera: the NLP course by Jurafsky and Manning (authors of books worth reading) was OK, and another course by Collins, another state-of-the-art NLP researcher, is starting right now.
I'd like to thank you for putting out what you've done. I got a lot out of it and I'm sure I'd get more out of it if I understood Haskell better. I look forward to reading the whole thing if you get a chance to finish it!
Basing the examples on the standard String type seems risky.
As soon as you have a corpus of any reasonable size (and you need large corpora for any meaningful, non-toy results), the String alternatives (such as Text and ByteString) and laziness-control options become mandatory, but they are tricky and ugly when you first start using them.
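For anyone hitting this, here is a minimal sketch of the usual alternative, assuming the `text` and `containers` packages that ship with GHC: tokens as strict `Data.Text` instead of `String`, counted with a strict left fold (`foldl'`) into a strict map so counts are evaluated as they go instead of piling up as unevaluated thunks. The name `countTokens` and the inline sample sentence are hypothetical; for a real corpus you would read the file with `Data.Text.IO.readFile`, or use `Data.Text.Lazy` to read it in lazy chunks.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.List (foldl')
import qualified Data.Map.Strict as M
import qualified Data.Text as T

-- Hypothetical helper: lowercase the text, split on whitespace, and count tokens.
-- foldl' forces the accumulator map at each step, and Data.Map.Strict forces
-- each count, so no chain of (+) thunks builds up on a large corpus.
countTokens :: T.Text -> M.Map T.Text Int
countTokens = foldl' step M.empty . T.words . T.toLower
  where
    step m w = M.insertWith (+) w 1 m

main :: IO ()
main = do
  -- For a real corpus, replace this literal with Data.Text.IO.readFile applied to a path.
  let corpus = "The cat sat on the mat the dog sat" :: T.Text
  print (M.toList (countTokens corpus))
```

Note that `T.words` only splits on whitespace; a real tokenizer would also separate punctuation.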