
Not much. It's a more expressive and cleaner language, but on the other hand Python has NLTK + the SciPy community.

Scala (or Java) is another great NLP language. It has decent libraries (OpenNLP, MALLET, Mahout) and Hadoop, and Scala is almost as nice as Haskell.




> Not much. It's a more expressive and cleaner language, but on the other hand Python has NLTK + the SciPy community.

Haskell's mechanisms for defining parsers, lexers, and other pattern-matching tools are so good that they probably cross the line from "pretty" to "objectively better".

A lot of people who need to lex and parse data and then act on it turn to Haskell. It has some really remarkable and efficient libraries. And even for "common" target languages it's reasonable to write extremely fast parsers. With tuning, projects like Aeson are among the fastest JSON parsers and writers out there (only a few projects exceed its speed and resource efficiency in ANY runtime).
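To make the "defining parsers" point a bit more concrete, here is a rough sketch of the combinator style. It is written in Python (the other language in this thread) rather than Haskell, so it only approximates what Haskell's parser libraries give you, and it says nothing about how Aeson or any real library is implemented. All names (char_where, many1, fmap, seq, sep_by1, number_list) are made up for this illustration.

```python
from typing import Any, Callable, Optional, Tuple

# A parser takes (input, position) and returns (value, new_position)
# on success, or None on failure.
Parser = Callable[[str, int], Optional[Tuple[Any, int]]]

def char_where(pred) -> Parser:
    """Consume one character that satisfies pred."""
    def parse(s, i):
        if i < len(s) and pred(s[i]):
            return s[i], i + 1
        return None
    return parse

def many1(p: Parser) -> Parser:
    """Apply p one or more times and collect the results in a list."""
    def parse(s, i):
        out = []
        while True:
            r = p(s, i)
            if r is None:
                break
            v, i = r
            out.append(v)
        return (out, i) if out else None
    return parse

def fmap(f, p: Parser) -> Parser:
    """Transform the value produced by p with f."""
    def parse(s, i):
        r = p(s, i)
        if r is None:
            return None
        v, j = r
        return f(v), j
    return parse

def seq(*ps: Parser) -> Parser:
    """Run parsers one after another; fail if any of them fails."""
    def parse(s, i):
        vals = []
        for p in ps:
            r = p(s, i)
            if r is None:
                return None
            v, i = r
            vals.append(v)
        return vals, i
    return parse

def sep_by1(p: Parser, sep: Parser) -> Parser:
    """Parse one or more p separated by sep, keeping only the p values."""
    def parse(s, i):
        r = p(s, i)
        if r is None:
            return None
        v, i = r
        out = [v]
        while True:
            r = seq(sep, p)(s, i)
            if r is None:
                break
            (_, v), i = r
            out.append(v)
        return out, i
    return parse

# Small parsers are composed into a bigger one.
number = fmap(lambda ds: int("".join(ds)), many1(char_where(str.isdigit)))
comma = char_where(lambda c: c == ",")
number_list = sep_by1(number, comma)

print(number_list("10,20,3", 0))  # ([10, 20, 3], 7)
```

The grammar is just ordinary values glued together with ordinary functions. In Haskell the same idea gets typeclass support and do-notation on top, which is a large part of why it reads so cleanly there.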


I am guessing you might be conflating parsing natural language with parsing something that has a rigid and well-defined grammar (like a programming language). NLP is a whole different beast.


> NLP is a whole different beast.

The very same patterns that define "packrat-like" parsers (which are closely related to the monadic and "arrow-adic" parsers) can be extended to define things like DFAs and semantic pattern matching. And languages with rich, somewhat lazy pattern matching, like Haskell and Prolog, are ideal for semantic analysis and wipe the floor with eager languages that lack it (e.g., C).

While not an "authority" on the subject, I've spent a lot of time working with some very skilled folks in NLP and linguistics. Most tools they used (in our case licensed from X/PARC) had C underpinnings for performance, but ultimately consumed specifications that were very much like Prolog or Haskell in character. Talking to some of the linguists who wrote those tools, I got the impression that had GHC existed (or had Allegro or a fast Prolog been cheaper), the tools would have been much easier to write in those languages.


Maybe I've been brainwashed by statistical NLP people, but I think this summarizes my understanding pretty well.

http://en.wikipedia.org/wiki/Frederick_Jelinek

> "Every time I fire a linguist, the performance of the speech recognizer goes up"

As far as I know, modern, successful NLP systems don't have much human knowledge baked in and are produced by training on large data sets.
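For anyone unfamiliar with what "produced by training" means at the smallest possible scale, here is a toy sketch: a bigram model whose only "knowledge" is counts taken from example sentences. The four-sentence corpus and the function names are made up for illustration, and real systems use far more data and far better models than this.

```python
from collections import Counter, defaultdict

def train_bigrams(sentences):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # <s> and </s> mark sentence start and end.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            counts[prev][cur] += 1
    return counts

def prob(counts, prev, cur):
    """Maximum-likelihood estimate of P(cur | prev), with no smoothing."""
    following = counts.get(prev)
    if not following:
        return 0.0
    return following[cur] / sum(following.values())

def most_likely_next(counts, prev):
    """Token seen most often after prev in the training data."""
    return counts[prev].most_common(1)[0][0]

# Hypothetical toy corpus; no grammar rules are written down anywhere.
corpus = [
    "the dog barks",
    "the dog sleeps",
    "the cat sleeps",
    "the dog barks loudly",
]
model = train_bigrams(corpus)

print(prob(model, "the", "dog"))       # 0.75
print(most_likely_next(model, "the"))  # dog
```

Nothing here encodes that "dog" is a noun or that determiners precede nouns; the preference falls out of the counts, which is roughly the point being made about statistical systems.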


Do you have more info on this? I'd love to read more.


I'm afraid I can't say much more than I already have without talking out of my rear. But you can read about X/P's XLE project here: http://www2.parc.com/isl/groups/nltt/xle/


By extension, Clojure can use those same Java libraries.



