
I can't recommend any libraries "for humans" for this, but there are APIs out there for it. The main problem with many NLP libs (and data mining applications in general) is how much memory a good model needs in order to be accurate at all. Here are a few APIs and libs that might be useful, though (disclaimer: I'm the publisher of this one): https://www.mashape.com/agibsonccc/semantic-analytic

There are other text processing APIs on there as well. As for libraries, I come primarily from the JVM camp for NLP, so I would recommend the following:

http://nlp.stanford.edu/software/index.shtml (comprehensive)

http://code.google.com/p/clearnlp/ (fairly simple)

My favorite is ClearTK (http://code.google.com/p/cleartk/), mainly because it offers a consistent interface, but UIMA itself can be a difficult toolchain to pick up, and I could understand most of these being overkill for the simple applications many people have in mind.
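
For a sense of scale, here is a minimal sketch of driving the Stanford toolkit mentioned above. It assumes stanford-corenlp and its English models are on the classpath; the annotator names and annotation classes are from its documented API, and the sample sentence is just an illustration:

    import edu.stanford.nlp.ling.CoreAnnotations;
    import edu.stanford.nlp.ling.CoreLabel;
    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;
    import edu.stanford.nlp.util.CoreMap;
    import java.util.Properties;

    public class CoreNlpSketch {
        public static void main(String[] args) {
            // Configure the annotator chain: tokenize -> sentence split -> POS -> lemma -> NER.
            Properties props = new Properties();
            props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner");
            StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

            Annotation doc = new Annotation("Barack Obama was born in Hawaii.");
            pipeline.annotate(doc);

            // Walk sentences and tokens, printing word / POS tag / NER label.
            for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
                for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
                    System.out.println(token.word() + "\t"
                        + token.get(CoreAnnotations.PartOfSpeechAnnotation.class) + "\t"
                        + token.get(CoreAnnotations.NamedEntityTagAnnotation.class));
                }
            }
        }
    }

Note that loading all those annotators is exactly where the memory cost mentioned above comes from.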




I've heard some good things about OpenNLP as well (http://opennlp.apache.org/), but haven't had the time to look at it in any detail.


OpenNLP is great. I've used it for a lot of subtasks, but nothing that produces end results like those described earlier. It's an amazing library for building NLP systems, but it doesn't produce finished results on its own (named entity recognition, etc.); typically it's coupled with other libraries.

The big problem with NLP in general, I think, is that to do anything you need a pipeline: sentence segmentation, tokenization, and part-of-speech tagging at a bare minimum. Only then can you do named entity recognition or other tasks that produce actually usable results.
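
To make that concrete, here is a rough sketch of such a pipeline using OpenNLP's 1.5-style API. The .bin file names are assumed to be the stock pretrained English models from the OpenNLP download page; swap in whatever models you actually have:

    import opennlp.tools.namefind.NameFinderME;
    import opennlp.tools.namefind.TokenNameFinderModel;
    import opennlp.tools.postag.POSModel;
    import opennlp.tools.postag.POSTaggerME;
    import opennlp.tools.sentdetect.SentenceDetectorME;
    import opennlp.tools.sentdetect.SentenceModel;
    import opennlp.tools.tokenize.TokenizerME;
    import opennlp.tools.tokenize.TokenizerModel;
    import opennlp.tools.util.Span;
    import java.io.FileInputStream;

    public class OpenNlpPipeline {
        public static void main(String[] args) throws Exception {
            // Each stage loads its own pretrained model up front.
            SentenceDetectorME sentenceDetector =
                new SentenceDetectorME(new SentenceModel(new FileInputStream("en-sent.bin")));
            TokenizerME tokenizer =
                new TokenizerME(new TokenizerModel(new FileInputStream("en-token.bin")));
            POSTaggerME tagger =
                new POSTaggerME(new POSModel(new FileInputStream("en-pos-maxent.bin")));
            NameFinderME personFinder =
                new NameFinderME(new TokenNameFinderModel(new FileInputStream("en-ner-person.bin")));

            String text = "Jim bought 300 shares of Acme Corp. in 2006. He lives in Boston.";

            // 1. Sentence segmentation
            for (String sentence : sentenceDetector.sentDetect(text)) {
                // 2. Tokenization
                String[] tokens = tokenizer.tokenize(sentence);
                // 3. Part-of-speech tagging (one tag per token)
                String[] tags = tagger.tag(tokens);
                // 4. Named entity recognition over the token stream
                Span[] names = personFinder.find(tokens);
                for (String name : Span.spansToStrings(names, tokens)) {
                    System.out.println("PERSON: " + name);
                }
            }
            // The name finder keeps document-level context; clear it between documents.
            personFinder.clearAdaptiveData();
        }
    }

Every stage depends on the output of the previous one, which is why it's hard to get a single "usable result" out of any one component in isolation.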



