
I worked on something similar for clustering BBC news articles. The (Ruby) code I used is here: https://github.com/bbcrd/similarity

I didn't account for named entities or n-grams in the feature vector though. That's a very interesting idea.
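
For anyone curious what adding n-grams to the feature vector might look like, here is a rough sketch in plain Ruby. It is not taken from the bbcrd/similarity repo; `ngram_counts` and `cosine_similarity` are hypothetical helper names, the tokenizer is a naive regex, and raw counts stand in for whatever weighting (e.g. TF-IDF) a real pipeline would use.

```ruby
# Hypothetical helper: count word n-grams (unigrams and bigrams by default)
# so that multi-word phrases such as "prime minister" become features.
def ngram_counts(text, sizes = [1, 2])
  tokens = text.downcase.scan(/[a-z0-9']+/) # naive tokenizer, an assumption
  counts = Hash.new(0)
  sizes.each do |n|
    tokens.each_cons(n) { |gram| counts[gram.join(" ")] += 1 }
  end
  counts
end

# Cosine similarity between two sparse count vectors stored as Hashes.
def cosine_similarity(a, b)
  dot = a.sum { |term, weight| weight * b.fetch(term, 0) }
  norm_a = Math.sqrt(a.values.sum { |w| w * w })
  norm_b = Math.sqrt(b.values.sum { |w| w * w })
  return 0.0 if norm_a.zero? || norm_b.zero?
  dot / (norm_a * norm_b)
end

doc_a = ngram_counts("Prime Minister visits Paris")
doc_b = ngram_counts("The Prime Minister arrives in Paris")
puts cosine_similarity(doc_a, doc_b)
```

Because the bigram "prime minister" is its own feature, two articles that share the phrase score higher than two that merely share the separate words. Named entities could be added the same way, as extra keys in the hash, given some entity extractor.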

@mattdeboard - what algorithm did you use to count the occurrence and size of clusters?



