Hacker News

It is about how to rank results when the query contains stop words. The traditional tf-idf method doesn't work well here because it carries no information about each word's position relative to its context. A simple method is to index word groups, such as "the the", instead of the single word "the". I guess this is what Google does now with "to be or not to be". Word grouping is also a common technique in CJK full-text search.
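To make the word-grouping idea concrete, here is a minimal sketch of an inverted index keyed on adjacent word pairs (bigrams) instead of single words. It is only an illustration under stated assumptions, not what Google or any particular engine actually does. The tokenizer is a naive lowercase whitespace split, and all function names (`build_bigram_index`, `phrase_search`, `cjk_char_bigrams`) are hypothetical. Because stop words like "to" and "be" are kept inside pairs, a query such as "to be or not to be" can still be matched as a phrase. The CJK helper shows the analogous trick for text without spaces: overlapping character pairs.

```python
from collections import defaultdict


def tokenize(text):
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    # Stop words are deliberately NOT removed.
    return text.lower().split()


def build_bigram_index(docs):
    """Map each adjacent word pair to the set of (doc_id, position) where it starts."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        words = tokenize(text)
        for i in range(len(words) - 1):
            index[(words[i], words[i + 1])].add((doc_id, i))
    return index


def phrase_search(index, query):
    """Return the ids of documents that contain the query as a contiguous phrase."""
    words = tokenize(query)
    if len(words) < 2:
        raise ValueError("a bigram index needs a query of at least two words")
    pairs = [(words[i], words[i + 1]) for i in range(len(words) - 1)]
    result = set()
    # Every occurrence of the first pair is a candidate phrase start.
    for doc_id, start in index.get(pairs[0], set()):
        # The k-th pair of the query must start exactly k words after the first one.
        if all((doc_id, start + k) in index.get(pair, set())
               for k, pair in enumerate(pairs)):
            result.add(doc_id)
    return result


def cjk_char_bigrams(text):
    # CJK text has no spaces between words, so a common approach is to
    # index overlapping two-character groups instead of single characters.
    return [text[i:i + 2] for i in range(len(text) - 1)]


docs = {
    1: "To be or not to be",
    2: "not to be confused with",
    3: "be or not",
}
index = build_bigram_index(docs)
print(phrase_search(index, "to be or not to be"))  # {1}
print(phrase_search(index, "not to be"))            # {1, 2}
print(cjk_char_bigrams("東京都"))                    # ['東京', '京都']
```

Storing pairs makes the index larger than a single-word index, but every pair is far more selective than a lone "to" or "the", which is the point when the query consists mostly of stop words.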

