This is about how to rank results when the query contains stop words. The traditional tf-idf method doesn't work well here, because it carries no information about where each word sits relative to its context. A simple approach is to index word groups such as "the the" instead of the single word "the". I guess this is what Google does now with "to be or not to be". Word grouping is, in any case, a common technique in CJK full-text search.
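
A minimal sketch of the word-group idea, assuming two-word groups and plain whitespace tokenization. The function names (`word_groups`, `build_index`, `search`) and the sample documents are made up for illustration; this is not how Google or any particular search engine implements it.

```python
from collections import defaultdict


def word_groups(text, n=2):
    """Return the overlapping n-word groups in text, lowercased."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def build_index(docs, n=2):
    """Map each word group to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for group in word_groups(text, n):
            index[group].add(doc_id)
    return index


def search(index, query, n=2):
    """Return ids of documents containing every word group of the query."""
    groups = word_groups(query, n)
    if not groups:
        # Query shorter than n words: this sketch has nothing to look up.
        return set()
    result = set(index.get(groups[0], set()))
    for group in groups[1:]:
        result &= index.get(group, set())
    return result


# Hypothetical sample documents, all made of very common words.
docs = {
    1: "to be or not to be that is the question",
    2: "not to be confused with the other one",
    3: "the the is an english band",
}
index = build_index(docs)

print(search(index, "to be or not to be"))
print(search(index, "the the"))
```

Because the index keys are word pairs, "the the" is a rare key even though "the" alone appears almost everywhere, so stop words no longer have to be dropped. A real system would presumably keep a single-word index as well for one-word queries, and would still need a ranking step on top of this lookup.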