When I labeled the comments, I didn't label books that were criticized. So in theory the model should filter out negative reviews. But currently the training dataset is pretty limited in size so you still can see some negative ones. I suspect that with more training data this problem will go away.