Hacker News

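"Bayesian" in this context most likely means naive Bayes, which assumes that the occurrences of all words are independent of each other given the label. The "score" of each label ends up being something like the product of the relative frequencies of the message's words under that label, multiplied by the probability of the label itself: p(L|W) = p(L) * p(W|L) / p(W), where p(W|L) is the product of p(w|L) over the words w in W. Since p(W) is the same for every label, it does not change which label scores highest.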
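
To make the scoring concrete, here is a minimal sketch of a word-count naive Bayes classifier. It is an illustration under assumptions, not the implementation of any particular mail tool: the function names (`train`, `log_scores`, `classify`), the add-one smoothing, and the toy training data are all made up for this example. It sums logs instead of multiplying probabilities to avoid underflow, and it drops p(W) because that term is identical for every label.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Count labels and per-label word occurrences.

    examples: iterable of (label, list_of_words) pairs.
    """
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for label, words in examples:
        label_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return label_counts, word_counts, vocab


def log_scores(model, words):
    """Return log p(L) + sum of log p(w|L) for every label L."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label, n_docs in label_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log p(L)
        for w in words:
            # Add-one smoothing (an assumption of this sketch) keeps an
            # unseen word from driving the whole product to zero.
            count = word_counts[label][w] + 1
            score += math.log(count / (total_words + len(vocab)))
        scores[label] = score
    return scores


def classify(model, words):
    """Pick the label with the highest score."""
    scores = log_scores(model, words)
    return max(scores, key=scores.get)


# Toy data, invented for illustration only.
model = train([
    ("spam", ["win", "money", "now"]),
    ("spam", ["free", "money"]),
    ("ham", ["meeting", "tomorrow"]),
    ("ham", ["lunch", "tomorrow", "now"]),
])
print(classify(model, ["free", "money"]))
```

With equal priors, the label whose training messages used the words more often wins, which is exactly the "product of relative word frequencies" described above.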

But this would be super-effective if header fields like sender addresses and subject lines were used as separate features, and you wanted to label based on those.
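
One way to treat header fields as separate features is to prefix each token with the field it came from, so that "invoice" in the subject and "invoice" in the body are counted independently. The sketch below uses Python's standard `email` module. The prefixes (`from:`, `from-domain:`, `subject:`, `body:`), the helper names, and the sample message are assumptions made for this example, not something any specific tool is known to do.

```python
import re
from email import message_from_string
from email.utils import parseaddr


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def extract_features(raw_message):
    """Turn a raw RFC 822 style message into field-prefixed tokens."""
    msg = message_from_string(raw_message)
    features = []

    # The sender address and its domain become their own features.
    _, address = parseaddr(msg.get("From", ""))
    if address:
        address = address.lower()
        features.append("from:" + address)
        features.append("from-domain:" + address.rpartition("@")[2])

    # Subject words are kept apart from body words by the prefix.
    features += ["subject:" + t for t in tokenize(msg.get("Subject", ""))]

    # This sketch only handles plain, non-multipart bodies.
    if not msg.is_multipart():
        features += ["body:" + t for t in tokenize(msg.get_payload())]
    return features


# Sample message, invented for illustration only.
raw = (
    "From: Alice <alice@example.com>\n"
    "Subject: Invoice due\n"
    "\n"
    "Please pay the invoice.\n"
)
print(extract_features(raw))
```

A token list like this can be fed to any bag-of-words naive Bayes classifier in place of plain words. A sender who always gets the same label then contributes one very strong feature.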



Yeah, or even being able to select what counts as a feature for naive Bayes and what is ignored, configurable separately for each label!



