
OP here. Vocab doesn't grow linearly with the number of words you sample:

mdaniels.com/vocab/scatter.png

So an average would keep shifting as your sample size grows.
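A minimal sketch of why a per-word average drifts with sample size: measure the type-token ratio (unique words divided by total words) over growing prefixes of the same token list. The tokenizer, the function names and the toy sentence are all hypothetical illustrations, not the OP's actual pipeline.

```python
import re


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase, keep runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(tokens: list[str]) -> float:
    # Unique words divided by total words; 0.0 for an empty list.
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def ratios_by_prefix(tokens: list[str], sizes: list[int]) -> dict[int, float]:
    # Ratio measured on the first n tokens, for each n in sizes.
    return {n: type_token_ratio(tokens[:n]) for n in sizes}


tokens = tokenize("the cat sat on the mat the cat ran")
print(ratios_by_prefix(tokens, [3, 6, 9]))
```

On this toy sentence the ratio falls from 1.0 (3 tokens) to 5/6 (6 tokens) to 6/9 (9 tokens), because repeated words keep piling up while new ones arrive more slowly. That is why comparing artists at different sample sizes skews the result.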

35K words was the threshold where things stopped getting garbled (the way they do with the 100-word example you mention).

The threshold also depends on which artists I include: if I raised it to 50K, I'd lose rappers like Drake.
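A rough sketch of the fixed-threshold comparison, assuming each artist's lyrics are already a token list in some chosen order: count distinct words in each artist's first N tokens, and drop anyone with fewer than N tokens. The function names and the `threshold` parameter are hypothetical; only the 35K figure comes from the comment above.

```python
from typing import Optional


def unique_in_first_n(tokens: list[str], n: int) -> Optional[int]:
    # Distinct words among the first n tokens; None if the artist has fewer than n.
    if len(tokens) < n:
        return None
    return len(set(tokens[:n]))


def compare_artists(
    catalogs: dict[str, list[str]], threshold: int = 35_000
) -> tuple[dict[str, int], list[str]]:
    # Returns (scores for artists who meet the threshold, names of excluded artists).
    scores: dict[str, int] = {}
    excluded: list[str] = []
    for artist, tokens in catalogs.items():
        count = unique_in_first_n(tokens, threshold)
        if count is None:
            excluded.append(artist)
        else:
            scores[artist] = count
    return scores, excluded


toy = {"A": ["a", "b", "a", "c", "d"], "B": ["x", "y"]}
print(compare_artists(toy, threshold=4))
```

With a toy threshold of 4, artist A scores 3 (from `a b a c`) and artist B is excluded for having only 2 tokens. Raising the threshold makes every score comparable over a larger sample, but pushes more artists into the excluded list, which is the trade-off described above.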



