Hacker News
A Billion Words: Today's language modeling standard should be higher (googleresearch.blogspot.com)
9 points by vikram360 on May 1, 2014 | 2 comments


The GZ file is only 1.7GB. I imagine a densely packed model would almost fit on a machine with 8GB of RAM, which is surprising.

http://www.statmt.org/lm-benchmark/


Along similar lines, all of the English Wikipedia is < 10GB compressed, and about 45GB uncompressed: http://en.wikipedia.org/wiki/Wikipedia:Database_download#Eng.... That omits all the history (it covers just the current pages), but it is still surprising to me how small it seems now.



