
Did I miss any discussion of what the "processing" is?

Using the Stanford part-of-speech tagger, my goofy project, Ashurbanipal, can tag all the words in one book in about 8 seconds on a single core, or all ~25,000 books from the Project Gutenberg 2010 DVD image in about 8 hours on my 4-core (hyperthreaded) laptop with a 10 GB JVM heap.
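
For anyone curious what that looks like in code, here's a minimal sketch of tagging one sentence with the tagger's MaxentTagger class. The model path shown is the stock English left3words model and varies by distribution; Ashurbanipal's actual pipeline may differ:

    import edu.stanford.nlp.tagger.maxent.MaxentTagger;

    public class TagOneSentence {
        public static void main(String[] args) {
            // Stock English model path -- an assumption; adjust to match
            // wherever your tagger distribution keeps its model files.
            MaxentTagger tagger = new MaxentTagger(
                "edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger");

            // tagString appends _TAG to each token,
            // e.g. "The_DT quick_JJ brown_JJ fox_NN ..."
            System.out.println(tagger.tagString(
                "The quick brown fox jumps over the lazy dog."));
        }
    }

The bulk run is then mostly a matter of loading the model once and fanning books out across a thread pool; a loaded tagger is generally treated as read-only, though one instance per thread is the cautious option.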

Nope, there was almost no mention of what this was actually used for. The closest I found was a mention of the final output:

"single output files, tab-delimited with data available for each year, merging in publication metadata and other information about each book"

[edit] More info in a link at the bottom of the article: http://blog.gdeltproject.org/3-5-million-books-1800-2015-gde...
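
Going by that description, each output row presumably keys on year plus per-book metadata. Here's a minimal sketch of filtering such a file for a single year, with the file name and the year-in-first-column layout as purely hypothetical stand-ins, since the article doesn't document the actual schema:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.stream.Stream;

    public class ReadBooksTsv {
        public static void main(String[] args) throws IOException {
            // "books.tsv" and year-in-column-0 are illustrative assumptions.
            try (Stream<String> lines = Files.lines(Paths.get("books.tsv"))) {
                lines.map(line -> line.split("\t", -1)) // -1 keeps empty trailing fields
                     .filter(cols -> cols[0].equals("1905")) // pick out one year
                     .forEach(cols -> System.out.println(String.join(" | ", cols)));
            }
        }
    }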



