Did I miss any discussion of what the "processing" is?
Using the Stanford Part-Of-Speech tagger, my goofy project, Ashurbanipal, can tag all the words in one book in about 8 seconds on one core. Tagging the ~25,000 books from the Project Gutenberg 2010 DVD image takes about 8 hours on my 4-core (hyperthreaded) laptop with a 10GB JVM heap.
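
For anyone who wants to sanity-check those numbers, here is a rough back-of-the-envelope sketch. It assumes the rounded figures above (about 8 seconds per book, about 25,000 books, about 8 hours of wall-clock time) and that every book takes roughly the same time, which is surely not exactly true. The function names are made up for illustration and are not part of Ashurbanipal.

```python
# Back-of-the-envelope check using the rounded figures from the comment.
# All three constants are approximations, not measurements.
BOOKS = 25_000          # approximate number of books tagged
SECONDS_PER_BOOK = 8    # approximate single-core tagging time per book
WALL_CLOCK_HOURS = 8    # approximate total run time on the laptop


def core_hours(books: int, seconds_per_book: float) -> float:
    """Total single-core work in hours, assuming uniform time per book."""
    return books * seconds_per_book / 3600


def effective_parallelism(books: int, seconds_per_book: float,
                          wall_clock_hours: float) -> float:
    """Single-core work divided by wall-clock time."""
    return core_hours(books, seconds_per_book) / wall_clock_hours


if __name__ == "__main__":
    work = core_hours(BOOKS, SECONDS_PER_BOOK)
    speedup = effective_parallelism(BOOKS, SECONDS_PER_BOOK, WALL_CLOCK_HOURS)
    print(f"single-core work: {work:.1f} core-hours")   # about 55.6
    print(f"effective parallelism: {speedup:.1f}x")     # about 6.9
```

That works out to roughly 56 core-hours squeezed into 8 hours, or about 7x, which looks plausible for 8 logical threads on a 4-core hyperthreaded machine. Given how rounded the inputs are, treat that as a consistency check rather than a benchmark.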