OP here. Do I really need to provide all of this for the reader to grasp the basic premise of the site? This isn't a thesis or an academic pursuit, just comparing some rappers for fun.
I used plain NLTK token analysis on Rap Genius lyrics. As for more data slices, I agree that there should be more cuts of the data, but you must understand how much time it took me to put this together.
Of course it's entirely up to you what to provide. It would be silly of me to question that. I'm not paying you for that, so how can I demand anything? I'm just saying what I think should've been done. That's how I would do it, at least.
You see, I hold the strong opinion that any data analysis is pretty close to useless if it's not reproducible. And I mean really close. I already mentioned a few questions that naturally arise when reading your article, which are crucial to understanding your results and which the article doesn't address. So, ideally, any data analysis done for the open community should provide both the full dataset and the full source code. Unfortunately, that's not always possible: the dataset may be several terabytes in size, there might be legal issues with disclosing the data (or with revealing exactly how the author acquired it), the source code might be under NDA, whatever. In that case the description should be as pedantic as possible, because small details can change the meaning of the statistics entirely.
By the way, NLTK lets you do all kinds of processing, so "plain NLTK token analysis" doesn't really answer the question of what was actually done.
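To show why the preprocessing details matter, here is a minimal sketch of how the same text yields different "unique word" counts depending on a couple of choices. This is not the OP's pipeline: it uses only the standard library rather than NLTK, and the function `unique_words`, its parameters, and the sample line are all made up for illustration.

```python
import re

def unique_words(text, lowercase=True, strip_punct=True):
    """Return the set of distinct tokens under the chosen preprocessing.

    Hypothetical helper: whitespace tokenization stands in for whatever
    tokenizer was really used.
    """
    tokens = text.split()
    if lowercase:
        tokens = [t.lower() for t in tokens]
    if strip_punct:
        # Keep letters, digits, underscores and apostrophes; drop the rest.
        tokens = [re.sub(r"[^\w']+", "", t) for t in tokens]
    # Discard tokens that became empty after stripping.
    return {t for t in tokens if t}

# Made-up sample line, not real lyrics.
sample = "Yo, yo! The money, THE Money... money talks."

raw = unique_words(sample, lowercase=False, strip_punct=False)
lowered = unique_words(sample, lowercase=True, strip_punct=False)
cleaned = unique_words(sample, lowercase=True, strip_punct=True)

print(len(raw), len(lowered), len(cleaned))
```

The three counts differ for the very same line, and that is before stemming, stop words, or slang spellings even come into it, which is why the exact steps need to be stated.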
So what would I do? The usual course, actually: I would do the work in an IPython Notebook and clean it up afterwards, draw the graphs in place, and print a few of the slices I mentioned along the way, so it would be easier to understand what those counted unique words actually look like. Fancy d3 graphs are cool for sure, but not nearly as useful.
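As a rough idea of what "printing a few slices" could look like in a notebook cell, here is a small sketch, assuming the lyrics have already been turned into a list of tokens. The helper `describe_vocabulary` and the token list are hypothetical, not taken from the article.

```python
from collections import Counter

def describe_vocabulary(tokens, top_n=3):
    """Summarize a token list: distinct count, top words, one-off words."""
    counts = Counter(tokens)
    return {
        "unique": len(counts),
        "most_common": counts.most_common(top_n),
        # Words seen exactly once often reveal typos or tokenizer quirks.
        "seen_once": sorted(w for w, c in counts.items() if c == 1),
    }

# Made-up tokens standing in for one artist's cleaned lyrics.
tokens = "yo yo the money the money money talks".split()

summary = describe_vocabulary(tokens)
for key, value in summary.items():
    print(key, value)
```

Seeing the most frequent and the one-off tokens next to the headline number makes it much easier to judge whether the count means what the chart says it means.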