
I took a look at the code in the author's GitHub repository.

The data sources are CSVs in this repository: https://github.com/BobAdamsEE/SouthParkData/

Looks like all the data is preprocessed, with everyone mostly having only 1 line. (Actually, it appears the line you note in 10-3 is broken!) You could argue that the script isn't processed correctly, but that's beyond the scope of the analysis; a note might be helpful, though.




It's my repository. I'll look at how the Python script handles flashback events later today. Thanks for the feedback!


It appears that there are two issues that affect small parts of the captured datasets:

1) Colored character names are not handled properly. I looked for <th> tags, not <th bgcolor="beige"> tags.

2) Character names that start with a lowercase letter are not handled. This may be because other episodes use table headers that start with a lowercase letter for stage directions; I have to double-check.
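
For what it's worth, here is a minimal sketch of how both cases could be matched. It is not the script from the repository; it assumes speaker names sit in <th> cells, uses only the standard library's html.parser, and the names SpeakerParser, extract_speakers, and the sample rows are made up for illustration. It accepts a <th> tag with or without attributes and keeps names regardless of case. It does not try to tell stage directions apart from names, which would still need a separate check.

```python
from html.parser import HTMLParser


class SpeakerParser(HTMLParser):
    """Collect the text of every <th> cell, whatever attributes it carries."""

    def __init__(self):
        super().__init__()
        self._in_th = False
        self._buf = []
        self.speakers = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases the tag name; attrs such as bgcolor are ignored,
        # so <th> and <th bgcolor="beige"> are treated the same way.
        if tag == "th":
            self._in_th = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "th" and self._in_th:
            name = "".join(self._buf).strip()
            if name:
                # No capitalization filter: lowercase names are kept too.
                self.speakers.append(name)
            self._in_th = False

    def handle_data(self, data):
        if self._in_th:
            self._buf.append(data)


def extract_speakers(html):
    """Return the speaker names found in <th> cells, in document order."""
    parser = SpeakerParser()
    parser.feed(html)
    parser.close()
    return parser.speakers


# Hypothetical sample rows, not taken from the real transcripts.
sample = (
    "<table>"
    "<tr><th>Cartman</th><td>line one</td></tr>"
    '<tr><th bgcolor="beige">Kyle</th><td>line two</td></tr>'
    "<tr><th>stan</th><td>line three</td></tr>"
    "</table>"
)

if __name__ == "__main__":
    print(extract_speakers(sample))
```

Matching on the tag name rather than on the literal string "<th>" is what makes the colored headers show up; any other attribute would be handled the same way.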



