I had the same problem with 3.1.0. Following some suggestions from the newsgroup, I switched to the html5lib alternative, which works fairly well. So far I have had no problems parsing about 6 sites that I previously had to clean up with regexps.

