
I don't think anything is going to work on every web page in existence. Perhaps strlen.



Yeah. I just wrote a spider last night using html5lib and had to wrap the parse call in a try block, so I can categorically say that it doesn't work for all webpages:

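  import html5lib
  from html5lib import treebuilders
  # Runs inside the spider's per-page function; `response` is the fetched page.
  parser = html5lib.HTMLParser(tree=treebuilders.getTreeBuilder("beautifulsoup"))
  try:
    document = parser.parse(response)
  except Exception as e:
    # Skip pages the parser chokes on instead of killing the whole crawl.
    print('parse failed ' + str(e))
    return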


And strlen certainly wouldn't, if you actually expect a correct answer: it counts bytes, not characters, and you can't reliably guess at encodings... :)
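
To make the encoding point concrete, here is a rough Python sketch. The sample string and the helper names `byte_length` and `char_length` are made up for illustration; `byte_length` only stands in for what a strlen-style count would report and is not C's strlen itself (which also stops at the first NUL byte).

```python
# Sample text with two non-ASCII characters: "naive cafe" with accents.
text = "na\u00efve caf\u00e9"

# The same text as it might arrive over the wire in two encodings.
utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("latin-1")


def byte_length(data):
    """Stand-in for a strlen-style count: the number of bytes."""
    return len(data)


def char_length(data, encoding):
    """Character count, which is only right if `encoding` is right."""
    return len(data.decode(encoding))


if __name__ == "__main__":
    print(byte_length(utf8_bytes), byte_length(latin1_bytes))
    print(char_length(utf8_bytes, "utf-8"), char_length(utf8_bytes, "latin-1"))
```

The byte counts differ between the two encodings even though the text is identical, and decoding the UTF-8 bytes with the wrong encoding gives a character count that is quietly wrong rather than an error.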



