
This.

The Scrapy tutorial is good if you just want to use Scrapy to crawl a site and extract a bunch of information, one time.

If you want to do scraping as a small part of another Python project, then it can be easier just to use Scrapy's HtmlXPathSelector, which is more forgiving than a real XML parser.

    import urllib2
    from scrapy.selector import HtmlXPathSelector
    from scrapy.http import TextResponse
    
    url = 'http://www.dmoz.org/Computers/Programming/Languages/Python/Books/'
    my_xpath = '//title/text()'
    
    # Fetch the page ourselves; set a User-Agent, since some sites
    # reject urllib2's default one
    req = urllib2.Request(url, headers={'User-Agent': 'Mozilla or whatever'})
    body = urllib2.urlopen(req).read()
    
    # Wrap the raw HTML in a TextResponse so the selector can parse it
    response = TextResponse(url=url, body=body, encoding='utf-8')
    hxs = HtmlXPathSelector(response)
    result = hxs.select(my_xpath).extract()  # list of matched strings


HtmlXPathSelector is just a very thin wrapper around lxml; it doesn't add anything parsing-wise. You might as well use lxml directly if you don't already have Scrapy as a dependency.

https://github.com/scrapy/scrapy/tree/master/scrapy/selector
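
For comparison, a minimal sketch of doing the same thing with lxml directly (reusing the URL and XPath from the snippet above):

    import urllib2
    import lxml.html
    
    url = 'http://www.dmoz.org/Computers/Programming/Languages/Python/Books/'
    body = urllib2.urlopen(url).read()
    
    # lxml's HTML parser is the forgiving parser the scrapy
    # selector delegates to anyway
    doc = lxml.html.fromstring(body)
    result = doc.xpath('//title/text()')  # same output as the selector version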


pyquery is a good alternative too. It's a slightly larger wrapper around lxml that lets you use jQuery-style selectors.
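
A minimal sketch of that style (the markup and class name here are made up for illustration):

    from pyquery import PyQuery
    
    html = '<ul><li><a class="book" href="/p">Python</a></li></ul>'
    d = PyQuery(html)                  # parses via lxml under the hood
    for link in d('a.book').items():   # jQuery-style CSS selector
        print link.attr('href'), link.text()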


lxml's `cssselect` method is nice for this - I found that with `xpath` and `cssselect` I have no need for anything else. I use cssselect for simple queries, like "a.something" - which would be needlessly verbose in XPath - and xpath for more complex ones, for example when I need access to axes or I want to apply some simple transform to the data before processing it in Python. Worked very well for me.
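
A small sketch of mixing the two (the markup and class name are hypothetical):

    import lxml.html
    
    html = '<div><a class="something" href="/x">one</a></div>'
    doc = lxml.html.fromstring(html)
    
    # CSS for the simple case - terse and readable
    links = doc.cssselect('a.something')
    
    # XPath when you need axes, functions, or attribute extraction
    hrefs = doc.xpath('//a[@class="something"]/@href')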


Doh! Too late to edit or delete my original comment :(

(And I can't downvote my own comment.)



