
It was indeed pretty rough. It wouldn't surprise me if Google moves to JS-generated DOM elements to combat rank trackers. At the time it was fine because they want to serve non-JS browsers, but that might change.

Parsing it wasn't hard but it wasn't fun...



Wouldn't it be easier if they generated the DOM elements via JS? That would imply that they're getting JSON or something like it, parsing it, and creating the DOM.


No, because then you'd have to use a headless browser that can execute JS. That increases time and cost when scraping, though it wouldn't surprise me if it ends up going that way.


Why execute the JS? Just parse the JSON (I'm just guessing here). No need to mess with DOM or JS.
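
To make the idea concrete, here is a minimal sketch of what "just parse the JSON" could look like, assuming the page really did fetch a plain JSON payload. Everything about the payload is invented for illustration: the `results`, `position`, and `url` field names and the helper names `extract_rankings` and `rank_of` are hypothetical and not taken from any real Google response.

```python
import json
from typing import List, Optional, Tuple
from urllib.parse import urlparse

# Hypothetical payload: the structure and field names are assumptions,
# made up only to illustrate the approach.
SAMPLE_PAYLOAD = """
{"results": [
  {"position": 1, "url": "https://example.com/a", "title": "First"},
  {"position": 2, "url": "https://example.org/b", "title": "Second"}
]}
"""


def extract_rankings(payload: str) -> List[Tuple[int, str]]:
    """Return (position, url) pairs straight from the JSON text."""
    data = json.loads(payload)
    return [(item["position"], item["url"]) for item in data.get("results", [])]


def rank_of(payload: str, domain: str) -> Optional[int]:
    """Return the first position whose URL host equals `domain`, else None."""
    for position, url in extract_rankings(payload):
        if urlparse(url).hostname == domain:
            return position
    return None


if __name__ == "__main__":
    print(rank_of(SAMPLE_PAYLOAD, "example.org"))
```

No DOM and no JS engine are involved here, only a JSON parser. That only holds if the endpoint returns JSON in the first place.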


From my last, admittedly cursory, look it wasn't that simple. IIRC JSON wasn't even used; I think it was a binary protocol.



