Ask HN: What are some of the major problems being faced because of web scraping?
8 points by nachivpn on Jan 5, 2015 | 6 comments
Disclaimer: I work for an anti-scraping service company. I am not trying to advertise it; I simply want to understand the problems that people are actually facing because of web scraping and how it is affecting them.



I'm sure there are instances where scraping websites causes legitimate issues, but most of the complaining I've seen from website operators was about the perceived theft of their data (even though it was publicly available through the browser), not so much about any bandwidth or performance issue that the scraping causes.

I'm of the opinion that web scraping has an unwarranted bad reputation. As long as I'm respecting your robots.txt and not scraping behind logins, etc., it's no different from how Google operates.
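For anyone wondering what "respecting your robots.txt" looks like in practice, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `example-bot` user agent, and the example.com URLs are made up for illustration; a real scraper would fetch the file from the target site instead of hard-coding it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption for this sketch);
# a real scraper would download it from the site being scraped.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "example-bot" is a hypothetical user agent name.
allowed = parser.can_fetch("example-bot", "https://example.com/products")
blocked = parser.can_fetch("example-bot", "https://example.com/private/report")

# Seconds the site asks crawlers to wait between requests (None if not given).
delay = parser.crawl_delay("example-bot")

print(allowed, blocked, delay)
```

Checking `can_fetch` before every request and sleeping for `crawl_delay` between requests covers the "polite scraper" case described above; it does nothing about pages behind logins, which robots.txt does not govern.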


I think bandwidth costs and the possibility of accidentally DDoSing the site if the scraper gets out of control are probably big issues along with the 'theft of data' mentioned.


Surely you should know the problems if you are working for an anti-scraping company.... Anyway...

Most people who own small websites (sole traders, tiny businesses) don't necessarily know their website is being scraped on a daily basis. If they are paying for AdWords or local advertising through parish or county community websites, they may think they are getting more bang for their buck than they actually are. If they get 10 visitors a day and 8 of those are scrapers, what does this really mean for their advertising revenue? Obviously they should be basing their return on investment on revenue, but a website is still seen as a big thing by most small businesses.


Yes, it is very true that many fail to realize they are being scraped, simply because there aren't many tools that show traffic broken down into humans and bots. This surely is a problem. Thanks for leaving a comment!
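As a rough illustration of the human/bot breakdown mentioned here, the sketch below tallies access-log lines by user agent. It assumes the common "combined" log format, where the user agent is the last quoted field, and the marker list and sample log lines are invented for the example. It is only a heuristic: a scraper that spoofs a browser user agent will be counted as human.

```python
import re
from collections import Counter

# Substrings that suggest an automated client (an illustrative, incomplete list).
BOT_MARKERS = ("bot", "crawler", "spider", "scrapy", "python-requests", "curl", "wget")

# In the combined log format the user agent is the last double-quoted field.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')


def classify(log_line: str) -> str:
    """Label one access-log line as 'bot' or 'human' based on its user agent."""
    match = UA_PATTERN.search(log_line)
    user_agent = match.group(1).lower() if match else ""
    # A missing user agent ("" or "-") is treated as automated traffic.
    if user_agent in ("", "-") or any(m in user_agent for m in BOT_MARKERS):
        return "bot"
    return "human"


def summarize(log_lines) -> Counter:
    """Count how many requests fall into each label."""
    return Counter(classify(line) for line in log_lines)


# Invented sample data, using documentation-reserved IP addresses.
SAMPLE_LOG = [
    '203.0.113.5 - - [05/Jan/2015:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Windows NT 6.1) Firefox/34.0"',
    '203.0.113.9 - - [05/Jan/2015:10:00:01 +0000] "GET /a HTTP/1.1" 200 512 "-" "python-requests/2.5.1"',
    '203.0.113.7 - - [05/Jan/2015:10:00:02 +0000] "GET /b HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.8 - - [05/Jan/2015:10:00:03 +0000] "GET /c HTTP/1.1" 200 512 "-" "-"',
]

counts = summarize(SAMPLE_LOG)
print(counts)
```

Running `summarize` over a day's log gives a first estimate of how many of those "10 visitors a day" are actually automated, which is the kind of visibility small site owners usually lack.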


Google penalising a site for not having original content may be one. Of course, it also uses bandwidth and costs the site you're scraping resources and money for no benefit to them.




