
We continuously scrape a sizable number of ecommerce sites and have had no trouble whatsoever bypassing CloudFlare's antibot technologies.

Cloudflare representatives often defend user-hostile behaviour with the justification that it is necessary to stop bad actors, but considering how ineffective Cloudflare is at that goal in practice, it seems like security theatre.




I disagree.

We’ve worked across a number of equivalent anti-bot technologies, and in this space Cloudflare _is_ what AWS was in 2016. Kasada and Akamai are great alternatives and are certainly more suitable for some organisations and industries - but by and large, Cloudflare is the most effective option for the majority of organisations.

That being said, this is a rapidly changing field. In my opinion, regardless of where you stand as a business, keep an abstraction layer over whichever of these providers you use where possible - being able to onboard and migrate away should be table stakes for any project or business adopting them.

As we’ve seen over the last 3 years, platform providers are turning the revenue dial up on their existing clientele.


Its success as a business aside, at a technical level neither Cloudflare nor its competitors provides any real protection against large-scale scraping.

Bypassing it is quite straightforward for a software engineer of average competency.

I'm not saying that Cloudflare is any better or worse at this than Akamai, Imperva, etc. I'm saying that in practice none of these companies provides an effective anti-bot tool, and as far as I can tell, as someone who does a lot of scraping, the entire anti-bot industry is selling a product that simply doesn't work.


In practice they only lock out "good" bots. "Bad" bots have their residential proxy botnets and run real browsers in virtual machines, so there's not much of a signature.

This often suits businesses just fine, since "good" bots are often the ones they want to block. A bot that would transcribe comments from your website to RSS, for example, reduces the ad revenue on your website, so it's bad. But the spammer is posting more comments and they look like legit page views, so you get more ad revenue.


I don't believe that distinction really exists anymore.

These days everyone is using real browsers and residential / mobile proxies, regardless of whether they are a spammer, a Fortune 500 company, a retailer doing price comparison, or an AI company looking for training data.
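
For illustration, a minimal Playwright sketch of that pattern: launch a real Chromium instance routed through a residential proxy. The proxy gateway, credentials, and target URL below are placeholders, not a specific provider, and this is only the skeleton of what the parent describes, not a complete bypass.

    # assumes `pip install playwright` and `playwright install chromium`
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(
            headless=False,  # headed, as the parent describes; headless mode is easier to fingerprint
            proxy={
                "server": "http://proxy.example.com:8000",  # hypothetical residential proxy gateway
                "username": "user",
                "password": "pass",
            },
        )
        page = browser.new_page()
        page.goto("https://shop.example.com/products")  # hypothetical target site
        html = page.content()                           # fully rendered HTML after any JS challenge
        browser.close()

Because the requests come from a real browser on a residential IP, there is very little signature left for the anti-bot layer to key on.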


Random hackers making a website-to-RSS bridge aren't using residential / mobile proxies and real browsers in virtual machines. They're doing the simplest thing that works, which is curl, then getting frustrated and quitting.
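
For contrast, the "simplest thing that works" is roughly this (Python's requests standing in for curl; the URL is a placeholder):

    import requests

    resp = requests.get(
        "https://shop.example.com/products",        # hypothetical Cloudflare-fronted site
        headers={"User-Agent": "Mozilla/5.0"},
        timeout=10,
    )
    print(resp.status_code)   # commonly 403 when the challenge kicks in
    print(resp.text[:200])    # usually the challenge page, not the content you wanted

which is exactly the point where the hobbyist gives up.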

Spammers are doing those things because they get paid to make the spam work.



