Good article! I've been scraping for the last 10 years and I've seen a lot of different things tried to keep us out.
I'm also on the other side, protecting websites by banning scrapers, which is funny!
I'm in the same position for the first time (protecting against scraping) and honestly I'm kind of blind right now. Which is weird because of how much scraping I've done (okay not that much). Any tips or tricks or blogs you know of off the top of your head for protecting your site?
Virtually everything can be easily defeated. The only outfit I've consistently seen put up a good fight is Distil. They do it by acting a little like Cloudflare: they put their servers in front of your web-facing endpoints and use ML to mine their global client traffic for bot signals (aided by some aggressive in-browser JavaScript fingerprinting).
Yeah, Distil is the first outfit I've encountered with a model that makes it really hard to bypass reliably. It comes down to "I can spend a significant amount of time trying to bypass this, and I would, but they would likely identify and block me again within a few weeks at most." That's not worth it when bypassing them is only part of what I need to do to scrape some data, while it's their entire job and they can afford to hire multiple people.
The economics are in their favor, and I make it a point not to fight economics when I recognize them. It's rarely sustainable.
Over the years, I've come to the conclusion that everything can be scraped. All you can do is put up as many walls as you can. But if someone really wants to crawl your site and has the right knowledge, they will be able to do it despite all your walls.
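To make "walls" a bit more concrete, here is a minimal sketch of one of the simplest ones: a per-client sliding-window rate limit. The class name, the parameters, and the idea of keying on a client ID such as an IP address are assumptions for illustration, not something anyone in this thread described. A determined scraper will get around it by rotating IPs, which is exactly the point made above.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Hypothetical helper: allow at most max_requests per client per window."""

    def __init__(self, max_requests=60, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._hits = defaultdict(deque)  # client_id -> timestamps of recent requests

    def allow(self, client_id):
        now = self.clock()
        hits = self._hits[client_id]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the limit: block, slow down, or show a challenge
        hits.append(now)
        return True
```

You would call `allow(client_ip)` at the top of each request handler and return an error or a challenge when it comes back `False`. On its own this only stops the laziest bots, so treat it as one wall among many.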
Now that most tools are cloud-based, what are the advantages of having a WP app installed on your laptop instead of using the native back office?
Thanks!
One of the best options is DigitalOcean. You can set up a Droplet (server) with WP without any Linux or sysadmin skills. Their cheapest plan is €5/month. Their customer service is very good at every price point, and so is their documentation.
I really don't recommend GoDaddy. Their hosting is very slow.
There are several possible causes:
- Server specs too low to handle your daily traffic
- Custom WP development with slow queries
- No caching or image optimization. Are you using either?
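On the caching point, the idea is just to avoid repeating a slow query on every page load. Below is a rough sketch of that idea in Python, only to show the principle. The `TTLCache` class and its parameters are made up for illustration; on a real WP site you would normally rely on a caching plugin rather than write this yourself.

```python
import time


class TTLCache:
    """Hypothetical helper: remember a computed value for ttl_seconds."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store = {}  # key -> (time stored, value)

    def get_or_compute(self, key, compute):
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]  # still fresh: skip the slow work
        value = compute()  # e.g. the slow database query
        self._store[key] = (now, value)
        return value
```

Something like `cache.get_or_compute("popular_posts", run_slow_query)` would then hit the database at most once every five minutes, where `run_slow_query` stands in for whatever function is slow.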
This is a great topic.
We can consider this solution or the Elon Musk solution, in which every person receives a salary whether they work or not.
I don't know what the solution should be, but this is one of the main topics for the next decade.
I usually go to hackathons for a few reasons:
1. To have fun developing a new product and coding
2. To meet new and interesting people
3. To look at people's skills for when I'm hiring for my company
Every messenger app (FB Messenger, Slack, Telegram, Skype) has its own chatbot directory. Also, in FB Messenger for example, you can run FB ad campaigns to promote your bots, but the conversion rates are not very good.
In my case, I use bots that help me in two ways:
- News. I use one as a service to receive the latest news once per day.
- Productivity. I use some Slack bots that help with my productivity and my team's.
As with apps, if a bot is useful to users, they will use it.
I tried Hipmunk's bot last week and I think it helps a lot if you are looking for flights or hotels, but it needs a lot of improvement in understanding the user. For example, the chatbot often gets stuck in a loop :)
Anyway, I think this bot is a very good preview of what the future of booking flights and hotels will be.
I really recommend using a hosted ecommerce platform to test the market for new sites and build a good MVP fast. Once the ecommerce administrator has some experience, they will know what they need to manage an ecommerce site. That is the moment to switch to a self-hosted platform. If you are at this step, it means the MVP works well and you are investing in a website that seems to fit the market.
WordPress is used by more than 25% of internet sites. That also means it is the most hacked CMS :)
Fortunately, WP and its community work hard to fix problems ASAP and ship new releases.
The major problem is that people don't update the CMS. I really recommend enabling auto-updates and keeping good track of all plugin versions. If you are a developer taking care of several WP sites, there are many plugins that can help you manage the WP and plugin versions across a large number of sites.