To block something like this you need to determine what is botnet traffic vs legit traffic. It's hard.
Source IP doesn't work since it is random and changes. You need to look at things such as HTTP headers, TCP window size and any odd flags that might be set. If you're lucky the botnet isn't capable of running a copy of Chrome or Safari or using a random sample template from legit traffic. Lots of botnets are made up of low-power IoT devices, so once these devices are capable of running a full headless Chrome it will get harder.
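To make the header idea concrete, here is a rough sketch of the simplest possible check: count how many browser-typical headers a request is missing. The header list, the threshold and the function names are all assumptions for illustration, and a botnet replaying a template from legit traffic would pass this untouched.

```python
from typing import Mapping

# Assumption: headers a mainstream browser normally sends on a page load.
EXPECTED_HEADERS = ("user-agent", "accept", "accept-language", "accept-encoding")


def suspicion_score(headers: Mapping[str, str]) -> int:
    """Return how many browser-typical headers are missing or blank."""
    present = {name.lower() for name, value in headers.items() if value.strip()}
    return sum(1 for name in EXPECTED_HEADERS if name not in present)


def looks_like_bot(headers: Mapping[str, str], threshold: int = 2) -> bool:
    """Flag a request once it misses `threshold` or more expected headers.

    The threshold of 2 is an arbitrary illustrative value.
    """
    return suspicion_score(headers) >= threshold
```

A real deployment would combine a signal like this with TCP-level features rather than rely on it alone.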
Not to mention that once you do figure out how to discriminate traffic, you have to code it. And the code that separates valid traffic from invalid had better run fast, because you are getting hit with 100k requests per second. Oh, did I mention the attacker can change their algorithm whenever they want? Hope you have a full TensorFlow ML/AI pipeline that configures your hardware-based ingress of choice just in time. All this while making sure your current production traffic is served at a speedy pace and legit customers aren't blocked.
These are some of the issues Cloudflare and companies like them have to deal with.
In cases like this it's actually not that difficult, as they're using devices that can be fingerprinted from the Internet. We at Shodan provide a local, embedded database (SQLite or RocksDB) so you can see which open ports connecting IPs have. If an IP is connecting from a device that's running weird ports, is compromised or has other unusual characteristics, then you can either flag the connection as high risk or outright drop it if you're under attack. It's mostly used by banks etc. for fraud prevention, but we have a few customers that use it for blocking traffic based on IP risk.
How does fingerprinting them help? You can fingerprint them but they are just desktops/mobile phones/laptops that have been compromised to be part of the botnet.
The compromised hosts that are part of the botnet look exactly like normal traffic.
If you have a database of known-compromised hosts (because a fingerprint scan of them shows something clearly identifiable as part of a botnet, which I think is usually rare [but possibly not for Mēris]), it can mitigate an attack if you've already blocked them.
But the problem that still exists is the initialization traffic -- there are still up to 200k hosts that may hit your site (essentially, a SYN flood). Depending on your infrastructure, that can still hurt your firewall or single server. But it is unlikely to hurt as much as having to actually respond (through a request stack) to those requests.
That's not what the article said though. They say that the compromised devices had these characteristics among others:
* Port 2000 open
* Port 5678 open
* SOCKS proxy on port 80 (maybe)
Most likely, most of the visitors to your website won't have those ports open and exposed to the Internet. That makes the network fingerprint a really easy way to filter traffic. Especially when you're under attack, it's a great way to cut most of the impact without requiring any AI/ML: just filter traffic from IPs that have TCP port 5678 open. The same technique was also used to identify Mirai bots and it worked well.
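A minimal sketch of that filter, assuming you already have a local table of open ports per IP. The `open_ports` table layout, the function names and the sample rows are made up for illustration and are not Shodan's actual schema; only the two port numbers come from the list above.

```python
import sqlite3
from typing import Set

# Ports the article associated with the compromised devices.
SUSPICIOUS_PORTS = {2000, 5678}


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for a local scan database.

    Hypothetical schema: one row per (ip, open port).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE open_ports (ip TEXT NOT NULL, port INTEGER NOT NULL)")
    conn.executemany(
        "INSERT INTO open_ports (ip, port) VALUES (?, ?)",
        [
            ("198.51.100.7", 2000),  # documentation-range addresses, not real hosts
            ("198.51.100.7", 5678),
            ("203.0.113.20", 443),
        ],
    )
    conn.execute("CREATE INDEX idx_open_ports_ip ON open_ports (ip)")
    return conn


def open_ports_for(conn: sqlite3.Connection, ip: str) -> Set[int]:
    """Return the set of ports recorded as open for this IP."""
    rows = conn.execute("SELECT port FROM open_ports WHERE ip = ?", (ip,))
    return {port for (port,) in rows}


def should_drop(conn: sqlite3.Connection, ip: str, under_attack: bool) -> bool:
    """Drop only while under attack, and only if a suspicious port is exposed."""
    return under_attack and bool(open_ports_for(conn, ip) & SUSPICIOUS_PORTS)
```

Gating on `under_attack` keeps the false-positive cost at zero in normal operation, where the same lookup could just raise a risk flag instead of dropping the connection.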
I think in the future servers will ask clients to solve a small computation. In theory it could be incorporated into the handshake, and if it takes something like 100ms, human users would not notice but bot farms will feel the pinch. An additional benefit is that servers can monetise the computation, offsetting some of their costs.
Wouldn't bot farms just incorporate that as a "cost of doing business" and expand to absorb the computational load? After all, it's not like the bot farmers are paying to add more hardware.
Bot farms exist because they are cheap. The scheme doesn't need to be perfect; ultimately you need to adjust the cost of the handshake so that it's higher than the farmer's average earnings.
E.g.: the handshake can be made more expensive by choosing a "harder" function, while clients that behave well get to reuse their connections. Bots are penalised because they constantly have to make new handshakes.
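For anyone who wants to see what such a handshake puzzle could look like, here is a hashcash-style sketch. Nobody in this thread specified a scheme, so SHA-256, the 8-byte nonce and the `difficulty_bits` knob are all assumptions. The "harder function" becomes a higher `difficulty_bits`: each extra bit doubles the expected number of hashes the client has to try, while the server still verifies with a single hash.

```python
import hashlib
import os


def make_challenge() -> bytes:
    """Server side: a fresh random challenge per handshake."""
    return os.urandom(16)


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def verify(challenge: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Server side: one hash, cheap regardless of difficulty."""
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return leading_zero_bits(digest) >= difficulty_bits


def solve(challenge: bytes, difficulty_bits: int) -> int:
    """Client side: brute-force a nonce; expected work is 2**difficulty_bits hashes."""
    nonce = 0
    while not verify(challenge, nonce, difficulty_bits):
        nonce += 1
    return nonce
```

A server could hand well-behaved clients a low `difficulty_bits` and let them reuse the connection, and raise it for clients that keep opening new ones.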
But the economic incentives of a botnet are very different from those of a bot farm.
How many server resources are used before asking the client to do work? If they've got 100k clients, and each opens 100 TCP connections to your server, is your TCP stack or your load balancer going to fall over before you even start to do a TLS handshake?
Can you manage as many TLS handshakes as they can throw at you?
This does not help one bit with botnets. The problem of defending against botnets is not blocking many requests coming from each IP address, it's blocking requests coming from all those compromised devices. Those devices are perfectly capable of doing that computation.
Introducing JavaScript into the mix will not make the botnet more difficult to detect. Headless browsers have their own fingerprints, which allow defenders to distinguish them from legitimate traffic. You can spoof the features that headless browsers don't have, but that will always be a cat-and-mouse game.