We (Baqend) use an approach that is somewhat different from what has been proposed here so far:
- Every one of our servers rate limits critical resources, i.e. the ones that cannot be cached. The servers autoscale when necessary.
- As rate limiting is expensive (you have to remember every IP/resource pair across all servers), we keep that state in a locally approximated representation using a ring buffer of Bloom filters.
- Every cacheable resource is cached in our CDN (Fastly) with TTLs estimated via an exponential decay model over past reads and writes.
- When a user exceeds their rate limit, the IP is temporarily banned at the CDN level. This is achieved through custom Varnish VCL deployed in Fastly. Essentially, the logic relies on the backend returning a 429 Too Many Requests for a particular URL, which is then cached using the requester's ID as a hash key. Using the restart mechanism of Varnish's state machine, this can be done without any performance penalty for normal requests. The duration of the ban is simply the TTL.
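Roughly, the ring buffer of Bloom filters works like this (a simplified, hypothetical Python sketch, not our production code; the class names and the trick of inserting `key#0`, `key#1`, ... to get approximate counts are illustrative assumptions): each time slice gets its own Bloom filter, the oldest filter is dropped when its slice expires, and the per-(IP, resource) request count is the sum of approximate counts across the live filters.

```python
import hashlib
import time


class BloomFilter:
    """Plain Bloom filter: a bit array probed at k hash positions."""

    def __init__(self, size_bits=1 << 16, num_hashes=4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))


class ApproxRateLimiter:
    """Sliding-window limiter over a ring of Bloom filters.

    Each filter covers one time slice; when a slice expires, the oldest
    filter is discarded and replaced, which forgets old state without any
    per-key bookkeeping. Counts per (ip, resource) pair are stored
    approximately by inserting "key#0", "key#1", ... into the current
    slice's filter. False positives can only *undercount* the remaining
    allowance (they never let an abuser through), and memory stays
    constant no matter how many clients are seen.
    """

    def __init__(self, limit=100, slices=6, slice_seconds=10, clock=time.time):
        self.limit = limit
        self.slices = slices
        self.slice_seconds = slice_seconds
        self.clock = clock
        self.ring = [BloomFilter() for _ in range(slices)]
        self.current_slot = self._slot()

    def _slot(self):
        return int(self.clock() // self.slice_seconds)

    def _rotate(self):
        # Replace one filter per elapsed slice (bounded by the ring size).
        slot = self._slot()
        for _ in range(min(slot - self.current_slot, self.slices)):
            self.ring.pop(0)
            self.ring.append(BloomFilter())
        self.current_slot = slot

    def _count(self, key, filt):
        # Probe key#0, key#1, ... until a counter entry is missing.
        n = 0
        while n < self.limit and f"{key}#{n}" in filt:
            n += 1
        return n

    def allow(self, ip, resource):
        self._rotate()
        key = f"{ip}/{resource}"
        total = sum(self._count(key, f) for f in self.ring)
        if total >= self.limit:
            return False  # over the limit: the server would answer 429
        self.ring[-1].add(f"{key}#{self._count(key, self.ring[-1])}")
        return True
```

The `clock` parameter is only there to make the sliding window testable; in production you would just use wall-clock time.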
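And the TTL estimation over past reads and writes could look something like this (again a hypothetical sketch; the class, its parameters, and the Poisson staleness bound are my simplifications, not our published model): writes feed an exponentially decayed counter, from which we derive a write rate, and the TTL is chosen so that the probability of a write landing inside it stays below a threshold.

```python
import math
import time


class TTLEstimator:
    """Per-resource TTL from an exponentially decayed write rate.

    Each write bumps a counter that decays with half-life `half_life`,
    so recent writes dominate the estimate. For a Poisson write process
    with rate lambda, the decayed count converges to lambda / decay, so
    rate ~= decay * count. The TTL is then the largest t with
        P(write within t) = 1 - exp(-rate * t) <= stale_prob
        =>  t = -ln(1 - stale_prob) / rate
    clamped to [min_ttl, max_ttl].
    """

    def __init__(self, half_life=3600.0, stale_prob=0.1,
                 min_ttl=1.0, max_ttl=86400.0, clock=time.time):
        self.decay = math.log(2) / half_life  # decay constant per second
        self.stale_prob = stale_prob
        self.min_ttl = min_ttl
        self.max_ttl = max_ttl
        self.clock = clock
        self.count = 0.0        # exponentially decayed write count
        self.last = clock()

    def _decayed_count(self):
        now = self.clock()
        return self.count * math.exp(-self.decay * (now - self.last)), now

    def record_write(self):
        self.count, self.last = self._decayed_count()
        self.count += 1.0

    def ttl(self):
        count, _ = self._decayed_count()
        rate = self.decay * count          # estimated writes per second
        if rate <= 0.0:
            return self.max_ttl            # never written: cache maximally
        t = -math.log(1.0 - self.stale_prob) / rate
        return min(max(t, self.min_ttl), self.max_ttl)
```

A resource that is written often gets a short TTL, and as writes stop, the decayed rate falls and the TTL grows back toward the maximum on its own.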
TL;DR: Every abusive request is detected at the backend servers using approximations via Bloom filters and then a temporary ban is cached in the CDN for that IP.
We have not seen any serious DDoS attacks, apart from the ones we created through our own heavy load testing. UDP attacks are not really a problem, since the servers are only accessible by Fastly over TCP using a combination of SSL authentication and IP whitelisting.
I'm sorry, I know it's irrelevant, off-topic, and I'm a horrible person, but... "Baqend"? Who came up with this name? Was there some brainstorming involved and that was the best candidate? What does the branding say about your business? How is it pronounced?
And since we are in the Backend-as-a-Service market, the name is not all that unfitting. Although it cannot be denied that from time to time some people think we are French and spell it "Baquend".