Basically, it lets you cache for longer while still leaving users able to reach your website.
Right now, you can try resolving common hosts, and you will see that they often return several addresses in response to a lookup. What the browser does with those IPs is up to the browser; the standard does not define what to do. What the administrator that sets up that record wants is "send to whichever one of these seems healthy", and some browsers do do that. Other browsers just pick one at random and report failure if that one happens to be down, so your redundancy actually makes the system more likely to break.
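For example, a quick Node script (the hostname below is just a placeholder; try anything you suspect uses round-robin DNS) will show a single lookup coming back with several addresses:

```typescript
// resolve.ts — print every A record, plus its TTL, that one lookup returns.
import { promises as dns } from "dns";

async function main() {
  // "example.com" is a placeholder; substitute any host you want to inspect.
  const records = await dns.resolve4("example.com", { ttl: true });
  for (const { address, ttl } of records) {
    console.log(`${address} (ttl ${ttl}s)`);
  }
}

main().catch(console.error);
```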
What I want is a way to define what to do in this case. Maybe you want to try them all in parallel and pick the first to respond (at the TCP connection level). Maybe you want to try them sequentially. Maybe you want to open a connection to all of them and send 1/n requests to each. Right now, there is no way to know what the service intends, so the browser has to guess. And each one guesses differently.
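To make the first option concrete, here is a rough sketch of "dial them all in parallel and keep whichever TCP connection completes first" — not what any particular browser actually does, just an illustration, assuming you already have the address list:

```typescript
// race-connect.ts — dial every address in parallel, keep the first socket
// that completes the TCP handshake, and close the rest.
import * as net from "net";

function connect(host: string, port: number): Promise<net.Socket> {
  return new Promise((resolve, reject) => {
    const socket = net.connect({ host, port });
    socket.once("connect", () => resolve(socket));
    socket.once("error", reject);
  });
}

async function raceConnect(addresses: string[], port: number): Promise<net.Socket> {
  const attempts = addresses.map((addr) => connect(addr, port));
  // Promise.any resolves with the first successful connection and only
  // rejects if every attempt fails.
  const winner = await Promise.any(attempts);
  // Best effort: tear down the losers once (if) they connect.
  for (const attempt of attempts) {
    attempt.then((s) => { if (s !== winner) s.destroy(); }).catch(() => {});
  }
  return winner;
}
```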
(You will notice that people like Google and Cloudflare deliberately respond with only one record with a 5 minute TTL. That keeps the behavior of the browser well defined, but it also means one bad reply can eat their entire year of 99.999% uptime, since five nines allows only about 5 minutes of downtime per year. Your systems had better be very reliable if DNS issues can eat a year's worth of error budget.)
> (You will notice that people like Google and Cloudflare deliberately respond with only one record with a 5 minute TTL. That keeps the behavior of the browser well defined, but it also means one bad reply can eat their entire year of 99.999% uptime, since five nines allows only about 5 minutes of downtime per year. Your systems had better be very reliable if DNS issues can eat a year's worth of error budget.)
This chapter in the Google SRE book explains how our load balancing DNS works:
I skimmed through it; not a bad idea. Instead of using a reverse proxy, you are basically doing a poor man's multicast by letting many servers answer a request. And instead of rewriting the packets, you encapsulate them, which should be lighter and faster.
It might be a little more resilient than even a very minimal nginx, but more than that, I think it must give you more control over what happens when a packet is not "answered" within some set amount of time: you write off whoever should have been the answerer and resend that same packet to another server. Keep a buffer of packets, scrape them from the buffer when ACK'ed by the answerer, and resend them to another answerer if not ACK'ed after some set amount of time.
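Something like this very rough sketch is what I'm picturing (all names and timeouts made up, and surely not how the real thing works):

```typescript
// retry-buffer.ts — naive stateful retransmit buffer for the design guessed at above.
type Packet = { id: number; payload: Buffer };

class RetryBuffer {
  private pending = new Map<number, { packet: Packet; timer: NodeJS.Timeout }>();

  constructor(
    private send: (packet: Packet, backend: string) => void,
    private pickAnotherBackend: (failed: string) => string,
    private timeoutMs = 200,
  ) {}

  // Forward a packet and remember it until the chosen backend ACKs it.
  forward(packet: Packet, backend: string) {
    this.send(packet, backend);
    const timer = setTimeout(() => {
      // No ACK in time: write off this backend and resend elsewhere.
      this.pending.delete(packet.id);
      this.forward(packet, this.pickAnotherBackend(backend));
    }, this.timeoutMs);
    this.pending.set(packet.id, { packet, timer });
  }

  // Scrape the packet from the buffer once the answerer ACKs it.
  ack(id: number) {
    const entry = this.pending.get(id);
    if (entry) {
      clearTimeout(entry.timer);
      this.pending.delete(id);
    }
  }
}
```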
Am I guessing correctly?
It seems a bit overcomplicated for normal use cases, but adequate for something at Google's scale.
The design you propose is stateful, and if you read the chapter closely, you can see we spend a lot of effort to make things stateless.
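Very roughly, the stateless idea is that the forwarding decision is a pure function of the packet, for example a hash over the connection 5-tuple, so that any balancer instance picks the same backend without sharing connection tables. A toy sketch of that idea (illustrative only, not the actual Maglev hashing):

```typescript
// stateless-pick.ts — pick a backend as a pure function of the 5-tuple.
// Illustrative only; real consistent hashing is much more careful about
// how traffic redistributes when the backend list changes.
import { createHash } from "crypto";

type FiveTuple = {
  srcIp: string;
  srcPort: number;
  dstIp: string;
  dstPort: number;
  protocol: "tcp" | "udp";
};

function pickBackend(t: FiveTuple, backends: string[]): string {
  const key = `${t.protocol}:${t.srcIp}:${t.srcPort}:${t.dstIp}:${t.dstPort}`;
  const digest = createHash("sha256").update(key).digest();
  // First 4 bytes of the digest as an unsigned int, modulo the pool size.
  return backends[digest.readUInt32BE(0) % backends.length];
}
```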
The main thing I wanted to respond to in this thread, a single bad server destroying your yearly SLO, is addressed in the first paragraph of the section on load balancing at the virtual IP address.
Sorry I couldn't find a clear rationale in the link. Why does Google prefer a stateless load balancer? Is it infeasible to maintain state at that scale?
> What the administrator that sets up that record wants is "send to whichever one of these seems healthy"
In the rrDNS, remove the A records of hosts that fail health checks or whose load is too high.
> Maybe you want to try them all in parallel and pick the first to respond (at the TCP connection level).
That's something GeoIP at your DNS can do; certainly not as good as doing it in the client, but it should be decent enough.
> Your systems had better be very reliable if DNS issues can eat a year's worth of error budget
Or, if you aren't Google or Cloudflare, use a 30 to 60 second TTL in rrDNS, with health checks that selectively remove IPs that fail, on pools splitting your servers by region with GeoIP. That way, if 1/10 of your east coast servers fail, nobody from APAC is impacted, only 1/10th of your US east users are, and only for the duration of the TTL. (I'm glossing over ISPs that cache for too long, but you already mitigate a lot of the problem there.)
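A minimal sketch of that reconcile loop — the updateRecords call and /healthz path stand in for whatever your DNS provider and servers actually expose, and the addresses are placeholders:

```typescript
// healthcheck.ts — keep one regional rrDNS record limited to servers that
// answer a health endpoint quickly. Requires Node 18+ for global fetch.
const serverIps = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]; // placeholder pool

async function isHealthy(ip: string): Promise<boolean> {
  try {
    const res = await fetch(`http://${ip}/healthz`, { signal: AbortSignal.timeout(2000) });
    return res.ok;
  } catch {
    return false;
  }
}

async function updateRecords(name: string, ips: string[]): Promise<void> {
  // Hypothetical: replace the A records for `name` with `ips` via your DNS
  // provider's API, keeping the 30-60s TTL discussed above.
  console.log(`would set ${name} -> ${ips.join(", ")}`);
}

async function reconcile() {
  const checks = await Promise.all(serverIps.map(isHealthy));
  const healthy = serverIps.filter((_, i) => checks[i]);
  // Never publish an empty record set; a degraded pool beats an empty one.
  await updateRecords("www.eastcoast.yoursite.com", healthy.length ? healthy : serverIps);
}

setInterval(reconcile, 15_000);
```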
I can see how it would be easier to handle that in the browser, but you may already be able to do it today with some JS: estimate the latency, store the result in a cookie, and have that cookie cause a reload to www.eastcoast.yoursite.com whenever the user lands on www.yoursite.com; if the user later travels home and new measurements say the stored choice is not optimal, update the cookie and send them to www.apac.yoursite.com instead.
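Roughly like this, as a browser-side sketch (the hostnames, the /ping path, and the cookie name are all made up):

```typescript
// region-picker.ts — runs in the browser on www.yoursite.com. Measures latency
// to each regional endpoint, remembers the winner in a cookie, and moves the
// user to that region's hostname when the stored choice looks wrong.
const regions = ["www.eastcoast.yoursite.com", "www.apac.yoursite.com"];

async function measure(host: string): Promise<number> {
  const start = performance.now();
  try {
    // no-cors keeps the response opaque, but the round-trip time still counts.
    await fetch(`https://${host}/ping`, { cache: "no-store", mode: "no-cors" });
    return performance.now() - start;
  } catch {
    return Infinity;
  }
}

async function pickRegion() {
  const latencies = await Promise.all(regions.map(measure));
  const best = regions[latencies.indexOf(Math.min(...latencies))];
  const current = document.cookie.match(/preferredRegion=([^;]+)/)?.[1];
  if (best !== current) {
    // Remember the new measurement and move the user to the closer region.
    document.cookie = `preferredRegion=${best}; max-age=86400; domain=.yoursite.com; path=/`;
    if (location.hostname !== best) location.hostname = best;
  }
}

pickRegion();
```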
I am kind of OK with this solution, and it is in fact my plan for rolling out HTTP/3 on my personal sites. I wrote https://github.com/jrockway/nodedns to update a DNS record to contain the IP addresses of all schedulable nodes in my cluster. I can then serve HTTP/3 on a well-known port, and it is probable that many requests will reach me successfully. (I had to do this because my cloud provider's load balancer doesn't support UDP, and I don't have access to "floating IPs"; basically my node IPs change whenever the cluster topology needs to change.)
I don't really like it because it still means a minute of downtime when the topology does change. I would prefer telling the browser what strategy to use to try a new node, rather than relying on heuristics and defaults.