
My impression, from reading the docs around Google's "premium-tier network routing" — and just from the "feeling" of deploying GCLB updates — is that when you're configuring "a" Google Cloud Load Balancer, you're actually configuring "the" Google Cloud Load Balancer. I.e., your per-tenant virtual LB config resources get baked down, along with every other tenant's virtual LB config resources, into a single real config file spanning all of GCP (maybe all of Google?), which then gets deployed not only to all of Google's real border network switches across all their data centers, but also to all their edge network switches in every backbone transit hub they have a POP in.

(Why not just the switches for the DC(s) your VPC is in? Because GCLB IP addresses are anycast addresses, with BGP peers routing them to their nearest Google POP, at which point Google's own backhaul — that's the "premium-tier networking" — takes over delivering your packets to the correct DC. Doing this requires all of Google's POP edge switches to know that a given GCLB-netblock IP address is currently claimed by "a project in DC X", in order to forward the anycast packets there.)
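
To make that concrete, here's a minimal sketch of what the "baked down" artifact might conceptually look like. This is my speculation, not Google's actual data model, and every project name, VIP, and region in it is invented; the point is just that every POP edge ends up needing the same flat map from GCLB VIP to whichever DC currently claims it, merged from every tenant's virtual forwarding rules:

    # Speculative sketch: per-tenant virtual LB resources "baking down" into one
    # global VIP -> home-datacenter table that every POP edge would need.
    # All project names, VIPs, and region names below are invented.

    def bake_global_table(tenant_forwarding_rules):
        """Merge every tenant's forwarding rules into one flat table.

        tenant_forwarding_rules: dict of project_id -> list of
        (anycast_vip, home_datacenter) pairs.
        """
        global_table = {}
        for project_id, rules in tenant_forwarding_rules.items():
            for vip, home_dc in rules:
                # A VIP belongs to one project/DC at a time, so a plain merge is
                # enough; the hard part is shipping the result everywhere consistently.
                global_table[vip] = {"project": project_id, "home_dc": home_dc}
        return global_table

    # Conceptually, the same artifact goes to every POP edge, so an anycast packet
    # landing anywhere can be backhauled to the right DC:
    table = bake_global_table({
        "my-project":   [("203.0.113.10", "us-central1")],
        "other-tenant": [("203.0.113.11", "europe-west4")],
    })
    print(table["203.0.113.10"]["home_dc"])  # -> us-central1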

To ensure consistency between deployed GCLB config versions across this huge distributed system — and to avoid their switches being constantly interrupted by config changes — it would seem to me that at least one — but as many as four — of the following mechanisms then take place (a rough latency sketch follows the list):

1. Some distributed system — probably something ZooKeeper-esque — keeps global GCLB state, receiving virtual GCLB resource updates at each node and consensus-ing with the nodes in other regions to arrive at a new consistent GCLB state. Reaching this new consensus state across a globally-distributed system takes time, and so introduces latency. (But probably very little, because the resources being referenced are all sharded to their own DCs, so the "consensus algorithm" can be one that never has to resolve conflicts, and instead just needs to ensure all nodes have heard all updates from all other nodes.)

2. Even after a consistent global GCLB state is reached, not every one of those new consistent global states gets converted into a network-switch config file and pushed to all the POPs. Instead, every X minutes, some system takes a snapshot of the latest consistent state of the global-GCLB-config-state system, and creates and publishes a network-switch config file for that snapshot state. This introduces variable latency. (A famous speedrunning analogy: you can do everything else to remediate your app problems as fast as you like, but your LB config update arrives at a bus stop, and must wait for the next "config snapshot" bus to come. If it just missed the previous bus, it will have to wait around longer for the next one.)

3. Even after the new network-switch config file is published, the switches might receive it, but only "tick over" into a new config-file state on some schedule, potentially skipping some config-file states if they arrive at a bad time. Or, alternatively, the switches might coordinate among themselves so that none of them "ticks over" into a new config until all of them have it available.

4. Finally, there is probably a "distributed latch" to ensure that all POPs have been updated with the config file that contains your updates, before the Google Cloud control plane will tell you that your update has been applied.
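
To put rough numbers on where the constant and variable components could come from, here's the toy latency sketch promised above, modeling just #2 and #4. The snapshot interval, per-POP rollout times, and POP count are pure guesses on my part; the point is only the shape: a uniform "wait for the next bus" term stacked on top of a "wait for the slowest POP" term.

    import random

    # Toy model, pure speculation: mechanism #2 is a periodic "config snapshot bus",
    # mechanism #4 is a latch that waits for every POP to confirm the new config.
    # All three constants below are invented for illustration.
    SNAPSHOT_INTERVAL = 5 * 60   # seconds between config snapshots (#2)
    POP_ROLLOUT = (60, 240)      # min/max seconds for one POP to confirm (#4)
    NUM_POPS = 100

    def time_until_acked(submit_time):
        # #2: your change waits at the bus stop for the next snapshot.
        wait_for_bus = SNAPSHOT_INTERVAL - (submit_time % SNAPSHOT_INTERVAL)
        # #4: the control plane only acks once every POP reports the new version,
        # so the rollout is gated on the slowest POP.
        slowest_pop = max(random.uniform(*POP_ROLLOUT) for _ in range(NUM_POPS))
        return wait_for_bus + slowest_pop

    samples = [time_until_acked(random.uniform(0, 3600)) for _ in range(1000)]
    print(f"min {min(samples) / 60:.1f} min, max {max(samples) / 60:.1f} min")

Even with these small made-up numbers you get a multi-minute floor plus a several-minute spread; fatten the interval or the per-POP tail and you land in the range described below.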

No matter which of these factors are at fault, the result is a painfully long wait. I've never seen a GKE GCLB Ingress resource take less than 7 minutes to acquire an IP address; sometimes, it takes as long as 17 minutes!

And while there's definitely some constant component to the time that this config rollout takes, there's also a huge variable component to it. At least one of #2, #3, or #4 must be happening; possibly multiple of them.

---

You might ask why load-balancer changes in AWS don't suffer from this same problem. AWS doesn't have nearly as complex a problem to solve, since AFAIK their ALBs don't give out anycast IPs, just regular unicast IPs that require the packets to be delivered to the AWS DC over the public Internet. (Though, on the other hand, AWS CDN changes do take minutes to roll out — CloudFront at least does distributed version latching for rollouts, and might be doing some of the other steps above as well.)

You might ask why routing changes in Cloudflare don't suffer from this same problem. I don't know! But I know that they don't give their tenants individual anycast IP addresses, instead assigning tenants to 2-to-3 of N anycast "hub" addresses they statically maintain; and then, rather than routing packets arriving at those addresses based purely on the IP, they have to do L4 (TLS SNI) or L7 (HTTP Host header) routing. Presumably, doing that demands "smart" switches, which can then be arbitrarily programmed to do dynamic stuff — like keeping routing rules in an in-memory read-through cache with TTLs, rather than depending on an external system to push new routing tables to them.
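
Here's a minimal sketch of what I mean by that last bit; this is guesswork about the general technique, not Cloudflare's actual design. The edge keys routing off the SNI/Host hostname, consults the central routing store only on a cache miss, and lets stale entries age out via TTL, so no globally coordinated config push is needed:

    import time

    # Speculative sketch of an edge proxy's read-through routing cache, keyed by
    # the hostname from TLS SNI or the HTTP Host header. Not Cloudflare's actual
    # design; it just shows why no global config push is needed.

    ROUTE_TTL = 30  # seconds; invented value

    class RouteCache:
        def __init__(self, lookup_origin):
            self._lookup_origin = lookup_origin  # callback into the central routing store
            self._cache = {}                     # hostname -> (origin, expires_at)

        def route(self, hostname):
            entry = self._cache.get(hostname)
            if entry and entry[1] > time.time():
                return entry[0]                      # hit: no central lookup
            origin = self._lookup_origin(hostname)   # miss: ask the central store
            self._cache[hostname] = (origin, time.time() + ROUTE_TTL)
            return origin

    # A tenant's routing change becomes visible at each edge at most ROUTE_TTL
    # seconds after the central store changes, with no push to every POP.
    cache = RouteCache(lambda host: {"example.com": "203.0.113.7"}.get(host))
    print(cache.route("example.com"))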



AWS breaks the anycast LB functionality out into a separate service called AWS Global Accelerator. You do get individual anycast IP addresses with that service.


Ah, interesting; it's been a while since I played with AWS, and that service wasn't there back then. I'm guessing that allocating a new AWS Global Accelerator address takes a while?


I've only done it once (the way they have it architected, it's a "set and forget" sort of thing; your LB changes don't touch the Global Accelerator), but I do seem to recall that it took a while to create the resource. Maybe 5-10 minutes?


5-10 minutes is accurate for creating and rolling out changes to a Global Accelerator.
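
For what it's worth, that wait is easy to observe if you script the creation with boto3; a rough sketch below (the accelerator name is a placeholder, and note the Global Accelerator control-plane API is only served out of us-west-2 even though the accelerator itself is global):

    import time
    import boto3

    # Rough sketch of observing the rollout delay described above.
    # "demo-accelerator" is a placeholder name.
    ga = boto3.client("globalaccelerator", region_name="us-west-2")

    resp = ga.create_accelerator(Name="demo-accelerator", IpAddressType="IPV4", Enabled=True)
    arn = resp["Accelerator"]["AcceleratorArn"]
    # The static anycast IPs show up in IpSets right away...
    print("static anycast IPs:", resp["Accelerator"]["IpSets"])

    # ...but the accelerator only reaches DEPLOYED after the global rollout,
    # which is the 5-10 minute wait being described.
    start = time.time()
    while ga.describe_accelerator(AcceleratorArn=arn)["Accelerator"]["Status"] != "DEPLOYED":
        time.sleep(30)
    print(f"deployed after {(time.time() - start) / 60:.1f} minutes")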


> It's intriguing to me that AFAIK load-balancer changes in AWS don't suffer from this problem. (Though, on the other hand, CDN changes do.)

The architecture is a lot different.

Using Google means working with the load balancer in some form. It's all interconnected.

AWS is all separate parts that are stitched together thinly.

E.g. you can have a single global load balancer in Google that handles your whole infrastructure (CDN and WAF are part of the LB too). There isn't an AWS equivalent: you would need a Global Accelerator + ALBs per region and more, and WAF is tied to each ALB, etc.


> AWS is all separate parts that are stitched together thinly.

Yeah I always hate this when I have to work with AWS. All their services feel like they were designed by completely different companies. Every management interface looks and feels different, and there are tons of services that do almost the same thing so it's not clear which would be best to use. It's a maze to me.

Luckily I don't have to work with cloud a lot but I really prefer Azure where everything is in the same console and there isn't a lot of overlap. But cloud guys seem to hate it, not sure why.


> I really prefer Azure where everything is in the same console and there isn't a lot of overlap. But cloud guys seem to hate it, not sure why.
Because Azure APIs are always changing, and their SDK support for anything non-C# is the wild west.

Also, everything is a Wizard because MS doesn't want to expose the sausage factory.


> CloudFront is anycast-routed

This is false; CloudFront uses DNS-based (geo & latency) load balancing.
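
This is easy to check yourself: resolve the same distribution hostname through resolvers in different networks and you'll generally get different edge IPs back, which is the signature of DNS-based steering rather than a single anycast VIP. A quick sketch with dnspython (the hostname below is a placeholder; substitute any real *.cloudfront.net name):

    import dns.resolver  # pip install dnspython

    # Placeholder distribution hostname; substitute a real *.cloudfront.net name.
    HOSTNAME = "dxxxxxxxxxxxx.cloudfront.net"

    # Ask two different public resolvers. With DNS-based (geo/latency) steering you
    # typically get different edge IPs per resolver; with a true anycast VIP you'd
    # get the same address everywhere.
    for nameserver in ("8.8.8.8", "9.9.9.9"):
        resolver = dns.resolver.Resolver(configure=False)
        resolver.nameservers = [nameserver]
        answers = resolver.resolve(HOSTNAME, "A")
        print(nameserver, "->", sorted(rr.address for rr in answers))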



