How much traffic are we talking about? Google Cloud Platform's load balancers (which sit outside of Kubernetes and are replicated geographically) are designed to handle a very high volume of traffic.
Either way, your client has to know which pod (the group of Docker containers running your app) to send traffic to. Pods run on hosts and ports that can change at any time. So if you want some kind of client-side load balancing without a central bottleneck on the server, you'd have to ship that inventory of pod hosts and ports to the client, and then either provide a host-based proxy or open the ports on the nodes themselves. And of course you'd have to build the balancing/failover logic into the client (which would either have to distribute requests randomly, or rely on resource-utilization info fetched from Kubernetes).
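For a sense of what that inventory fetch looks like, here's a minimal client-go sketch that lists the pod IPs and ports behind a Service by reading its Endpoints object. The service name ("my-service"), namespace ("default"), and kubeconfig path are placeholder assumptions:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load cluster credentials from the local kubeconfig (~/.kube/config).
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// Read the Endpoints object for a hypothetical "my-service" in "default":
	// this is the pod-host-port inventory a client-side balancer would have
	// to keep in sync as pods come and go.
	eps, err := clientset.CoreV1().Endpoints("default").Get(context.TODO(), "my-service", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	for _, subset := range eps.Subsets {
		for _, addr := range subset.Addresses {
			for _, port := range subset.Ports {
				fmt.Printf("%s:%d\n", addr.IP, port.Port)
			}
		}
	}
}
```

A real client-side balancer would watch the Endpoints (or use an informer) rather than doing a one-off Get, and re-resolve as pods get rescheduled.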
That may be great for some narrow use cases, but for most applications, the load balancing pattern is much simpler.
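By contrast, the usual pattern is just a Service: you give it a label selector, Kubernetes keeps the endpoint list in sync for you, and (on GCP) a Service of type LoadBalancer gets an external IP provisioned in front of the pods. Here's a minimal client-go sketch with hypothetical names ("my-service", "app: my-app"), assuming a local kubeconfig; in practice you'd more likely write this as a YAML manifest and kubectl apply it:

```go
package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// A Service of type LoadBalancer: Kubernetes tracks the pod endpoints
	// behind the "app: my-app" selector, and on GCP the cloud controller
	// provisions an external load balancer that forwards to them. Clients
	// only ever see one stable IP; no pod inventory needed on their side.
	svc := &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{Name: "my-service"},
		Spec: corev1.ServiceSpec{
			Type:     corev1.ServiceTypeLoadBalancer,
			Selector: map[string]string{"app": "my-app"}, // hypothetical pod label
			Ports: []corev1.ServicePort{{
				Port:       80,                   // port exposed by the load balancer
				TargetPort: intstr.FromInt(8080), // port the app containers listen on
			}},
		},
	}
	if _, err := clientset.CoreV1().Services("default").Create(context.TODO(), svc, metav1.CreateOptions{}); err != nil {
		panic(err)
	}
}
```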
Do you know the details of that? On their site they talk about balancing in terms of queries per second on BigQuery. What is that equivalent to in terms of Gbps?