This principle (always shutdown uncleanly) was a significant point of design discussion in Kubernetes, another one of the projects that adapted lessons learned inside Google on the outside (and had to change as a result).
All of the core services (kubelet, apiserver, etc) mostly expect to shutdown uncleanly, because as a project we needed to handle unclean shutdowns anyway (and could fix bugs when they happened).
But quite a bit of the software run by Kubernetes (both early and today) doesn’t always necessarily behave that way - most notably Postgres in containers in the early days of Docker behaved badly when KILLed (where Linux terminates the process without it having a chance to react).
So faced with the expectation that Kubernetes would run a wide range of software where a Google-specific principle didn’t hold and couldn’t be enforced, Kubernetes always (modulo bugs or helpful contributors regressing under tested code paths) sends TERM, waits a few seconds, then KILLs.
And the lack of graceful Go http server shutdown (as well as it being hard to do correctly in big complex servers) for many years also made Kube apiservers harder to run in a highly available fashion for most deployers. If you don’t fully control the load balancing infrastructure in front of every server like Google does (because every large company already has a general load balancer approach built from Apache or nginx or haproxy for F5 or Cisco or …), or enforce that all clients handle all errors gracefully, you tend to prefer draining servers via code vs letting those errors escape to users. We ended up having to retrofit graceful shutdown to most of Kube’s server software after the fact, which was more effort than doing it from the beginning.
In a very real sense, Google’s economy of software scale is that it can enforce and drive consistent tradeoffs and principles across multiple projects where making a tradeoff saves effort in multiple domains. That is similar to the design principles in a programming language ecosystem like Go or orchestrator like Kubernetes, but is more extensive.
But those principles are inevitably under communicated to users (because who reads the docs before picking a programming language to implement a new project in?) and under enforced by projects (“you must be this tall to operate your own Kubernetes cluster”).
This principle (always shutdown uncleanly) was a significant point of design discussion in Kubernetes, another one of the projects that adapted lessons learned inside Google on the outside (and had to change as a result).
All of the core services (kubelet, apiserver, etc) mostly expect to shutdown uncleanly, because as a project we needed to handle unclean shutdowns anyway (and could fix bugs when they happened).
But quite a bit of the software run by Kubernetes (both early and today) doesn’t always necessarily behave that way - most notably Postgres in containers in the early days of Docker behaved badly when KILLed (where Linux terminates the process without it having a chance to react).
So faced with the expectation that Kubernetes would run a wide range of software where a Google-specific principle didn’t hold and couldn’t be enforced, Kubernetes always (modulo bugs or helpful contributors regressing under tested code paths) sends TERM, waits a few seconds, then KILLs.
And the lack of graceful Go http server shutdown (as well as it being hard to do correctly in big complex servers) for many years also made Kube apiservers harder to run in a highly available fashion for most deployers. If you don’t fully control the load balancing infrastructure in front of every server like Google does (because every large company already has a general load balancer approach built from Apache or nginx or haproxy for F5 or Cisco or …), or enforce that all clients handle all errors gracefully, you tend to prefer draining servers via code vs letting those errors escape to users. We ended up having to retrofit graceful shutdown to most of Kube’s server software after the fact, which was more effort than doing it from the beginning.
In a very real sense, Google’s economy of software scale is that it can enforce and drive consistent tradeoffs and principles across multiple projects where making a tradeoff saves effort in multiple domains. That is similar to the design principles in a programming language ecosystem like Go or orchestrator like Kubernetes, but is more extensive.
But those principles are inevitably under communicated to users (because who reads the docs before picking a programming language to implement a new project in?) and under enforced by projects (“you must be this tall to operate your own Kubernetes cluster”).