
Good overview! I'd personally rather have better tooling for upgrades. Recently the API changes have been minimal, but the real problem is the mandatory node draining that causes downtime/disruption.

In theory, there's nothing stopping you from just updating the kubelet binary on every node. It will generally inherit the existing pods. Nomad even supports this[1]. But apparently there are no guarantees about this working between versions. And in fact some past upgrades have broken the way kubelet stores its own state, preventing this trick.
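
For reference, the swap itself is tiny. A minimal sketch, assuming a systemd-managed kubelet installed at /usr/local/bin/kubelet (version and paths are examples; adjust for your distro):

    # Fetch the target kubelet build (example version)
    curl -LO https://dl.k8s.io/release/v1.30.2/bin/linux/amd64/kubelet
    chmod +x kubelet

    # Stop the kubelet; running pods stay up because the container
    # runtime (containerd/CRI-O) is untouched
    sudo systemctl stop kubelet

    # Swap the binary and bring the kubelet back
    sudo mv kubelet /usr/local/bin/kubelet
    sudo systemctl start kubelet

    # The kubelet re-syncs and adopts the already-running pods
    # (no official guarantee across versions, as noted above)
    kubectl get nodes -o wide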

All I ask is for this informal trick to be formalized in the e2e tests. I'd write a KEP but I'm too busy draining nodes!

[1]: https://developer.hashicorp.com/nomad/docs/upgrade



> but the real problem is the mandatory node draining that causes downtime/disruption.

> ...

> In theory, there's nothing stopping you from just updating the kubelet binary on every node.

I am pretty sure Kubernetes itself does not mandate node draining. I have been doing upgrades on bare-metal clusters for years, and like you said, it's mostly just replacing the kubelet binary and bouncing the service.

However, I do understand that in public cloud it's usually recommended to perform a rolling node replacement instead of modifying online nodes in place. I actually prefer that approach because of the benefits of immutable infrastructure. The downtime is unfortunate, but so far I have enjoyed working with devs to design more reliable apps that can tolerate node disruptions like this.
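
For contrast, the drain-based rolling replacement the cloud providers recommend looks roughly like this (a sketch; the node name and flags are illustrative):

    # Stop new pods from landing on the node
    kubectl cordon node-1

    # Evict existing pods, respecting PodDisruptionBudgets
    kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data

    # Upgrade the node in place and let it take pods again,
    # or delete it entirely and let a fresh node join
    kubectl uncordon node-1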


> the real problem is the mandatory node draining that causes downtime/disruption.

This sounds a lot like "We don't actually patch the OS", which is common at many companies.

As a former enterprise kubernetes distro maintainer, I can tell you with certainty that most on-premise kubernetes customers aren't patching their machines between kubernetes releases, and they try to stay on a given kubernetes release for 18+ months.


I have hope for that problem to be solved too, with some combination of minimal kernels and live patching. I really don't think running two copies of everything and hammering CPU/RAM/Disk/Network with constant drain operations is a permanent solution for applying patches.


100%. As someone who used to support Kubernetes commercially, I can say long-term support is an engineering nightmare with kube. My customers who could upgrade easily were more stable and easier to support. The customers that couldn't handle an upgrade were the exact opposite: long support cases, complex request processes for troubleshooting information; the deck was probably stacked against them from the start.

Anyway, make upgrades less scary and more routine and the risk bubbles away.


Rotating out nodes during an upgrade is slow and potentially disruptive. However, your systems should be built to handle this anyway, and upgrades are a good way of forcing it.
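
A PodDisruptionBudget is the usual guardrail here, so a drain can't evict too many replicas at once. A minimal sketch (names and numbers are made up):

    # Keep at least 2 pods of the "web" app up during voluntary
    # disruptions such as kubectl drain
    kubectl create poddisruptionbudget web-pdb \
      --selector=app=web \
      --min-available=2

    # Evictions that would violate the budget are refused, so the
    # drain waits until it can proceed safely
    kubectl get pdb web-pdb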


You definitely don't need to drain your nodes. I have never drained the nodes on my personal cluster; I just update and restart the control-plane components.

The procedure is more of a cloud-ism where people don't upgrade their nodes in place but rather get entirely new nodes.
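
If the cluster is kubeadm-managed, the in-place flow is roughly this (a sketch; the version and apt packaging are examples, adjust for your setup):

    # On a control-plane node: unpin and upgrade kubeadm
    sudo apt-mark unhold kubeadm
    sudo apt-get update && sudo apt-get install -y kubeadm=1.30.2-1.1
    sudo apt-mark hold kubeadm

    # Review the plan, then upgrade the control-plane static pods in place
    sudo kubeadm upgrade plan
    sudo kubeadm upgrade apply v1.30.2

    # Bump the kubelet and bounce it; running pods are left alone
    sudo apt-mark unhold kubelet
    sudo apt-get install -y kubelet=1.30.2-1.1
    sudo apt-mark hold kubelet
    sudo systemctl restart kubelet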



