
Good overview! I'd personally rather have better tooling for upgrades. Recently the API changes have been minimal, but the real problem is the mandatory node draining that causes downtime/disruption.

In theory, there's nothing stopping you from just updating the kubelet binary on every node. It will generally inherit the existing pods. Nomad even supports this[1]. But apparently there are no guarantees about this working between versions. And in fact some past upgrades have broken the way kubelet stores its own state, preventing this trick.
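
For reference, the swap itself is tiny. A minimal sketch, assuming a systemd-managed kubelet installed at /usr/local/bin/kubelet (version and paths are examples; adjust for your distro):

    # Fetch the target kubelet build (example version)
    curl -LO https://dl.k8s.io/release/v1.30.2/bin/linux/amd64/kubelet
    chmod +x kubelet

    # Stop the kubelet; running pods stay up because the container
    # runtime (containerd/CRI-O) is untouched
    sudo systemctl stop kubelet

    # Swap the binary and bring the kubelet back
    sudo mv kubelet /usr/local/bin/kubelet
    sudo systemctl start kubelet

    # The kubelet re-syncs and adopts the already-running pods
    # (no official guarantee across versions, as noted above)
    kubectl get nodes -o wide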

All I ask is for this informal trick to be formalized in the e2e tests. I'd write a KEP but I'm too busy draining nodes!

[1]: https://developer.hashicorp.com/nomad/docs/upgrade



> but the real problem is the mandatory node draining that causes downtime/disruption.

> ...

> In theory, there's nothing stopping you from just updating the kubelet binary on every node.

I am pretty sure Kubernetes itself does not mandate node draining. I have been doing upgrades on bare-metal clusters for years, and like you said, it's mostly just replacing the kubelet binary and bouncing the service.

However, I do understand that in public cloud it's usually recommended to perform a rolling node replacement instead of modifying online nodes in place. I actually prefer that approach because of the benefits of immutable infrastructure. The downtime is unfortunate, but so far I have enjoyed working with devs to design more reliable apps that can tolerate node disruptions like this.
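
For contrast, the drain-based rolling replacement the cloud providers recommend looks roughly like this (a sketch; the node name and flags are illustrative):

    # Stop new pods from landing on the node
    kubectl cordon node-1

    # Evict existing pods, respecting PodDisruptionBudgets
    kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data

    # Upgrade the node in place and let it take pods again,
    # or delete it entirely and let a fresh node join
    kubectl uncordon node-1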


> the real problem is the mandatory node draining that causes downtime/disruption.

This sounds a lot like "We don't actually patch the OS", which is common at many companies.

As a former enterprise kubernetes distro maintainer, I can tell you with certainty that most on-premise kubernetes customers aren't patching their machines between kubernetes releases, and they try to stay on a given kubernetes release for 18+ months.


I have hope for that problem to be solved too, with some combination of minimal kernels and live patching. I really don't think running two copies of everything and hammering CPU/RAM/Disk/Network with constant drain operations is a permanent solution for applying patches.


100%. As someone who used to support Kubernetes commercially, I can say long-term support is an engineering nightmare with kube. My customers who could upgrade easily were more stable and easier to support. The customers that couldn't handle an upgrade were the exact opposite: long support cases, complex request processes for troubleshooting information; the deck was probably stacked against them from the start.

Anyway, make upgrades less scary and more routine and the risk bubbles away.


Rotating out nodes during an upgrade is slow and potentially disruptive. However, your systems should be built to handle this anyway, and upgrades are a good way of forcing it.
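
A PodDisruptionBudget is the usual guardrail here, so a drain can't evict too many replicas at once. A minimal sketch (names and numbers are made up):

    # Keep at least 2 pods of the "web" app up during voluntary
    # disruptions such as kubectl drain
    kubectl create poddisruptionbudget web-pdb \
      --selector=app=web \
      --min-available=2

    # Evictions that would violate the budget are refused, so the
    # drain waits until it can proceed safely
    kubectl get pdb web-pdb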


You definitely don't need to drain your nodes. I have never drained the nodes on my personal cluster; I just update and restart the control-plane components.

The procedure is more of a cloud-ism where people don't upgrade their nodes in place but rather get entirely new nodes.
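
If the cluster is kubeadm-managed, the in-place flow is roughly this (a sketch; the version and apt packaging are examples, adjust for your setup):

    # On a control-plane node: unpin and upgrade kubeadm
    sudo apt-mark unhold kubeadm
    sudo apt-get update && sudo apt-get install -y kubeadm=1.30.2-1.1
    sudo apt-mark hold kubeadm

    # Review the plan, then upgrade the control-plane static pods in place
    sudo kubeadm upgrade plan
    sudo kubeadm upgrade apply v1.30.2

    # Bump the kubelet and bounce it; running pods are left alone
    sudo apt-mark unhold kubelet
    sudo apt-get install -y kubelet=1.30.2-1.1
    sudo apt-mark hold kubelet
    sudo systemctl restart kubelet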



