
Doing things I don't understand myself is also a recipe for disaster, and in my experience a rather greater one. The platform team is liable to make mistakes like scaling the service wrong or failing to anticipate upcoming changes. Those incidents can be easily resolved by improving monitoring and communication, which are fundamentally useful things I should already be doing for myriad other reasons. The mistakes I'm likely to make are things like "sequenced a complicated change wrong and null-routed the entire application" or "typo'd a volume name and found out that it auto-deletes the entire database, backups included", which I am simply not good at avoiding and which constitute one of the major reasons I am in engineering instead of ops or IT. We are better off if I do the things I am best at and they do the things they are best at.


I would say that both what you describe and what I describe are recipes for disaster, but letting another team work behind your back on things you're responsible for is not something you want. How would you feel if the cloud provider's engineers suddenly downgraded your nodes to different specs and caused downtime for your users? I also think it's a false premise to assume that application teams cannot observe their own usage patterns and optimize themselves.


I think this mostly comes down to whether applications can tolerate having their workloads restarted or scaled up/down based on demand.

It happens shockingly often that applications only support running as a single replica, and it's even worse when they cannot run concurrently with other replicas of themselves, which prevents smooth rolling updates.

IME, if applications are tolerant of restarts or support concurrent replicas, then scaling up and down to meet demand is absolutely fine. Something along these lines works as a baseline (a minimal sketch; the Deployment name, image, and thresholds below are illustrative, not from the article): more than one replica, a rolling-update strategy, and an HPA, so restarts and scale events don't translate into downtime.
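    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: web                     # hypothetical service name
    spec:
      replicas: 3                   # more than one replica, so a restart isn't an outage
      strategy:
        type: RollingUpdate
        rollingUpdate:
          maxUnavailable: 0         # keep full capacity while pods are replaced
          maxSurge: 1
      selector:
        matchLabels:
          app: web
      template:
        metadata:
          labels:
            app: web
        spec:
          containers:
          - name: web
            image: example/web:1.0  # placeholder image
            readinessProbe:         # only route traffic to pods that are actually ready
              httpGet:
                path: /healthz
                port: 8080
    ---
    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: web
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: web
      minReplicas: 2
      maxReplicas: 10
      metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 70  # scale out when average CPU exceeds 70%

The maxUnavailable: 0 / maxSurge: 1 combination is what makes the rolling update smooth: a new pod has to become ready before an old one is taken down, which is exactly the property single-replica, non-concurrent apps can't give you.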


The reality for most engineers is that their CTOs stopped caring about tech somewhere between the late 90s and mid 2000s. You'll have to put up with processes designed by some dude who still views platform orgs as a bunch of sysadmins and webmasters.


Treating performance and reliability (which is inescapably affected by performance characteristics) as externalities is a great way to create perverse incentives for your engineering team.

Also this reads like a cry for help:

> Therefore, to keep our Kubernetes clusters optimized, it would necessitate mandating all teams to perpetually engage in complex manual optimization processes indefinitely, or until Mercari goes out of business.


Or you could learn the platform you are deploying your software to



