
Doing things I don't understand myself is also a recipe for disaster, and in my experience a rather greater one. The platform team is liable to make mistakes like scaling the service wrong or failing to anticipate upcoming changes. Those incidents can be easily resolved by improving monitoring and communication, which are fundamentally useful things I should already be doing for myriad other reasons. The mistakes I'm likely to make are things like "sequenced a complicated change wrong and null-routed the entire application" or "typo'd a volume name and found out that it auto-deletes the entire database, backups included", which I am simply not good at avoiding and which constitute one of the major reasons I am in engineering instead of ops or IT. We are better off if I do the things I am best at and they do the things they are best at.


I would say that both what you describe and what I describe are recipes for disaster, but letting another team work behind your back on things you're responsible for is not something you want. How would you feel if the cloud provider's engineers suddenly downgraded your nodes to different specs and caused downtime for your users? I also think it's a false premise to assume that application teams cannot observe their own usage patterns and optimize themselves.


I think this mostly comes down to whether applications can tolerate having their workloads restarted or scaled up/down based on demand.

It happens shockingly often that applications only support running as a single replica, and it's even worse when they cannot run concurrently with other replicas of themselves, which prevents smooth rolling updates.

IME, if applications are tolerant of restarts or support concurrent replicas, then scaling up and down to meet demand is absolutely fine. Something along these lines works as a baseline (a minimal sketch; the Deployment name, image, and thresholds below are illustrative, not from the article): more than one replica, a rolling-update strategy, and an HPA, so restarts and scale events don't translate into downtime.
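    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: web                     # hypothetical service name
    spec:
      replicas: 3                   # more than one replica, so a restart isn't an outage
      strategy:
        type: RollingUpdate
        rollingUpdate:
          maxUnavailable: 0         # keep full capacity while pods are replaced
          maxSurge: 1
      selector:
        matchLabels:
          app: web
      template:
        metadata:
          labels:
            app: web
        spec:
          containers:
          - name: web
            image: example/web:1.0  # placeholder image
            readinessProbe:         # only route traffic to pods that are actually ready
              httpGet:
                path: /healthz
                port: 8080
    ---
    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: web
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: web
      minReplicas: 2
      maxReplicas: 10
      metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 70  # scale out when average CPU exceeds 70%

The maxUnavailable: 0 / maxSurge: 1 combination is what makes the rolling update smooth: a new pod has to become ready before an old one is taken down, which is exactly the property single-replica, non-concurrent apps can't give you.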


The reality for most engineers is that their CTOs stopped caring about tech somewhere between the late 90s and mid 2000s. You'll have to put up with processes designed by some dude who still views platform orgs as a bunch of sysadmins and webmasters.


Treating performance and reliability (which is inescapably affected by performance characteristics) as externalities is a great way to create perverse incentives for your engineering team.

Also this reads like a cry for help:

> Therefore, to keep our Kubernetes clusters optimized, it would necessitate mandating all teams to perpetually engage in complex manual optimization processes indefinitely, or until Mercari goes out of business.


Or you could learn the platform you are deploying your software to



