I'm fairly sure the SRE books by Google don't mention Kubernetes once, since they describe a technology-agnostic set of practices [1]. I would also say that the practices are independent of scale.

Google didn't necessarily invent SRE so much as coin a term for keeping systems running.

[1] https://sre.google/books/




I see a lot of comments mentioning scale, dedicated SRE teams, and IaC, and thought this might be helpful.

I believe distinguishing the engineering from the engineer in SRE is important. The engineering side is essential, but a dedicated full-time engineer may not be.

The amount of engineering required depends primarily on the delta between your required availability (captured in SLOs) and your current availability.
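To make that delta concrete, here is a minimal Python sketch that turns availability ratios into downtime minutes, assuming a 30-day window. The window length and the function names are illustrative choices, not anything from the SRE books. The "gap" is the downtime you would need to eliminate per window to meet the SLO.

```python
WINDOW_MIN = 30 * 24 * 60  # assumed 30-day SLO window = 43,200 minutes


def downtime_minutes(availability: float, window_min: int = WINDOW_MIN) -> float:
    """Downtime implied by an availability ratio over the window.
    For an SLO target this is the error budget."""
    return (1.0 - availability) * window_min


def availability_gap(slo: float, measured: float) -> float:
    """Extra downtime minutes per window you must eliminate to meet the SLO.
    Zero means current availability already satisfies it."""
    return max(0.0, downtime_minutes(measured) - downtime_minutes(slo))


# Hypothetical service currently measured at 99.5% availability.
for slo in (0.99, 0.999, 0.9999):
    print(f"SLO {slo:.2%}: budget {downtime_minutes(slo):.1f} min, "
          f"gap {availability_gap(slo, 0.995):.1f} min")
```

With 99.5% measured, this reports no gap against a 99% SLO but roughly 173 minutes per window against 99.9%, and that difference is what drives how much engineering you need.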

The higher your availability needs, the more engineering effort you'll need along the path from commit to production, because availability doesn't start and end in production systems.

The framework makes explicit the considerations involved in reaching that goal. Oftentimes it comes down to balancing the cost of human time (manual effort) against the cost of automation to replace it, and the choice of technology follows from that. You may very well find that your availability requirements are so low that you can get by with a human pressing a button every few hours, and that this very manual approach is acceptable.
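One rough way to frame the human-versus-automation trade-off is sketched below in Python. It assumes, as a worst case, that every incident goes unnoticed until the next manual check, and it models automation as a one-off build cost plus ongoing upkeep. The function names and all numbers are hypothetical.

```python
WINDOW_MIN = 30 * 24 * 60  # assumed 30-day window, in minutes


def manual_fits_budget(check_interval_min: float,
                       incidents_per_window: int,
                       budget_min: float) -> bool:
    """Worst-case check: each incident stays unnoticed until the next manual
    check, so it can cost up to one full check interval of downtime."""
    return incidents_per_window * check_interval_min <= budget_min


def automation_breakeven_windows(toil_hours_per_window: float,
                                 build_hours: float,
                                 upkeep_hours_per_window: float) -> float:
    """Windows until the up-front cost of automation is repaid by the human
    time it saves; infinite if upkeep cancels out the savings."""
    savings = toil_hours_per_window - upkeep_hours_per_window
    if savings <= 0:
        return float("inf")
    return build_hours / savings


# Hypothetical: someone checks every 3 hours, about 2 incidents per window.
for slo in (0.99, 0.999):
    budget = (1 - slo) * WINDOW_MIN
    print(f"SLO {slo:.1%}: manual process fits budget? "
          f"{manual_fits_budget(180, 2, budget)}")

# Hypothetical: 10 h/window of toil vs. 40 h to automate plus 2 h/window upkeep.
print(automation_breakeven_windows(10, 40, 2))
```

Under these made-up numbers the button-pressing approach fits a 99% SLO (360 of about 432 minutes) but not 99.9% (about 43 minutes), and the automation pays for itself after 5 windows.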

Going through the process will answer most of the questions about scale, whether a dedicated Site Reliability Engineer is required, and so on. The SRE framework scales even if the systems themselves don't require significant scale.


They do mention Borg, though.