This is a great read. I know a single cluster for all environments is sort of popular, but it has always made me uncomfortable, both for the reasons stated in the article and for handling kube upgrades. I'd like to give upgrades a swing in staging ahead of time rather than going straight to prod or building out a dedicated cluster just to test an upgrade on.
I tend to keep my staging and prod clusters identical, right down to service names (no prod-web and stage-web, just web).
I set them up in different AWS accounts to clearly separate them; the only differences are the cluster's DNS name and who can access it.
+100 to this. Why would any sane Op/Inf/SRE choose not to have at least account-level isolation - is it only a matter of cost due to under-utilization?
I prefer to have everything 100% isolated for dev / qa / stage / prod, and have process and tooling in place to explicitly cross the streams. This comes from a history of pain with random dev-to-prod (or worse, prod-to-dev) access and dealing with "real companies" with things like audit requirements.
Having them separate lets you do things like @odammit suggests: upgrade your cluster in staging without affecting your developers or customers.
If you don't want to go that far, you can set up separate AWS accounts that are all tied together via an organization, and you can set up IAM roles and whatnot to share your API keys between accounts. That gives you at least some isolation, but still lets you GSD the same way as if you have a single account.
> and you can set up IAM roles and whatnot to share your API keys between accounts. That gives you at least some isolation, but still lets you GSD the same way as if you have a single account.
Do not do this. You are defeating the purpose of account-level separation if you're sharing API keys between accounts. Each AWS environment should be totally segregated from the others (cross-account IAM permissions only if you must), limiting the blast radius in the event of human error or a malicious actor.
Source: Previously did devops/infra for 6 years, currently doing security
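To be concrete, the cross-account pattern looks roughly like this: assume a short-lived role when you genuinely need to reach across, and never copy one account's keys into another. A minimal sketch, with a made-up account ID, role ARN, and session name:

    # cross_account.py - reach into another AWS account by assuming a role there
    # instead of sharing that account's API keys. All identifiers are placeholders.
    import boto3

    # Short-lived credentials, scoped to whatever the target role allows.
    sts = boto3.client("sts")
    creds = sts.assume_role(
        RoleArn="arn:aws:iam::222222222222:role/staging-deployer",  # example role
        RoleSessionName="deploy-from-ci",
    )["Credentials"]

    # A client bound to the other account, valid only until the session expires.
    staging_ec2 = boto3.client(
        "ec2",
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )
    print(staging_ec2.describe_instances()["Reservations"])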
> Why would any sane Op/Inf/SRE choose not to have at least account-level isolation - is it only a matter of cost due to under-utilization?
In our particular case, yes, pretty much. We are a small company with a small development team, so even if I wanted to split accounts across teams, we would end up with one account per 2-3 users, which doesn't make a lot of sense right now.
> This is a great read. I know a single cluster for all environments is sort of popular, but it has always made me uncomfortable, both for the reasons stated in the article and for handling kube upgrades. I'd like to give upgrades a swing in staging ahead of time rather than going straight to prod or building out a dedicated cluster just to test an upgrade on.
I've been doing patch-level upgrades in place since the beginning and have never had a problem. For more sensitive upgrades, this is what I do: create a new cluster based on the current state, so I can test the upgrade in a safe environment before applying it to production.
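If it helps, here's a minimal sketch of that rehearsal step. It assumes the workloads are already described by manifests in source control and that a kubeconfig context exists for the throwaway cluster; the context name and manifest path are placeholders:

    # upgrade_rehearsal.py - stand the same workloads up on a throwaway cluster
    # running the new Kubernetes version and check that they come up, before
    # touching prod. Context name and manifest path are placeholders.
    import subprocess

    TEST_CONTEXT = "upgrade-test"  # kubeconfig context for the throwaway cluster
    MANIFEST_DIR = "k8s/"          # the same manifests that get applied to prod

    # Apply the production manifests to the test cluster.
    subprocess.run(
        ["kubectl", "--context", TEST_CONTEXT, "apply", "-R", "-f", MANIFEST_DIR],
        check=True,
    )

    # Wait for every deployment to become available before running smoke tests.
    subprocess.run(
        ["kubectl", "--context", TEST_CONTEXT, "wait", "deployment", "--all",
         "--for=condition=Available", "--timeout=300s"],
        check=True,
    )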
And for even riskier upgrades, I go blue/green-like: create a new cluster with the same stuff running in it and gradually shift traffic to the new cluster.
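The gradual shift can be as simple as weighted DNS in front of the two clusters. A rough sketch with Route 53; the zone ID, record name, and the two load balancer hostnames are placeholders, and the same idea works with whatever actually fronts your clusters:

    # shift_traffic.py - weighted DNS records pointing at the old and new
    # cluster ingress load balancers. All identifiers below are placeholders.
    import boto3

    route53 = boto3.client("route53")

    ZONE_ID = "Z123EXAMPLE"                  # hosted zone for the service
    RECORD = "web.example.com."
    OLD_LB = "old-cluster-lb.example.com."   # ingress LB of the current cluster
    NEW_LB = "new-cluster-lb.example.com."   # ingress LB of the upgraded cluster

    def set_weights(old_weight: int, new_weight: int) -> None:
        """Shift the share of traffic between the two clusters."""
        changes = [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": RECORD,
                    "Type": "CNAME",
                    "SetIdentifier": set_id,
                    "Weight": weight,
                    "TTL": 60,
                    "ResourceRecords": [{"Value": target}],
                },
            }
            for set_id, weight, target in [
                ("blue", old_weight, OLD_LB),
                ("green", new_weight, NEW_LB),
            ]
        ]
        route53.change_resource_record_sets(
            HostedZoneId=ZONE_ID, ChangeBatch={"Changes": changes}
        )

    # e.g. start at 90/10, then ratchet toward 0/100 as confidence grows
    set_weights(90, 10)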