How much excess capacity would a cloud provider typically have (as in %) to accommodate normal peak loads and still have a comfortable margin?
Or maybe that's not something providers reveal, so let me ask the sysadmins of this thread: For on-premise resources, what % margin of excess capacity do you keep in reserve?
For specific EC2 instance types, excess capacity is sometimes zero, and customers do occasionally see that. In the past I've had to swap instance types to get a deploy unstuck or to scale up when capacity didn't exist.
Because of that, I always recommend that a project reserve instances to ensure capacity exists when it's needed. Reserving is also cheaper, but the capacity risk is the important thing to mitigate.
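For anyone who hasn't hit this, here is a rough sketch of the "swap instance types" workaround: try a preferred type, and fall back to the next one when the provider reports no capacity. Everything named here is hypothetical (the InsufficientCapacity exception, the launch callable, the fake_launch stand-in); it is not a real provider API, just the shape of the retry logic.

```python
from typing import Callable, Optional, Sequence, Tuple


class InsufficientCapacity(Exception):
    """Hypothetical error: the provider has no capacity for this type."""


def launch_with_fallback(
    launch: Callable[[str], str],
    instance_types: Sequence[str],
) -> Optional[Tuple[str, str]]:
    """Try each instance type in order of preference.

    Returns (instance_type, instance_id) for the first type that
    launches, or None if every type is out of capacity.
    """
    for instance_type in instance_types:
        try:
            instance_id = launch(instance_type)
        except InsufficientCapacity:
            # No capacity for this type right now; try the next one.
            continue
        return instance_type, instance_id
    return None


def fake_launch(instance_type: str) -> str:
    # Stand-in for a real provider call; pretends m5.large is sold out.
    if instance_type == "m5.large":
        raise InsufficientCapacity(instance_type)
    return f"i-{instance_type}"


result = launch_with_fallback(fake_launch, ["m5.large", "m5a.large", "m4.large"])
```

A fallback list like this only helps if the workload can actually run on the alternate types, which is part of why reserving capacity up front is the safer option.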
"How much excess capacity would a cloud provider typically have (as in %) to accommodate normal peak loads and still have a comfortable margin?"
We (rsync.net) run a sort-of just-in-time deployment model wherein as our storage arrays approach 80% capacity, we completely remove-and-replace the oldest storage array at a location[1].
The 80% number comes from the performance penalties that appear when you fill zpools too full. In the old days, that was the threshold beyond which we would see significant performance degradation. ZFS behaves better now, and you can avoid a fair amount of that degradation by having read and write cache devices (L2ARC and SLOG), but we still hang onto that 80% number ...
When we do such a replacement, we are often replacing a 2-3 year old storage array with a brand new one, which means that the drive capacity of the refreshed system is about double what it was before - so we have a relative glut of space for a time. This allows us to onboard larger (50-100 TB) clients with no lead-time.
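As an illustration only, the trigger described above could be sketched like this. It assumes the 80% figure is measured across all arrays at one location, which the description doesn't actually spell out, and the StorageArray fields and sample numbers are made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional

FILL_THRESHOLD = 0.80  # the 80% figure from the post


@dataclass
class StorageArray:
    name: str            # hypothetical label
    installed_year: int  # used to find the oldest array
    used_tb: float
    capacity_tb: float


def array_to_replace(
    arrays: List[StorageArray],
    threshold: float = FILL_THRESHOLD,
) -> Optional[StorageArray]:
    """Return the oldest array if the location is at or above the
    fill threshold, otherwise None.

    Assumption: fill is computed over the location as a whole.
    """
    total_capacity = sum(a.capacity_tb for a in arrays)
    if total_capacity == 0:
        return None
    fill = sum(a.used_tb for a in arrays) / total_capacity
    if fill < threshold:
        return None
    return min(arrays, key=lambda a: a.installed_year)


# Made-up numbers: 260 TB used of 320 TB is about 81%, so the
# oldest array is flagged for replacement.
location = [
    StorageArray("array-a", 2017, used_tb=110, capacity_tb=120),
    StorageArray("array-b", 2019, used_tb=150, capacity_tb=200),
]
candidate = array_to_replace(location)
```

If the replacement array has roughly double the capacity of the one it replaces, the location's fill fraction drops well below the threshold afterwards, which is the temporary glut of space mentioned above.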
FWIW, we have seen a relative increase in signups since lockdowns began. My current theory is that a lot of people are sitting at home with a lot of free time and thinking about risks ... and some of them think about backups.
[1] Denver, San Diego, Fremont, Zurich, and Hong Kong.