
I’ve been running platform teams on AWS for 10 years now, and working in AWS for 13. For anyone looking for guidance on how to avoid this, here’s the advice I give the startups I advise.

First, if you can, avoid us-east-1. Yes, you’ll miss new features, but it’s also the least stable region.

Second, go multi-AZ for production workloads. Safety of your customers’ data is your ethical responsibility. Protect it, back it up, keep it as generally available as is reasonable.
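
If it helps, here’s a rough boto3 sketch of that for an RDS instance; the identifiers, instance size, and retention period are made-up placeholders, not a recommended config:

    import boto3

    rds = boto3.client("rds", region_name="us-west-2")

    # Hypothetical instance: MultiAZ keeps a synchronous standby in a second
    # AZ, BackupRetentionPeriod keeps automated daily snapshots around.
    rds.create_db_instance(
        DBInstanceIdentifier="prod-db",      # placeholder name
        DBInstanceClass="db.m6g.large",      # placeholder size
        Engine="postgres",
        MasterUsername="dbadmin",
        MasterUserPassword="change-me",      # pull from a secrets manager in real life
        AllocatedStorage=100,                # GiB, placeholder
        MultiAZ=True,                        # standby replica in another AZ
        BackupRetentionPeriod=7,             # days of automated backups
    )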

Third, you’re gonna go down when the cloud goes down. Not much use getting overly bent out of shape. You can reduce your exposure by just using their core systems (EC2, S3, SQS, LBs, CloudFront, RDS, ElastiCache). The more systems you use, the less reliable things will be. However, running your own key-value store, API gateway, event bus, etc., can also be way less reliable than using theirs. So, realize it’s an operational trade-off.

Degradation of your app / platform is more likely to come from you than from AWS. You’re gonna roll out bad code, break your own infra, and overload your own system way more often than Amazon is gonna go down. If reliability matters to you, start by examining your own practices before reaching for things like multi-region or super durable, highly replicated systems.

This stuff is hard. It’s hard for Amazon engineers. Hard for platform folks at small and mega companies. It’s just, hard. When your app goes down, and so does Disney+, take some solace that Disney, with all their buckets of cash, also couldn’t avoid the issue.

And, finally, hold cloud providers accountable. If they’re unstable and not providing service you expect, leave. We’ve got tons of great options these days, especially if you don’t care about proprietary solutions.

Good luck y’all!




Easy to say "leave"; the technical lock-in cloud service providers choose to have by design makes it impossible to leave.

AWS (and others) make egress costs insanely expensive for any startup to consider leaving with their data. Also, there is a constant push to either not support open protocols or to extend/expand them in ways that make it hard to migrate a code base easily.

If the advice is to effectively only use managed open source components, then why AWS at all? Most competent mid-sized teams can do that much cheaper with colo providers like OVH/Hetzner.

The point of investing in AWS is not just to outsource running base infra; that point is lost if we are supposed to stay away from leveraging the kind of cloud native services us mere mortals cannot hope to build or maintain.

Also, this "avoid us-east-1" advice is a bit frustrating. AWS does not have to always experiment with new services in the same region; it is not marked as an experimental region, nor does it have reduced SLAs. If it is inferior/preview/beta, then call that out in the UI and the contract. And what about when there is no choice? If CloudFront is managed out of us-east-1, should we now not use it? Why use the cloud then?

If your engineering only discovers scale problems in us-east-1, along with customers, perhaps something is wrong? AWS could limit new instances in that region and spread the load. Playing with customers who are at your mercy, just because you can, is not nice.

Disney can afford to go down, or to build their own cloud; small companies don't have the deep pockets to do either.


> AWS (and others) make egress costs insanely expensive for any startup to consider leaving with their data

I have seen this repeated many times, but I don't understand it. Yes, egress is expensive, but it is not THAT expensive compared to storage. S3 egress per GB is no more than 3x the monthly price of storage, i.e. moving out just costs about 3 months of storage cost (there's also API cost, but that's not the one often mentioned). Back-of-the-envelope below.

Is egress pricing being a lock-in factor just a myth? Is there some other AWS cost I'm missing? Obviously there will be big architectural and engineering cost to move, but that's just part of life.
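
Quick sketch in Python; the per-GB prices are illustrative placeholders, not current AWS list prices, so check the pricing page before relying on them:

    # Rough arithmetic: how many months of storage does one full egress cost?
    # Prices below are placeholders for illustration only.
    storage_per_gb_month = 0.02   # assumed S3-standard-ish price, USD per GB-month
    egress_per_gb = 0.06          # assumed internet egress price, USD per GB

    data_gb = 50_000              # hypothetical 50 TB dataset
    one_time_egress = data_gb * egress_per_gb
    monthly_storage = data_gb * storage_per_gb_month

    print(f"egress ~ ${one_time_egress:,.0f}, i.e. "
          f"{one_time_egress / monthly_storage:.1f} months of storage")
    # With these placeholder numbers, moving the data out once costs on the
    # order of a few months of storage -- significant, but not unbounded.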


Often the other cloud vendors will help cover those migration costs as part of your contract negotiations.

But really, egress costs aren’t locking you in. It’s the hard-coded AWS APIs, Terraform scripts, and technical debt. Having to change all of that and refactor and re-optimize for a different provider’s infrastructure is a huge endeavor. That time might have a higher ROI being put elsewhere.


3 months is only if you use standard S3. However, Intelligent-Tiering, Infrequent Access, Reduced Redundancy, or Glacier Instant Retrieval can be substantially cheaper without impacting retrieval time. [1]

At scale, when costs matter, you would have lifecycle policies tuned to your needs taking advantage of these classes (rough sketch below). Any typical production workload is hardly paying only the S3 base price for all or most of its storage needs; it will have a mix of all of these too.

[1] If there is substantial data in regular Glacier, the costing completely blows through the roof; retrieval + egress makes it infeasible unless you actively hate AWS enough to spend that kind of money.
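
A rough sketch of what such a lifecycle policy might look like with boto3; the bucket name and day thresholds are made up for illustration:

    import boto3

    s3 = boto3.client("s3")

    # Hypothetical policy: tier objects down to cheaper classes as they age.
    s3.put_bucket_lifecycle_configuration(
        Bucket="example-bucket",  # placeholder
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "tier-down-with-age",
                    "Status": "Enabled",
                    "Filter": {"Prefix": ""},  # apply to every object
                    "Transitions": [
                        {"Days": 30, "StorageClass": "STANDARD_IA"},
                        {"Days": 90, "StorageClass": "GLACIER_IR"},
                    ],
                }
            ]
        },
    )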


Lesson: build your services with Docker and Terraform. With this setup you can spin up a working clone of a decently sized stack on a different cloud provider in under an hour.

Don't lock yourself in.


If the setup is that portable, you probably don't need AWS at all in the first place.

If you only use services built and managed by your Docker images, why use the cloud in the first place? It would be cheaper to host on a smaller vendor. The reliability is not substantially better with a big cloud than with tier-two vendors; the difference between, say, OVH and AWS is not valuable enough to most applications to be worth the premium.

IMO, if you don't leverage the cloud native services offered by GCP or AWS, then the cloud is not adding much value to your stack.


This is just not true for Terraform at all; it does not aim to be multi-cloud, and it is a much more usable product because of it. Resource parameters do not swap out directly across providers (rightly so, since the abstractions each provider chooses are different!).


...if you don't have much data, that is. Otherwise, you'll have huge egress costs.


You've written up my thoughts better than I can express them myself - I think what people get really stuck on when something like this happens is the 'can I solve this myself?' aspect.

A 'wait for X provider to fix it for you' situation is infinitely more stressful than an 'I have played myself, I will now take action' situation.

Situations out of your (immediate) resolution control feel infinitely worse, even if the customer impact in practice of your fault vs cloud fault is the same.


For me it’s the opposite… AWS outages are much less stressful than my own, because I know there’s nothing I/we can do about it, they have smart people working on it, and it will be fixed when it’s fixed.


I couldn't possibly disagree more strongly with this. I used to drive frantically to the office to work on servers in emergency situations, and if our small team couldn't solve it, there was nobody else to help us. The weight of the outage was entirely on our shoulders. Now I relax and refresh a status page.


> Third, you’re gonna go down when the cloud goes down.

Not necessarily. You just need to not be stuck with a single cloud provider. The likelihood of more than one availability zone going down on a single cloud provider is not that low in practice. Especially when the problem is a software bug.

The likelihood of AWS, Azure, and OVH going down at the same time is low. So if you need to stay online if AWS fail, don't put all your eggs in the AWS basket.

That means not using proprietary cloud solutions from a single cloud provider. It has a cost, so it's not always worth it.


> not using proprietary cloud solutions from a single cloud provider. It has a cost, so it's not always worth it.

But perhaps some software design choices could be made to alleviate these costs. For example, you could have a read-only replica on Azure or whatever backup cloud provider, and design your software interfaces to allow the use of such read-only replicas (rough sketch below); at least you'd be degraded rather than unavailable. Ditto with web servers, etc.

This has a cost, but it's lower than entirely replicating all of the proprietary features in a different cloud.
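
A minimal sketch of the kind of interface I mean, in Python; the class and method names here are hypothetical, not any particular library's API:

    # Hypothetical degraded-mode data access layer: reads fall back to a
    # read-only replica on another provider, writes fail fast when the
    # primary is unreachable.
    class DataStore:
        def __init__(self, primary, read_replica):
            self.primary = primary            # e.g. the AWS-hosted database
            self.read_replica = read_replica  # e.g. a read-only copy on Azure

        def read(self, key):
            try:
                return self.primary.get(key)
            except ConnectionError:
                # Degraded, but still serving reads from the replica.
                return self.read_replica.get(key)

        def write(self, key, value):
            # Writes only go to the primary; during an outage callers must
            # queue, retry, or surface the error instead of silently losing data.
            return self.primary.put(key, value)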


Complex systems are expensive to operate, in many ways.

The more complexity you build into your own systems on top of the providers you depend on, the more likely you are to shoot yourself in the foot when you run into complexity issues that you’ve never seen before.

And the times that is most likely to happen is when one of your complex service providers goes down.

If the kind of thing you’re talking about could feasibly be done, then Netflix would have already done it. The fact that Netflix hasn’t solved this problem is a strong indicator that piling more proprietary complexity on top of all the vendor complexity you inherit from using a given service is a really hard problem in and of itself.


True multi-cloud redundancy is hard to test, because it’s everything from DNS on up, and it’s hard to ask AWS to go offline so you can verify Azure picks up the slack.


I deeply concur with this statement. I think folks here are conflating a one-off test with keeping your redundancy up to date as apps evolve.


Sure you can. Firewall AWS off from whatever machine does the health checks in the redundancy implementation.


What happens when your health check system fails?


It's true, but you can do load balancing at the DNS level.
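
For example, something like weighted records in Route 53, sketched with boto3; the hosted zone ID, domain, and IPs below are placeholders:

    import boto3

    route53 = boto3.client("route53")

    # Hypothetical setup: split traffic 50/50 between an AWS deployment and
    # one on another provider.
    route53.change_resource_record_sets(
        HostedZoneId="Z0000000000000000000",  # placeholder
        ChangeBatch={
            "Changes": [
                {
                    "Action": "UPSERT",
                    "ResourceRecordSet": {
                        "Name": "app.example.com",
                        "Type": "A",
                        "SetIdentifier": "aws",
                        "Weight": 50,
                        "TTL": 60,
                        "ResourceRecords": [{"Value": "203.0.113.10"}],
                    },
                },
                {
                    "Action": "UPSERT",
                    "ResourceRecordSet": {
                        "Name": "app.example.com",
                        "Type": "A",
                        "SetIdentifier": "other-cloud",
                        "Weight": 50,
                        "TTL": 60,
                        "ResourceRecords": [{"Value": "198.51.100.20"}],
                    },
                },
            ]
        },
    )
    # Without health checks attached, a dead endpoint keeps receiving its
    # share of traffic until you repoint or remove its record.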


And you will get 1/N of requests timing out or erroring out, while in the meantime paying 2x or 3x the costs. So, it might be worth it in some cases, but you need to evaluate it very, very well.


Or rent bare metal servers like old times and be responsible for your own s*t


Still plenty of networking issues that can knock you down hard.


... and be responsible for your own s*t

Don't miss the point: being able to do something about it, instead of a multi-hour outage where you're in the dark about what is going on.


> And, finally, hold cloud providers accountable. If they’re unstable and not providing service you expect, leave. We’ve got tons of great options these days, especially if you don’t care about proprietary solutions.

Easy to say, but difficult to do in practice (leaving a cloud provider)


Absolutely hard. But that doesn’t mean that, if you’re in a position to start a company from scratch, you can’t walk away. Or that, if you go to another company and are involved in their procurement of a new purchase, you can’t sway it away from said provider.

Just because it takes years doesn’t mean it can’t happen.


> Third, you’re gonna go down when the cloud goes down. Not much use getting overly bent out of shape.

Ugh. I have a hard time with this one. Back in the day, EBS had some really awful failures and degradations. Building a greenfield stack that specifically avoided EBS and stayed up when everyone else was down during another mass EBS failure felt marvelous. It was an obvious avoidable hazard.

It doesn't mean "avoid EBS" is good advice for the decade to follow, but accepting failure fatalistically doesn't feel right either.


I hear you. I didn’t use EBS for five years after the great outage in, what was it, 2011?

At this point, it’s reliable enough that even if it were to go down, you’re still safer using it than not. I’d put EBS in the pantheon of “core” services I never mind using these days.


Yup, 2011. That's the one. One of those US presidential campaigns stayed up throughout because of EBS-phobia.

Geez. We have decades-old cloud war stories now? I suddenly feel really old.


> Safety of your customers’ data is your ethical responsibility. Protect it, back it up, keep it as generally available as is reasonable.

> Third, you’re gonna go down when the cloud goes down. Not much use getting overly bent out of shape.

“Whoops, our provider is down, sorry!” is not taking responsibility with customer data at all.


Respectfully disagree. No company in the world has 100% uptime. Whether it’s your server rack or their server rack going down means nothing to a customer.

We’re not discussing data loss in this thread specifically. This is about a couple of hours of downtime per year.


Hey Wes! I upvoted your comment before I noticed your handle. +1 insightful, as usual


Brown nose


Troll





