The Cloud Is Not for You (justcramer.com)
120 points by d0ugal on June 2, 2012 | 53 comments



This article would be better if it were properly titled: "The Cloud is Not For Me".

The anecdote provided here – about a specific app with a specific architecture and a specific load profile, deployed by specific employees with specific skills at a specific company with a specific budget, a specific revenue model, and a specific schedule – is fine as anecdotes go, but you can't draw a general conclusion from it. Change any of those specifics and the conclusion may be entirely different.


Some of these may be specifics, but budget, revenue, and load profile are not.

Don't have a budget? Good luck with that. Don't have revenue? See above. No load profile? Doubt you're going to end up with revenue.

My skill set may be more wide-reaching than others', but I'm confident that any of the engineers at our company (and most others) would be able to do exactly the same thing.

It's not about fitting it within the constraints of what you have, it's about doing something simply because it's a better choice. Why spend more time and money if you don't have to? I may have spent time making the switch, but I'll save that by not having to worry about performance and scaling concerns for the foreseeable future.


My business is such that external events cause immediate spikes. My traffic might double because Apple released a new firmware, or might go up 10x because someone released a jailbreak without warning me. That capacity requirement quickly trickles down and settles to its original levels over the next two months until it spikes again.

Netflix is in a similar position: you never know when A&E runs a new documentary making everyone suddenly want to watch this one specific old movie at the same time, or when a trailer for a sequel hits during the Super Bowl and when the game is over everyone is inspired to watch the original.

Some businesses thrive on environments where capacity planning 24 hours in advance is impossible, and where holding excess capacity seven days later might mean 10x the costs of running your business. Claiming that these people are incompetent or couldn't possibly have profitable businesses is ludicrous: the cloud isn't for you, but I /love it/.

Btw, the extreme irony is that your entire business model works because your argument is often false. Most of the small blogs I know use DISQUS not because it does something cool, but because they only get traffic randomly based on events they have little control over: Apple announces something, to take advantage of it they write an article about it to be published an hour later, and then their site which normally gets no traffic suddenly needs to handle insane load.

Your cloud-hosted (to be clear, not necessarily that you host it in the cloud, but that you are a cloud to your users) comment engine solves that problem (as well as adding similar deployment advantages and simplicity that is also known from cloud hosting).

(Note: for those who want to heckle me with "your service falls over when jailbreaks come out", it doesn't: the third party repositories do. The core site, the payment processing system, and my repository barely notice, and when they do it is normally due to a bug that I rapidly notice and fix, and tends to have limited effect.)


> My business is such that external events cause immediate spikes. My traffic might double because Apple released a new firmware, or might go up 10x because someone released a jailbreak without warning me. That capacity requirement quickly trickles down and settles to its original levels over the next two months until it spikes again.

Have you actually had this occur in real life, where you had to spin up new instances during these spikes? What kind of database configuration were you using such that it could accommodate all those new application server instances? Do you also add new database slaves on the fly?

When this article got at the idea of "sounds good in theory, but never happens in reality", that was my experience too. We were on PostgreSQL, and the notion that we'd just "add 20 instances" when we had a load spike was ridiculous. I'm just curious who is actually doing this, and if they are also using relational databases.


Here is a graph I generated a few weeks ago: we've since had yet another major traffic spike due to the release of Absinthe 2.0 with Rocky Racoon (an untethered jailbreak for iOS 5.1.1) that is actually one of the most intense spikes yet (but I am on my iPhone and can't make new graphs).

http://test.saurik.com/hackernews/absinthe.png

I over-allocate the database server for Cydia, but spin up new web servers on demand. I keep as much of the CPU-intensive work as possible off the database, store as many static assets as I can on services such as S3, and use distributed queued logging (RELP).

For JailbreakQA's database (where downtime isn't that important) I do an instance stop, change the type of computer it is running on (such as from m1.large to c1.xlarge), start it again, and have a drastically different machine with only a minute of downtime. EC2 is a godsend (for me).
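
(To make that resize concrete: roughly the following, shown here as a modern boto3 sketch rather than the 2012-era EC2 tools; the instance ID and types are placeholders, not the actual JailbreakQA setup.)

  import boto3

  ec2 = boto3.client('ec2')
  instance_id = 'i-0123456789abcdef0'  # hypothetical database instance

  # Stop, change the instance type, start again -- roughly a minute of downtime.
  ec2.stop_instances(InstanceIds=[instance_id])
  ec2.get_waiter('instance_stopped').wait(InstanceIds=[instance_id])
  ec2.modify_instance_attribute(InstanceId=instance_id,
                                InstanceType={'Value': 'c1.xlarge'})
  ec2.start_instances(InstanceIds=[instance_id])
  ec2.get_waiter('instance_running').wait(InstanceIds=[instance_id])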


It's significantly more difficult to scale a traditional relational database (although not impossible!) than to scale the web/app layer that sits in front of it. Snapshot + clone + some kind of sync middleware (like pgpool for Postgres) can probably get you 80-90% of the way there. Rearchitecting so that your db server is not the bottleneck should help there as well.

Maybe you need to have a master/slave setup, and on huge load, flip the slave instance over to an instance type with quadruple the RAM and CPUs for a few hours, then back to a single-core, low-memory instance to keep the data-sync flowing. There's a million ways to skin this cat.

If your database itself is the bottleneck, then, yeah, on the fly flexibility might be difficult to achieve.

In his case, a relational database probably isn't the bottleneck at all, and scaling out caches, web front ends, etc. is all fairly straightforward. There are huge numbers of folks taking advantage of this kind of flexibility.

Hell, Amazon has a whole API you can integrate with that handles it for you (even has $ references, so you don't accidentally spend yourself bankrupt because of a TC story).
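
For flavour, a hedged boto3 sketch of that kind of setup, reading the "$ references" as a billing alarm (the group name, SNS topic ARN and the $700 threshold are invented): a simple scale-out policy plus an alarm so a traffic spike can't quietly bankrupt you.

  import boto3

  autoscaling = boto3.client('autoscaling')
  cloudwatch = boto3.client('cloudwatch')

  # Simple scale-out policy: add two instances when triggered.
  autoscaling.put_scaling_policy(
      AutoScalingGroupName='web-asg',          # hypothetical group
      PolicyName='scale-out-on-spike',
      AdjustmentType='ChangeInCapacity',
      ScalingAdjustment=2,
  )

  # Billing alarm: page someone if estimated charges pass $700.
  cloudwatch.put_metric_alarm(
      AlarmName='monthly-bill-over-700',
      Namespace='AWS/Billing',
      MetricName='EstimatedCharges',
      Dimensions=[{'Name': 'Currency', 'Value': 'USD'}],
      Statistic='Maximum',
      Period=21600,
      EvaluationPeriods=1,
      Threshold=700.0,
      ComparisonOperator='GreaterThanThreshold',
      AlarmActions=['arn:aws:sns:us-east-1:123456789012:billing-alerts'],
  )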


My company provides dynamic content in emails, and as such gets large traffic spikes when 10 million emails get sent at once and everyone begins opening them. The content's configuration (in postgres) is trivially cacheable, but our app servers render different content based on the user's context.

So we have a bunch of shared-nothing app servers that we can spin up and down based on the emails we know are going out. Automatically detecting spikes and spinning up new instances between the send and the peak is much harder, though.
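
For what it's worth, pre-scaling against a known send schedule can be as little as a couple of API calls; a hedged boto3 sketch (the group name, capacities and timings are invented, and this is not necessarily how this poster does it):

  import boto3
  from datetime import datetime, timedelta

  autoscaling = boto3.client('autoscaling')
  send_time = datetime.utcnow() + timedelta(hours=2)  # next scheduled email blast

  # Scale the shared-nothing app tier up shortly before the send...
  autoscaling.put_scheduled_update_group_action(
      AutoScalingGroupName='render-workers',
      ScheduledActionName='pre-send-scale-up',
      StartTime=send_time - timedelta(minutes=30),
      DesiredCapacity=40,
  )
  # ...and back down once the open-rate curve has flattened out.
  autoscaling.put_scheduled_update_group_action(
      AutoScalingGroupName='render-workers',
      ScheduledActionName='post-send-scale-down',
      StartTime=send_time + timedelta(hours=6),
      DesiredCapacity=4,
  )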


Sounds fascinating! Do you use centralized logging? If so how do you manage that?


Yeah, we're using Cassandra for logging. Not quite as simple to scale up, but it's write-only in the request cycle and hasn't been anywhere near a bottleneck yet.


I'd argue that most people use Disqus because it's more powerful than whatever they had (or didn't have) and it's so easy to set up.

I should redirect the point of my post to be targeted more towards small businesses, startups, random hackers, etc. Obviously if you're a larger company you can do whatever you want, and it will generally work out.

That said, just because "Netflix uses the cloud" doesn't mean it's the right decision for everyone, and they certainly do not use it to handle spikes in traffic. They've publicly stated that their primary reason for AWS/etc was simply the lack of operational complexity needed to manage it. Even given that argument, the cloud at a point becomes just as complex, if not more so, than just running servers.

One thing that's helped Disqus out is the fact that we can get ridiculously powerful servers to ease the burden of needing to scale out (horizontally) right away. I believe even our smallest database servers are still the maximum size you can run on AWS (in terms of memory).


AFAIK, Netflix doesn't use Amazon for delivering streams to their customers (they use Level3's CDN). They use Amazon for batch jobs like transcoding and to host their API servers.

Using Amazon for API servers is a little surprising, btw. I would have gone with dedicated hardware. Maybe the sheer number of machines for API service is so high that using AWS APIs is a significant saving in operations complexity?


Netflix uses more than one CDN, and, according to posts by one of their architects (Adrian Cockcroft, http://perfcap.blogspot.com/), they use EC2 for everything but some basic in-house HR/backoffice stuff.

He's got DOZENS of posts on how it works for them, how they fully embraced EC2, and how well it's working for them.


This article is patently false and annoyingly severe. I used to run the backend of Dropbox and I can say without uncertainty that the cloud (AWS in our case) is incredibly useful. I would go so far as to say that the cloud IS for you, and probably for everyone until it really makes sense to move off. You may have had a bad experience with an off-the-shelf scaling solution, but that does not implicate "the cloud."

- Sometimes you just don't know whether your traffic is going to spike 10% or 100%. And I'm not talking about one or two computers, I'm talking about adding hundreds. Are you sure your supplier has everything in stock for a rushed overnight order? Your exact hard drive model? Your aging CPU? You're seriously going to experiment with a new batch of shit and tell your team to spend all night wiring all of those racks? Do you even have the space left in your cage? Enough space on your routers? Enough power? Even if you're in managed hosting like SoftLayer, these are now their issues and Not Their Problem if they can't turn around for you in the time you need.

- In the time we were on the cloud we were better able to understand our hardware needs so that we could actually spec out machines optimally. Even better, technology improved considerably to bring us low-cost SSDs which wouldn't have been possible at the start.

- There was no way we could manage these servers and a datacenter without a dedicated network engineer and an SRE. And even that was pushing it. If you've ever tried to hire for these positions, good ones are even harder to find than good software engineers. We got really lucky. Also, you've spent a lot of time on your engineering interview process and you have it down -- now do the same for two more positions that you know much less about.

- There is a huge engineering cost to moving off and building your own tools. Two servers? OK. Two thousand? Different ball game.

- I would argue that even a company like Google uses essentially a cloud solution that they've built internally and made available to their teams. AWS helps make a piece of that accessible for the rest of us.

TLDR: I thought I was hot shit too when I ran a Newegg server off my parents' internet connection, but come back when you're pulling down a megawatt and tell me the cloud sucks.


Are you sure your cloud supplier has everything in stock to add 100s, or 1000s of servers?

I'm not sure how many servers Dropbox is up to now, but we've got over 200 physical machines (probably nearing 300, growing quickly), and much more when you break them down to how many virtual services there are running. We do that with 3 operations people.

The biggest advantage for us has been the ability to scale up, rather than worrying about how to scale out right now. For a startup, I'd say this is even more important. Horizontal scaling is one of the most difficult parts of my job. That's not saying that it's hard, but the fact is it's straight up not as easy as just buying better hardware.


  > 1000s of servers
When was the last time someone needed to add multiple 1000s of servers on-the-fly (i.e. 5 minutes ago)? Am I just naive to think that this would be a huge spike? If you're talking about getting 1000s of servers over a longer time frame, I'm pretty sure that AWS would be able to keep up.


AWS? Surely you jest. Yes, AWS has thousands of servers ready to go at any time.


Who pays for this massive amount of unused hardware? Is there a guarantee that you can access this hardware anytime in this quantity?


Amazon leases the unused capacity as "spot instances". You bid on the spare capacity, and if the spot price rises above your bid, your box goes away.
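
Roughly like this with the modern boto3 client (the AMI, price and instance type are placeholders): you name a maximum hourly price, and the instances are reclaimed if the market price moves above it.

  import boto3

  ec2 = boto3.client('ec2')

  # Bid $0.10/hour for five spare-capacity boxes; AWS reclaims them
  # if the spot price climbs above the bid.
  ec2.request_spot_instances(
      SpotPrice='0.10',
      InstanceCount=5,
      LaunchSpecification={
          'ImageId': 'ami-0123456789abcdef0',   # hypothetical web-tier image
          'InstanceType': 'c1.medium',
      },
  )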


Two thousand servers is nowhere near the scale of the average person wondering if they have to go into the cloud. And once people have reached that size, I hope they don't follow random blog advice.

Before that, for anything that fits on less than, say, 20 servers, you're way better off renting servers.


I don't know about way better. We had the experience of being both in managed hosting and the cloud from 1 server each to hundreds, and the cloud was better the entire way, with comparable costs. Especially if you're going to include engineering cost and you're talking about not many servers, any few hundred dollar/mo premium on AWS pays for itself. AWS reliability has also improved considerably in the past few years while managed hosting has not.

Besides, this article is about scaling. If your needs are static, who cares what you use. It's about where you go from there, and I'd rather be on the cloud before I have to.

Management tools have different needs from 2 servers to 10 to 50.


I think Heroku is a great tradeoff for brand-new startups that are cash-strapped and also have no sysadmin capabilities.

Buying your own hardware - or using a blank VM provided by a VPS company - and setting them up immediately puts you in sysadmin mode from day one. New apache vulnerability? Your problem. Operating system upgrade? That's you, too.

What about security? Do you know all the ways that bad people can get into your system? How do you lock it down? Do you know how to properly configure the firewall?

What about a hardware failure? Now you're running back and forth from Fry's to your datacenter. Or in a VPS - you're going to have downtime while your provider moves your VM to another box. Depending on the scale of the outage, you could be down for a long time. Ugh.

Yes, Heroku is more expensive than your own hardware. So are tons of other VPS/VM hosting facilities out there.

Personally, I also really like RailsMachine. They built something called Moonshine (https://github.com/railsmachine/moonshine) that uses Puppet and Ubuntu to spin up properly configured, locked-down servers at the drop of a hat. I'd try something like that before managing my own infrastructure as an early-stage startup.

Some things are better outsourced; as a startup your most valuable currency is not a few hundred dollars per month - but your time. Wasting it playing sysadmin isn't worth the cost.


You're making VPS hosting look harder than it needs to be. Amazon EC2 has a firewall that's a snap to configure, spinning instances up and down is a few mouse clicks, updates can be set up automatically (at least for Windows), and a failed instance (which I've never seen) means you just need to spin up a new machine from an image and deploy your latest software again.

The only admin thing you need to do is backups (I am still upset that EC2 EBS does not provide timer-based snapshots).


Yeah, sure, for you that's easy. Clearly you've done it before.

What if you've never seen the AWS admin panel? Is it really that easy to pull a well-architected AWS setup out of thin air? Maybe, but from playing around with it (I've used S3 a lot and there are plenty of gotchas) there is a lot of nuance I'm missing without experience. (see all the "oops we did it wrong" posts on HN).

I've been in software and systems architecture a long time and I probably wouldn't be very confident I got it right on my first try. Perhaps I'm overestimating the domain knowledge needed.

I'm not at all suggesting Heroku > AWS, but in my experience an early-stage startup with only developers shouldn't play sysadmin. It only leads to pain and suffering later. And time spent mucking around with AWS would be better spent figuring out product/market fit.


I think AWS is easy, but you're right in that there are so many features you might not know where to look.

I could probably write a tutorial and exercises that will cover all of the requisite things for Windows:

  1) Finding an image (also - what is an image?) 
  2) Launching a new instance from an image 
  3) Connecting to instance using RDP
  4) Basic monitoring
     (on the level of - are the lights blinking?)
  5) Finding and pressing the reset button
  6) Creating a new image, to be used later
  7) Setting up auto update
  8) Configuring the firewall
This will take an hour to read and do the exercises, after which anyone can run their stuff in the cloud. I'm not talking about a complex system, but a few web sites and a database are easy enough to set up.

The firewall can be buttoned down to only allow ports 80 and 443 (or just 443) to the public, and the RDP port to your own IP address. If you're using the Microsoft stack, they will keep you fully up to date. You would still have to update your own software, but presumably people can handle that.
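
As a sketch of what that buttoning-down looks like via the API rather than the console (the security group ID and the office IP are placeholders):

  import boto3

  ec2 = boto3.client('ec2')

  # HTTP/HTTPS open to the world, RDP only from your own IP.
  ec2.authorize_security_group_ingress(
      GroupId='sg-0123456789abcdef0',
      IpPermissions=[
          {'IpProtocol': 'tcp', 'FromPort': 80, 'ToPort': 80,
           'IpRanges': [{'CidrIp': '0.0.0.0/0'}]},
          {'IpProtocol': 'tcp', 'FromPort': 443, 'ToPort': 443,
           'IpRanges': [{'CidrIp': '0.0.0.0/0'}]},
          {'IpProtocol': 'tcp', 'FromPort': 3389, 'ToPort': 3389,
           'IpRanges': [{'CidrIp': '203.0.113.7/32'}]},  # your own IP
      ],
  )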

The only thing is automated backup - you'd have to write a script for that and schedule a cron (schtasks) job.
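
The backup script itself is only a few lines; a hedged boto3 sketch (the instance ID is a placeholder) that could be dropped into cron or schtasks:

  import boto3
  from datetime import datetime

  ec2 = boto3.client('ec2')
  instance_id = 'i-0123456789abcdef0'  # the box you want backed up

  # Snapshot every EBS volume attached to the instance.
  volumes = ec2.describe_volumes(
      Filters=[{'Name': 'attachment.instance-id', 'Values': [instance_id]}]
  )['Volumes']
  for vol in volumes:
      ec2.create_snapshot(
          VolumeId=vol['VolumeId'],
          Description='nightly backup %s' % datetime.utcnow().date(),
      )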


I'm definitely not an expert on AWS, but this almost seems like the same process I had to go through with setting up my servers.

I more or less knew how to structure the app, what software to use. So it came down to:

- Setting up base configuration (users, iptables, etc)

- Setting up web/db specifics

- Setting up automated backups


Happy user of both Heroku and (recently) GetSentry here.

I think the author may be undervaluing how useful it was to, "for the first couple of months", survive on the off-the-shelf offerings. He didn't have to spend 3 days learning server configurations. He could instead expand his service "to cover nearly all popular languages".

He only had to switch -- spending effort on researching alternatives and learning new systems -- after knowing the growth path of his business, the pain-points of his architecture, and the baseline costs of a cloud solution. Heroku's turnkey environment, warts and all, got him to the point where he knew enough about his business situation to make rational optimization changes.

And if the 'shape' of his architecture, growth, and personal expertise was slightly different, he might be just adding dynos and upgrades inside the Heroku system today, happy to be paying them far more than his peak $700 cloud bill.


The problem is not that the cloud is not for you / others.

The problem is that you chose the wrong kind of cloud platform for your app, and instead of choosing the right kind, went right back to the early 2000's and rented some physical boxes instead. In doing so, you've flipped your pros and cons lists, but you haven't solved the actual problems.

Heroku is a PaaS cloud platform. Zero server awareness or maintenance. An app only environment.

You would have been better off using an IaaS cloud, where you have to rent instances and install apps, but you still have the benefit of instant (potentially automated) scalability, intrinsic high availability and minimal maintenance.

Instead, you're renting physical servers which often have a long term commitment, can potentially drop off the air for any reason - such as drive failure, power supply failure - and then you'd have to rebuild.

I think the cloud almost certainly is for everyone, but you need to use the right kind of tool for the job.


This is true, but Heroku is not the only provider of services like this, and some of the same limitations and costs exist throughout the ecosystem.

I would disagree that renting physical machines is going back to the 2000s considering nearly every large company is doing this.

Just because the trendy hipsters of the internet use the cloud doesn't mean it's modernizing things, or that anyone not doing it is a dinosaur.


Google App Engine and others also offer similar PaaS offerings. Yes, they all have similar limitations (and benefits).

Large companies I deal with don't rent physical machines, they buy them and then run a hypervisor platform across them to gain some of the availability benefits of an infrastructure cloud. It costs more, but you lower your risk profile. Many of these companies are looking at migrating their apps to a cloud platform (internal or external) to outsource the hardware / software / infrastructure maintenance. They are mostly interested in IaaS.

Cloud is not just for Internet hipsters.


This article is painting with far too large a brush. Heroku is not The Cloud, it's just a little piece of it.

When you use Heroku you are paying a premium for the platform. You do this because it saves you sysadmin duties, and you pay a premium in terms of base cost as well as flexibility. Obviously it only makes sense if the platform suits your app.

If you want to manage your servers yourself you will save a ton of money and gain a ton of flexibility. But does this mean you should reject the cloud and order some servers and start shopping for colocation facilities right away? No, you still have dozens of competing choices for how to get your vanilla linux boxen up and running, some of them fall under the purview of "the cloud" and some of them don't. These choices become very interesting based on your specific requirements. Sure raw EC2 is going to charge more than buying your own server, but if you buy reserved instances and factor in the colocation costs it's actually not that much more, and you may have many other reasons to look at it. For instance maybe you push a lot of data to and from S3 and you want it to be fast, or maybe you want to easily put identical deployments in Europe, Asia and South America for your global customers.

Beyond that, the premise that the cloud is somehow a big pain in the ass compared to owning your own hardware does not really wash. They both can utilize chef/puppet equally well, and being able to programmatically commission hardware is a huge convenience, the learning curve notwithstanding.

I also don't like hosting substantial apps of any kind on Heroku for some of the reasons the OA touches on, but that is not an indictment of cloud hosting in general. This article jumps to conclusions far too hastily and then arrogantly proclaims their generality with the threadbarest of anecdotes.


Wow where to begin. Yeah with the title hehe: 'You' in a title obviously means 'Me'.

> Almost any company worth a damn can bring online a server within 24 hours

I know a few companies worth a lot more than a damn that can't bring a physical server up within 24 hours unless someone happens to have a spare box with the right specs lying around (unlikely). If they need multiple boxes, well, good luck with that.

But perhaps more importantly, I know of no companies that can get rid of a now unnecessary server quickly and at no cost.

The idea that 'capacity planning' is the best approach to scaling in 99% of the cases is laughable.

Someone who can bring up, configure, secure and monitor a custom backend in a few days is someone versed in operations, even if his day job is development.


I don't even know where to begin responding to the level of ignorance you've displayed here.

Find me a company that isn't purely selling custom made-to-order servers that can't do next-day, let alone same-day, turnaround on a new server.

Find me any reasonable tech company that doesn't use configuration management and can't have a server configured within a few hours of it being ready, let alone a few minutes.

My budget hosting company has even delivered within the 24-hour window, and it takes me 30 minutes to have a server fully configured and ready to go into production. Have I mentioned enough that I'm not a systems guy and this still isn't a challenge?

I don't want to start bringing Disqus into this story, as this article is completely unrelated to how we deal with things, but we've never had an issue that we couldn't handle with proper implementation or planning ahead. We also use entirely physical servers for our day-to-day operations.


What do you want, a pat on the back? It's awesome that you and your company are doing all those things and run extremely well, and the article is very interesting as a specific tale in the evolution of a particular service.

But then you start projecting all that into the rest of the world, with what appears to be very little actual insight into the rest of the world; showing a blanket dismissal of anyone who can't do the things you do; and refusing to acknowledge that many companies and products operate under very different parameters and priorities.

Thanks for the info and the data, and feel free to understand why I am ignoring the rest of the stuff.


I hate to burst your bubble - I work for a Fortune 15 company and it takes us around 4 weeks to quote, purchase, and source a single new server. During the flooding in Thailand last winter which impacted hard drive inventories at our major server vendor, HP, it took us about 8-12 weeks to do the same. If you want a cluster or more than one server, we go through an architecture review process that might add another 4 weeks to the process.

Granted, if you just want a VM, and we have excess capacity, we can turn that around in about a week, but you have to understand, large companies have procedures that you probably never even thought of:

A new server has to be spec'd out; the requirements have to be gathered from the business owner of the server (OS version, CPU, memory, disk space, network and SAN connectivity). This is all put into a worksheet that is stored in a document management system. Quotes are obtained from a vendor. A design review meeting takes place to make sure no parts are missing from the quote (9 times out of 10 we need to get corrections on the quote because something as simple as a cable, KVM dongle, or spare power supply is missing). Now we go to the purchase approval process - much paperwork is filled out. Depending on the $ amount, between 10 and 20 people might need to electronically approve of the purchase. At any given time 10-20% of those approvers might be on PTO and you either have to wait for them to get back or contact their admin. assistant to approve for them.

With that all done and the purchase approved, now the PO is cut, wait 2-4 weeks for hardware to arrive (8-12 weeks if inventory is constrained due to circumstances out of your control like the Thailand flooding). More paperwork is done - the system is physically racked and an OS is loaded. There is a database that the server needs to be entered into. Security hardening takes place, Nessus scans, and we ensure it is on the patch management schedule to receive patches. Configuration is done, and if all went well, we're now 4-6 weeks into the process and the server has been turned over to the application owner to install their application. After they install their application we take it back for a day or two to configure and test backups for them.

Your reply contains a great amount of naivety about the way most large companies work. What works for you, where you are developer and operations all in one person, doesn't work when you scale it out and have to work across many departments. We have Linux, Windows, VMware, Storage, Network, Security, Database Admin, and Application Support/Operations teams that all might have to touch a box before it is ready for the application to go live.

By the way, we're in the infancy of looking at cloud, but chances are we will have a private cloud long before we trust a public one - we can't risk our data leaving the data center.


Oh nice, another cloud thread.

I'll just utter my standard: The cloud is great for very small and very large deployments.

If you're in the middle, rent servers.


Agreed. Too many mid-sized companies seem to think that the cloud is some magic bullet for them, but when it comes right down to it... if you have substantial but consistent traffic (which many mid-sized companies do), the cloud is likely not for you.

If you have a few dollars to throw at moving off shared web hosting, the cloud is likely for you. And if you are the next big thing on the internet, it might also be for you (until you run out of VC dollars and then have to figure out how to become profitable... but that's a whole other story).


If you rent servers, you are in the cloud.


The term "cloud" is nebulous at the best of times, but I think most people would disagree that "renting servers" == "cloud".


The term "Cloud" can meet a number of different functional definitions. In this article's case, this was a PAAS model where you upload your code, choose the supporting services, and expect a given service level in return. For some, this works great, for others - well, see the OP blog post.

In my organization's case, we run all of our infrastructure "in the cloud" - mostly AWS. But, we view AWS as an infrastructure cloud provider meaning we choose the hardware available, develop and run our own stack, and go from there. We also have infrastructure with ServerBeach and 100TB.com - and, yes, we consider that "cloud" as well. Just because your code doesn't run well on a PaaS cloud provider doesn't mean that my code won't prosper on a "cloud" infrastructure provider.

Here is the deal: if you've never seen, in person, the server that you are paying for - you are in "the cloud." And that is going to be most of us out there. And for those of you that "own" your own boxes, that still doesn't mean you aren't "in the cloud" - because you probably set up your own internal cloud at that point.

Edit: SaaS to PaaS


For small stuff Heroku is great. We're an ad agency doing a bunch of small sites and Facebook games with little traffic (small marketplace). The cost is spread out over many clients and it works out for everyone. That's the opposite of the scenario this article presents: one API with one database and one party that pays the bill. We have very low hardware requirements, but would like to manage as little as possible.

PHP/MySQL shared hosting was always the best way to get a site up quickly (cheap, reliably and fast). Now finally there is a comparable hosting option for simple stuff written in Python/Ruby/..?


The benefit of running on Heroku is time savings. Not scaling. If I hit a scaling problem, that's a good sign that I am growing. Right now I have no scaling issues because we aren't big enough, so the savings from not having to hire a devops person or learn Chef while I make my product better is where the plus is.

Yes, you become reliant upon the cloud, but it is only temporary. Buying faster hardware isn't hard.


This is exactly the false belief I was trying to point out here. I'm not an operations guy, I'm an engineer. I spent a few hours a day for a few days and came up with replayable, fully-configured systems. If this were a full-time gig, I'd have had that done in the first 24 hours.

It's not hard to configure services in this day and age. The internet is a wonderful resource for every problem you can ever imagine, and with things like Chef and Puppet becoming so mainstream, you can piggyback off of the work of many other, much more versed players in the game.


Just to start some discussion on the dismissal of instant scalability: "Almost any company worth a damn can bring online a server within 24 hours, even budget companies. When have you actually needed turnaround time faster than that? If you did, maybe you should read up on capacity planning."

Realistically it is often very hard (or impossible) to predict how viral something new is going to go. If some "superstar" grabs onto your app and publicizes it in front of 10x your normal audience, 24-hour turnaround may not be fast enough.


That's a good point, and I think in those situations the cloud will fall over just as easily. A lot of that will come down to your actual application's architecture and how well it can scale.


The idea of instant scalability is somewhat flawed. James Golick does a pretty good job of explaining why here:

http://jamesgolick.com/2010/10/27/we-are-experiencing-too-mu...


It feels awfully ironic that a SaaS provider is telling me that the Cloud is not for You.

If it isn't, then why would I purchase a GetSentry licence instead of just self-hosting Sentry and avoiding the $99 fee?


How much does Heroku actually cost? I have the impression that it's a ripoff if you need a few databases. Like you could get a Linode or Rackspace box for $20 or $10 a month, or you could get Heroku set up with a few "addons" and get jacked.

Even for Linode or Rackspace, though, I think there is still room for much better prices on larger instances.

Also, if you don't want to set up the VPS, Linode has StackScripts, or (shameless plug), if you are into Node.js, I am only charging about $1.85 more per month than Rackspace for a VPS with my image that already has MongoDB, nginx and redis setup. http://cure.willsave.me


So, I've actually had completely the opposite experience using Heroku.

I work at a telecom company that heavily relies on python (and Django) for our front-end and backend--and processes millions of API requests per month.

Initially--we started out running our entire web stack on our physically colocated hardware. This was a tremendous pain for us, for numerous reasons:

- Investing in hardware requires lots of time to shop around, purchase hardware, and do capacity planning.

- It requires physical work (in our case, multiple trips across the US) to setup / provision / bootstrap our servers.

- Automated maintenance (through Puppet, in our case) required constant meddling to handle edge cases, and robbed us of hundreds of man-hours in the process.

- Regardless of how excellent your Puppet (or Chef) scripts are, you will unquestionably have to do a lot of maintenance work to keep things running smoothly through OS updates, new software releases, etc.

- Scaling up on hardware is a pain for small companies, as it requires so much time and supervision.

I made the choice to migrate all our stuff to Heroku after an enormous comparison of services available (we actually migrated to Rackspace for a while before Heroku), and finding myself completely unsatisfied with options.

Moving to Heroku was an incredibly great decision for us:

- Our operations time went from hundreds of hours per month to none.

- With our extra engineering time freed up, we released numerous new features in the following two months and doubled our userbase.

- The cost for running our web infrastructure dropped by a significant portion, not only because of engineering time saved--but also because it was incredibly easy to scale up (and down) our entire infrastructure when needed, as opposed to wasting money on static physical resources that can't be adjusted for need.

- Our bandwidth cost dropped to 0 (bandwidth is provided free on Heroku).

- We don't have to run any duplicate services, since we can rely on Heroku's load balancer, dyno dispatcher, fully managed databases, and monitoring (via newrelic).

- We have automatic (free) database backups via Heroku's pgbackup service.

- We can instantly add read slaves, test changes on duplicate masters, etc. (all for free).

In regard to the 'poor performance' mentioned in this article--since there are no data or numbers listed, I can only assume that the author didn't have any monitoring or management software set up. I've found that using newrelic (a Heroku addon service) gives you more than enough detail and granularity to see how your dynos perform, which tasks take up CPU, which tasks take up memory, and how to properly handle capacity planning.

All in all--I feel like this is a good topic for an article, but that it seems to be poorly researched. Sure, hardware can outperform a cloud service if you compare a virtual dyno to a physical server--but at what cost?

- High upfront purchasing cost.

- Physical maintenance required.

- Operations work required.

- Must implement your own redundancy of services (backup load balancer, backup database, backup caching servers, etc.)--these all add extra hardware costs.

- Custom monitoring solutions must be implemented.

- No automatic recovery from faults.

- Large time investment to keep things running.

I disagree with the premise that 'The cloud is not for you.' I'd actually argue that the opposite is true.


You point out some obvious benefits of Heroku, but you're failing to see some of the truths:

There's no high upfront cost for physical machines. I paid no setup fees. I signed no long-term agreements.

I'm still using "cloud" monitoring solutions. In fact I'm running Scoutapp and it's extremely useful. In addition, I can actually access the machines and diagnose behavior. This is only limited with a PaaS like Heroku, of course.

Very little time has been invested, and I invested much more trying to fit my problem into Heroku solutions, rather than the other way around.


> There's no high upfront cost for physical machines. I paid no setup fees. I signed no long-term agreements.

Presumably someone spent some time getting those machines up and running and ready to run your software? Installing an OS, configuring things, etc.


Can anyone recommend a primer on the sort of scalability issues the author mentions? Some textbook or series of blog posts I can read?


In my case I was quickly hitting limitations of Heroku, and the allocated resources. That said, you could hit the same (higher) limitations even if you're not using Heroku, and even if you're using physical servers.

My primary issues were:

- Lack of insight into CPU/IO/Memory usage

- Should have been able to handle much higher concurrency than I was

- Should have been able to perform much better than I was


The http://highscalability.com/ blog has a decent archive on the topic. Remember that your situation is unique and that the first rule of scaling is to measure everything you can. No two sites have the same audience, or requirements, or expectations. Things that work in one context will only create problems in another, and the context will change over time.

My own experiences tell me that the important things are:

1. understand http caching, and how to make it work for you (a short sketch follows after this list).

2. query caching (whether against a database, a document store, search engine or what have you) buys you more than tweaking your web server.

3. your webserver configuration matters, but not as much as you'd think.

4. chunking a stream and sharding a datastore are two different views of the same problem.
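
To make items 1 and 2 concrete, here is a minimal Django-flavoured sketch (the Post model, cache keys and timeouts are invented): Cache-Control headers for downstream HTTP caches, plus a query cache sitting in front of the database.

  from django.core.cache import cache
  from django.http import JsonResponse
  from django.views.decorators.cache import cache_control

  from myapp.models import Post  # hypothetical model


  @cache_control(public=True, max_age=60)    # 1. let proxies/browsers cache for 60s
  def popular_posts(request):
      posts = cache.get('popular_posts')     # 2. query cache in front of the DB
      if posts is None:
          posts = list(
              Post.objects.order_by('-views').values('id', 'title')[:20]
          )
          cache.set('popular_posts', posts, timeout=300)
      return JsonResponse({'posts': posts})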



