
> My business is such that external events cause immediate spikes. My traffic might double because Apple released a new firmware, or might go up 10x because someone released a jailbreak without warning me. That capacity requirement quickly trickles down and settles to its original levels over the next two months until it spikes again.

Have you actually had this occur in real life, where you had to spin up new instances during these spikes? What kind of database configuration were you using such that it could accommodate all those new application server instances? Do you also add new database slaves on the fly?

When this article made the point that this "sounds good in theory, but never happens in reality", that was my experience too. We were on PostgreSQL, and the notion that we'd just "add 20 instances" when we had a load spike was ridiculous. I'm just curious who is actually doing this, and whether they are also using relational databases.




Here is a graph I generated a few weeks ago: we've since had yet another major traffic spike due to the release of Absinthe 2.0 with Rocky Racoon (an untethered jailbreak for iOS 5.1.1), which is actually one of the most intense spikes yet (but I am on my iPhone and can't make new graphs).

http://test.saurik.com/hackernews/absinthe.png

I over-allocate the database server for Cydia, but spin up new web servers on demand. I then keep as much of the CPU-intensive work off the database as I can, store as many static assets as possible on services such as S3, and use distributed queued logging (RELP).

For JailbreakQA's database (where downtime isn't that important), I stop the instance, change its instance type (say, from m1.large to c1.xlarge), start it again, and have a drastically different machine with only a minute of downtime. EC2 is a godsend (for me).
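
That stop / resize / start dance is also easy to script. Here's a rough sketch using boto3 (the instance ID is a placeholder, and the waiters just block until EC2 reports each state change):

    import boto3

    ec2 = boto3.client("ec2")
    instance_id = "i-0123456789abcdef0"  # placeholder instance ID

    # Stop the instance and wait until it is fully stopped.
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

    # Change the instance type while it is stopped (e.g. m1.large -> c1.xlarge).
    ec2.modify_instance_attribute(
        InstanceId=instance_id,
        InstanceType={"Value": "c1.xlarge"},
    )

    # Start it back up and wait until it is running again.
    ec2.start_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])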


It's significantly more difficult to scale a traditional relational database (although not impossible!) than to scale the web/app layer that sits in front of it. Snapshot + clone + some kind of sync middleware (like pgpool for Postgres) can probably get you 80-90% of the way there. Rearchitecting so that your db server is not the bottleneck should help as well.

Maybe you need a master/slave setup, and on huge load you flip the slave over to an instance type with quadruple the RAM and CPUs for a few hours, then back to a single-core, low-memory instance to keep the data sync flowing. There's a million ways to skin this cat.
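
To make the read/write split concrete, here's a rough sketch of doing it at the application layer with psycopg2 (pgpool does the same thing transparently in the middleware; the hostnames and table here are made up for illustration):

    import psycopg2

    # Hypothetical DSNs; in practice these come from config or DNS.
    PRIMARY_DSN = "host=db-primary dbname=app user=app"
    REPLICA_DSN = "host=db-replica dbname=app user=app"

    primary = psycopg2.connect(PRIMARY_DSN)
    replica = psycopg2.connect(REPLICA_DSN)

    def run_query(sql, params=None, write=False):
        """Send writes to the master, reads to the (resizable) slave."""
        conn = primary if write else replica
        with conn, conn.cursor() as cur:
            cur.execute(sql, params or ())
            return cur.fetchall() if cur.description else None

    # Reads hit the slave, which can be flipped to a bigger instance type under load.
    rows = run_query("SELECT id, name FROM items WHERE active = %s", (True,))

    # Writes always go to the master so replication keeps flowing one way.
    run_query("UPDATE items SET hits = hits + 1 WHERE id = %s", (42,), write=True)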

If your database itself is the bottleneck, then, yeah, on the fly flexibility might be difficult to achieve.

In his case, a relational database probably isn't the bottleneck at all, and scaling out caches, web front ends, etc. is all fairly straightforward. There are huge numbers of folks taking advantage of this kind of flexibility.

Hell, Amazon has a whole API you can integrate with that handles it for you (it even has $ references, so you don't accidentally spend yourself into bankruptcy because of a TC story).
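
For reference, that's presumably EC2 Auto Scaling plus CloudWatch billing alarms. A minimal boto3 sketch (the group name, SNS topic ARN, and dollar threshold are all invented for illustration):

    import boto3

    autoscaling = boto3.client("autoscaling")
    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")  # billing metrics live in us-east-1

    # Scale the web tier on average CPU: add instances as load rises, shed them later.
    autoscaling.put_scaling_policy(
        AutoScalingGroupName="web-tier",  # hypothetical group name
        PolicyName="cpu-target-tracking",
        PolicyType="TargetTrackingScaling",
        TargetTrackingConfiguration={
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization"
            },
            "TargetValue": 60.0,
        },
    )

    # The "$ references": alarm on estimated charges so a traffic story can't bankrupt you.
    cloudwatch.put_metric_alarm(
        AlarmName="billing-guardrail",
        Namespace="AWS/Billing",
        MetricName="EstimatedCharges",
        Dimensions=[{"Name": "Currency", "Value": "USD"}],
        Statistic="Maximum",
        Period=21600,
        EvaluationPeriods=1,
        Threshold=1000.0,  # invented dollar threshold
        ComparisonOperator="GreaterThanThreshold",
        AlarmActions=["arn:aws:sns:us-east-1:123456789012:billing-alerts"],  # placeholder ARN
    )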


My company provides dynamic content in emails, and as such gets large traffic spikes when 10 million emails get sent at once and everyone begins opening them. The content's configuration (in postgres) is trivially cacheable, but our app servers render different content based on the user's context.

So we have a bunch of shared-nothing app servers that we can spin up and down based on the emails we know are going out. Automatically detecting spikes and spinning up new instances between the send and the peak is much harder, though.
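
Since the send times are known in advance, the scale-up can be scheduled rather than purely reactive. A rough sketch with boto3 scheduled actions (the group name and capacity numbers are made up):

    from datetime import datetime, timedelta, timezone

    import boto3

    autoscaling = boto3.client("autoscaling")

    # Hypothetical campaign send time a couple of hours from now.
    send_time = datetime.now(timezone.utc) + timedelta(hours=2)

    # Scale the app tier up shortly before the blast goes out...
    autoscaling.put_scheduled_update_group_action(
        AutoScalingGroupName="render-tier",  # hypothetical group name
        ScheduledActionName="pre-send-scale-up",
        StartTime=send_time - timedelta(minutes=15),
        DesiredCapacity=40,
    )

    # ...and back down once the open-rate curve has tailed off.
    autoscaling.put_scheduled_update_group_action(
        AutoScalingGroupName="render-tier",
        ScheduledActionName="post-send-scale-down",
        StartTime=send_time + timedelta(hours=6),
        DesiredCapacity=4,
    )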


Sounds fascinating! Do you use centralized logging? If so, how do you manage that?


Yeah, we're using Cassandra for logging. Not quite as simple to scale up, but it's write-only in the request cycle and hasn't been anywhere near a bottleneck yet.
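
A minimal sketch of that kind of write-only logging path, using the DataStax Python driver (the contact points, keyspace, and table are invented for illustration; the async execute is one way to keep the write from blocking the response):

    import uuid
    from datetime import datetime, timezone

    from cassandra.cluster import Cluster

    # Hypothetical contact points and keyspace.
    cluster = Cluster(["cassandra-1", "cassandra-2"])
    session = cluster.connect("logs")

    insert = session.prepare(
        "INSERT INTO request_log (id, logged_at, path, user_context) VALUES (?, ?, ?, ?)"
    )

    def log_request(path, user_context):
        # Fire-and-forget: the async write never blocks rendering the response.
        session.execute_async(
            insert, (uuid.uuid4(), datetime.now(timezone.utc), path, user_context)
        )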



