Show HN: Dynosaur, An autoscaler for Heroku using Google Analytics Live

oceanplexian · on Jan 22, 2014

I'm curious why people put so much effort into scaling Heroku when you could outperform the maximum number of Dynos with only a handful of AWS instances for a fraction of the price. It doesn't make sense. We're literally working around a problem that shouldn't exist.

werkshy · on Jan 22, 2014

We certainly don't use Heroku for the price. I've got a fair amount of experience running distributed apps on AWS and I have found that Heroku do provide a lot of value-added services on top (from deployment to database provisioning). For us, the best reason to use Heroku is that when AWS has a problem, Heroku can actually do something about it. Try calling AWS support when a whole availability zone has just gone down!

Of course, we are a Rails shop who have been on Heroku since we launched, so Heroku is really tailor-made for us. If you're happy on raw AWS, good for you. Every hour I'm not thinking about AWS is an hour I can be writing a feature for our 10-month-old startup. The extra cost of Heroku is negligible when viewed in that light.

alexdean · on Jan 22, 2014

This is exactly what I'm wondering too. It's ingenious, but isn't Heroku + GA Live + Dynosaur both more complex and more expensive than just using AWS Elastic Beanstalk, which has Auto Scaling? What am I missing?

werkshy · on Jan 22, 2014

Dynosaur author here, if anyone has questions.

lowglow · on Jan 22, 2014

Dude! This is so boss. I've been looking for exactly this for Techendo - as you can guess our traffic varies by time of day. Being able to ramp services up as I need them would help tremendously.

Congrats on the launch and thanks for helping!

dylandrop · on Jan 22, 2014

Have you done any comparisons/benchmarks vs Adept Scale, Bounscale, or HireFire? Also, why default to Google Analytics rather than some other service?

werkshy · on Jan 22, 2014

We haven't benchmarked against other autoscalers or done any direct comparisons.

Our main motivation was to write something that scaled the way we scale manually (i.e. when we get a lot of traffic from press hits etc., we scale up based on GA real-time data as well as New Relic response times). When we were given access to the GA live API it just seemed like a natural fit.
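A minimal sketch of the "scale on live visitors" idea, assuming the real-time API returns a count of active users and that load testing has told you roughly how many concurrent users one dyno can handle. The function name, parameters, and numbers are hypothetical, not Dynosaur's actual code or settings; fetching the count from GA and applying the result through the Heroku API are left out.

```python
import math

# Hypothetical tuning parameters (assumptions, not Dynosaur's real settings).
USERS_PER_DYNO = 100   # assumed capacity of one dyno, found by load testing
MIN_DYNOS = 2          # floor so a quiet period never drops below a safe baseline
MAX_DYNOS = 20         # ceiling to cap cost during a traffic spike


def target_dynos(active_users: int,
                 users_per_dyno: int = USERS_PER_DYNO,
                 min_dynos: int = MIN_DYNOS,
                 max_dynos: int = MAX_DYNOS) -> int:
    """Map a real-time active-user count to a clamped dyno count."""
    if active_users < 0:
        raise ValueError("active_users must be non-negative")
    needed = math.ceil(active_users / users_per_dyno)
    return max(min_dynos, min(max_dynos, needed))
```

With these made-up numbers, 950 active users maps to 10 dynos, while 5,000 users is capped at 20. The clamp is the important design choice: the floor protects you if the analytics feed briefly reports zero, and the ceiling bounds the bill if it reports nonsense.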

noodle · on Jan 22, 2014

Just speaking off the top of my head here, but services like HireFire ping the Heroku API to poll for information on when to scale. The Heroku API is a little rate limited. You'll probably get a more responsive result using GA.

spydertennis · on Jan 22, 2014

HireFire uses New Relic. I would think that's more responsive than GA.

sjtgraham · on Jan 22, 2014

I have a similar (not auto-scaling) project also named Dynosaur from over two years ago. https://github.com/stevegraham/dynosaur

werkshy · on Jan 22, 2014

Woah! So sorry to be treading on your namespace! I did a Google search for 'Dynosaur' and didn't get any hits, if I recall. We will put our heads together and come up with a new name for our project later today.

No offense intended: you picked a cool name!

BlackDeath3 · on Jan 22, 2014

Dynosoar?

anarchitect · on Jan 22, 2014

Looks great. Wouldn't work well for us though because our main enemy is over-aggressive spiders that don't trigger Analytics trackers.

thruflo · on Jan 22, 2014

Same might apply to an overloaded API, or anything not serving HTML pages to browsers.

Heroku provides log-runtime-metrics, which includes current CPU load / pending CPU tasks. The Librato add-on also shows request throughput in its dashboard UI. I'm not sure where it gets this data from. If not logging, then perhaps New Relic?
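For the log-based route, a rough parser might look like this. It assumes metric lines carry space-separated `key=value` pairs with a `sample#` prefix, as in the illustrative line used here (e.g. `sample#load_avg_1m=2.46`); check Heroku's log-runtime-metrics documentation for the exact field names. The `overloaded` threshold is a hypothetical rule, not something Heroku or Librato provides.

```python
import re

# Matches "sample#name=value" tokens; a unit suffix such as "MB" is captured separately.
SAMPLE_RE = re.compile(r"sample#(?P<name>[\w.]+)=(?P<num>-?\d+(?:\.\d+)?)(?P<unit>[A-Za-z]*)")
SOURCE_RE = re.compile(r"\bsource=(?P<source>\S+)")


def parse_runtime_metrics(line: str) -> dict:
    """Extract the dyno name and numeric samples from one metrics log line.

    Assumes the key=value layout described above; units are dropped and
    values are returned as floats.
    """
    source = SOURCE_RE.search(line)
    samples = {m.group("name"): float(m.group("num")) for m in SAMPLE_RE.finditer(line)}
    return {"source": source.group("source") if source else None, "samples": samples}


def overloaded(parsed: dict, max_load_1m: float = 1.0) -> bool:
    """Hypothetical rule: flag a dyno whose 1-minute load average exceeds a threshold."""
    return parsed["samples"].get("load_avg_1m", 0.0) > max_load_1m


# Illustrative line (the dyno id is shortened and made up).
example = ("source=web.1 dyno=heroku.2808254.d97d0ea7 "
           "sample#load_avg_1m=2.46 sample#load_avg_5m=1.06 sample#load_avg_15m=0.99")
```

Since this reads the app's own logs, it also sees traffic from crawlers and API clients that never fire an analytics tracker.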

spydertennis · on Jan 22, 2014

Doesn't it make more sense to scale based on New Relic response times? Why are you guys using # of users? Depending on the page they are requesting and how the requests are clustered that could produce vastly inferior results.

werkshy · on Jan 22, 2014

We find that neither New Relic nor Analytics gives the full picture: some of our pages are heavily cached, while others (e.g. checkout processing) are computationally expensive, database heavy, and communicate with other systems (e.g. payment processors) that can be a big bottleneck. Both New Relic and GA tend to just average those together (although with GA you can create new views that focus on specific pages). You are right that 'number of visitors on site' does not reflect our site performance in every respect.

We first conceived of Dynosaur as a plugin-based autoscaler (with GA and New Relic plugins to start with). But we've found that the times we really need to scale fast are the times we have a lot of traffic generated from press stories etc. (like this one from today, if you will excuse the shameless plug: http://dealbook.nytimes.com/2014/01/21/a-start-up-run-by-fri...). Using the Analytics live API lets us react a little quicker than if we waited for New Relic to tell us our response times are getting slow. So far, we're happy enough with just a Google Analytics plugin.

One possible improvement would be to scale differently based on different traffic / performance metrics across the site. I think New Relic or other performance instrumentation would be very useful for that.
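One way to sketch "scale differently across the site" is to weight active users by the kind of page they are on, assuming you can get per-page-group counts (for example from separate GA views). The groups, weights, and per-dyno capacity below are hypothetical placeholders you would calibrate with load tests and New Relic data.

```python
import math
from typing import Mapping

# Hypothetical relative cost of one active user per page group,
# in units of "one user on a cached page" (1.0).
PAGE_COST = {
    "cached": 1.0,     # heavily cached pages
    "checkout": 10.0,  # database heavy and waits on external payment processors
}


def weighted_load(active_by_group: Mapping[str, int],
                  cost: Mapping[str, float] = PAGE_COST,
                  default_cost: float = 1.0) -> float:
    """Sum active users, weighting each page group by its relative cost."""
    return sum(n * cost.get(group, default_cost) for group, n in active_by_group.items())


def dynos_for(active_by_group: Mapping[str, int],
              capacity_per_dyno: float = 200.0,
              min_dynos: int = 2, max_dynos: int = 20) -> int:
    """Convert weighted load into a clamped dyno count (capacity is an assumption)."""
    needed = math.ceil(weighted_load(active_by_group) / capacity_per_dyno)
    return max(min_dynos, min(max_dynos, needed))
```

Under these made-up weights, 800 browsing users plus 40 in checkout needs as many dynos as 1,200 browsing users, which is the kind of distinction a single site-wide average hides.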

zer0defex · on Jan 23, 2014

Nice work, looks pretty solid. If you want to offload the responsibility of deciding which GA metric events signify a potential spike, you could abstract it out and have the plugin use GA Intelligence Event alerts set up by your analytics team instead. That keeps each group of subject-matter experts in its own area: the analytics team can keep tailoring what, where, and how traffic triggers scaling, and the dev team isn't responsible for ongoing management of the trigger rules (well, to a certain extent).

Just a thought. I know you abstracted it out the way you did so as not to tie it to just GA, but if GA is your analytics platform of record, it could be worth pursuing. Cheers!

guidedlight · on Jan 22, 2014

Performance engineer here.

Scaling should be based on actual traffic/user throughput statistics, not on response time. Response times can increase for a variety of reasons, only one of which is increased traffic load. For example, response times can increase because of back-end database contention or thread contention.
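A throughput-driven scaler could be sketched as below, assuming you can sample requests per second and have measured, by load testing, roughly how many requests per second one dyno sustains. The class and its parameters are hypothetical. It scales up immediately but only scales down after several consecutive low readings, so a momentary lull in the middle of a spike doesn't shed capacity.

```python
import math
from collections import deque


class ThroughputScaler:
    """Size the dyno pool from request throughput rather than response time.

    Assumptions (hypothetical): rps_per_dyno comes from a load test, and
    headroom keeps spare capacity above the measured load.
    """

    def __init__(self, rps_per_dyno: float, min_dynos: int = 1, max_dynos: int = 20,
                 headroom: float = 1.25, window: int = 3, scale_down_after: int = 5):
        self.rps_per_dyno = rps_per_dyno
        self.min_dynos = min_dynos
        self.max_dynos = max_dynos
        self.headroom = headroom
        self.samples = deque(maxlen=window)       # recent throughput samples (req/s)
        self.scale_down_after = scale_down_after  # consecutive low readings required
        self.low_streak = 0
        self.current = min_dynos

    def observe(self, rps: float) -> int:
        """Record one throughput sample and return the dyno count to run."""
        self.samples.append(rps)
        avg = sum(self.samples) / len(self.samples)  # smooth single-sample noise
        wanted = math.ceil(avg * self.headroom / self.rps_per_dyno)
        wanted = max(self.min_dynos, min(self.max_dynos, wanted))

        if wanted > self.current:        # scale up right away
            self.current = wanted
            self.low_streak = 0
        elif wanted < self.current:      # scale down only after a sustained lull
            self.low_streak += 1
            if self.low_streak >= self.scale_down_after:
                self.current = wanted
                self.low_streak = 0
        else:
            self.low_streak = 0
        return self.current
```

Because the input is throughput, a slow database query raises response times without adding dynos, which is the failure mode described above.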

Of course whichever configuration is chosen, it is important to load test your application so you understand its performance profile and that all contention points are understood.

willcodeforfoo · on Jan 22, 2014

It's kind of surprising that Heroku hasn't cracked this nut on their own. Perhaps it isn't in their best interest to scale down? Scalability certainly is often touted as one of the benefits of "the cloud".

geemus · on Jan 23, 2014

Unfortunately auto-scaling is incredibly application specific. It would be a great feature, but it seems likely to be quite difficult to make a particular solution that is useful to more than a small subset of users. You can get some sense of this from the other comments above (i.e. how analytics wouldn't be a good scaling metric for them).

sergiotapia · on Jan 22, 2014

So I'm curious, how many users per dyno are needed for a typical CRUD rails app?

werkshy · on Jan 22, 2014

That really depends on your app and your caching strategy. If you are doing something complex or memory intensive (or your app is poorly optimized), or if your users are highly engaged, it could be in the low tens of users! If you are doing a lot of caching, you should be able to serve mostly static pages to several hundred users or more per dyno.
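To put rough numbers on "it depends", Little's law (concurrency = throughput × time in system) gives a back-of-envelope estimate. It assumes each dyno runs a fixed number of worker processes that each handle one request at a time, and that you know the average response time and how often an active user makes a request. All inputs and the utilization target are hypothetical.

```python
def max_rps_per_dyno(workers_per_dyno: int, avg_response_s: float) -> float:
    """Little's law rearranged: with every worker busy, throughput = workers / response time."""
    return workers_per_dyno / avg_response_s


def users_per_dyno(workers_per_dyno: int, avg_response_s: float,
                   requests_per_user_per_min: float,
                   target_utilization: float = 0.7) -> float:
    """Rough concurrent users one dyno can serve while staying below full utilization."""
    rps_per_user = requests_per_user_per_min / 60.0
    return max_rps_per_dyno(workers_per_dyno, avg_response_s) * target_utilization / rps_per_user
```

With 3 workers, a slow 500 ms average response, and engaged users making 12 requests a minute, this comes to about 21 users per dyno. With cached 20 ms responses and 6 requests a minute per user, it comes to about 1,050. That spread matches the range described above.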

hayksaakian · on Jan 23, 2014

You can get a lot out of 1 dyno.

On two dynos, you can have one run background jobs and one serve content.

Obviously it varies, but you don't inherently need many.