Disagree with what? Keeping your own backups? Probably not - if you're not keepi...

cbsmith · on Nov 14, 2013

> Disagree with what?

I disagree with the notion that RDS removes remarkably little of the pain of running a database instance.

Yes, there are projects where RDS is not a great solution, but it definitely simplifies a lot of stuff. The notion that it "only removes up-front setup pain" is silly. If you manage your databases correctly, up-front setup pain should be the vast majority of all your basic admin operations. The "at the cost of ongoing maintenance" part is a real head scratcher for me. RDS basically gives you everything you'd have with a DB on an AWS instance except a local login, which one tries to avoid using like the plague anyway.

falcolas · on Nov 15, 2013

> The "at the cost of ongoing maintenance" part is a real head scratcher for me

Let's look at a common problem that DBAs are typically given: "The Database is slow!". Let's troubleshoot this fictitious problem on RDS:

Am I being affected by a noisy neighbor? Can't tell; contact Amazon support.

Can I look at top to see if the load is high on the box, and potentially why? No. I can look at historical trends, but not with enough granularity or information to be useful.

Can I look at the disk iops to see if there's any kind of problem there? No. Complete black box here; contact Amazon support.

Can I look at the slow log? Kind of. They'll push the slow log data into the database for you to query, but then you can't use tools to do aggregate tracking.

Pause for a moment for a quick MySQL RDS tip: pt-query-digest has a mode of operation that lets you do a processlist every 1/100th of a second and turn that into a pseudo slow log, which does work for RDS.

    pt-query-digest --processlist h=10.0.0.1 --interval=0.01 --output slowlog > /tmp/fake_slowlog.out

Back on track - so no real analysis of a historical slow log, without writing your own tools. Possible, but time consuming.

Can I kill queries? Yes, using a stored procedure. Can't use any of the existing toolset around this (like pt-kill, which can help keep poorly written ad-hoc queries from getting out of hand).
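For what it's worth, a rough sketch of a pt-kill stand-in built on that stored procedure might look like the following. It assumes you have already fetched the rows of `SHOW FULL PROCESSLIST` as dicts with the usual column names, and that the procedure in question is `mysql.rds_kill`. The function name, the 30 second threshold, and the protected user list are all made up for illustration; the function only builds the statements, and running them is left to whatever client you use.

```python
def build_kill_statements(processlist, max_seconds=30,
                          protected_users=("rdsadmin", "system user")):
    """Return CALL statements for queries that have run longer than max_seconds.

    processlist: iterable of dicts shaped like SHOW FULL PROCESSLIST rows
    (keys Id, User, Command, Time). The threshold and the protected user
    list are illustrative assumptions, not recommended values.
    """
    statements = []
    for row in processlist:
        if row["Command"] != "Query":
            continue  # skip sleeping connections, replication threads, etc.
        if row["User"] in protected_users:
            continue  # never touch the internal accounts
        if row["Time"] is None or row["Time"] < max_seconds:
            continue  # still within budget
        statements.append("CALL mysql.rds_kill(%d)" % int(row["Id"]))
    return statements


# Hypothetical rows, only to show the expected input shape.
example_rows = [
    {"Id": 41, "User": "app", "Command": "Sleep", "Time": 900},
    {"Id": 42, "User": "adhoc", "Command": "Query", "Time": 120},
    {"Id": 43, "User": "app", "Command": "Query", "Time": 2},
    {"Id": 44, "User": "rdsadmin", "Command": "Query", "Time": 500},
]
for statement in build_kill_statements(example_rows):
    print(statement)
```

It works, but it is exactly the kind of "write your own tool" chore I'm complaining about.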

So, after many hours swapping emails with Amazon support, we've determined that we're actually spending a lot of time waiting on malloc mutexes. The internet says that using a non-default version of malloc will help with that - can I do that?

Nope. You're stuck.

Other things you can't do:

* Offsite backups that are in any form but MySQL dumps.

* Take advantage of new index types and compression support from TokuDB.

* Zero downtime failovers (We were able to help someone fake this; it was a PITA).

* Cross-region replication.

* Automated failovers using a reputable tool (MMM, MHA, etc).

* Access the error logs.

* Run multiple instances on one machine.

* Alter the disk elevator (hopefully they're using something sane, like noop, but we'll never know).

* Alter the kernel swappiness.

* Troubleshoot crashes.

* Monitor and alert on a machine's vitals.

Now perhaps I'm just being a power-hungry admin, but these small things matter. They are the difference between a snappy DB which scales beautifully to 10,000+ QPS, and a sluggish DB that causes you to move to bigger hardware, because it's the only option open to you.

Databases just aren't that hard to set up. Install packages, install config files, start the DB, restore from a backup file, restart the DB, and you're golden. If you're particularly paranoid, set up the SELinux contexts (I'd bet dollars to doughnuts that this isn't done on RDS instances), create a security group that allows access only on ports 22 and 3306 and only from your application hosts, and set up individual users.

This is particularly simple when you use an orchestration tool; I recommend Ansible personally.

cbsmith · on Nov 15, 2013

> Am I being affected by a noisy neighbor? Can't tell; contact Amazon support.

Sure, you can. Spin up multiple RDS instances and benchmark them.
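A minimal sketch of that comparison, assuming you wrap "run the same query against instance X" in a callable yourself. The function names and the 1.5x threshold are arbitrary choices for illustration, and an instance that stands out is only a hint of a noisy neighbor, not proof.

```python
import math
import statistics
import time


def sample_latencies(run_query, samples=50):
    """Call run_query repeatedly and return the wall-clock latencies in seconds."""
    latencies = []
    for _ in range(samples):
        start = time.perf_counter()
        run_query()
        latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies):
    """Median, nearest-rank 95th percentile, and max of a non-empty sample list."""
    ordered = sorted(latencies)
    p95_index = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return {
        "median": statistics.median(ordered),
        "p95": ordered[p95_index],
        "max": ordered[-1],
    }


def slow_instances(summaries, factor=1.5):
    """Names of instances whose median exceeds factor times the fleet median.

    summaries maps an instance name to a summarize() result. The 1.5x
    factor is an arbitrary assumption; tune it to your workload.
    """
    fleet_median = statistics.median([s["median"] for s in summaries.values()])
    return sorted(name for name, s in summaries.items()
                  if s["median"] > factor * fleet_median)
```

You would build `summaries` with one `summarize(sample_latencies(...))` call per instance, using the same query and the same instance class for each.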

> Can I look at top to see if the load is high on the box, and potentially why?

If you are using top to monitor your box, you are already screwed. There is lots of support for remote monitoring.

> Can I look at the disk iops to see if there's any kind of problem there?

Disk iops are part of the built in monitoring and metrics provided with RDS.

> Can I look at the slow log? Kind of. They'll push the slow log data into the database for you to query, but then you can't use tools to do aggregate tracking.

If only there was a tool that could extract records from a database and compute aggregates...
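To spell it out, here is a rough sketch, assuming the slow log lives in the standard `mysql.slow_log` table and you have already pulled `(sql_text, seconds)` pairs out of it with something like `SELECT sql_text, TIME_TO_SEC(query_time) FROM mysql.slow_log`. The fingerprinting is a crude approximation of what pt-query-digest does, not its actual algorithm, and the function names are made up.

```python
import re
from collections import defaultdict


def fingerprint(sql):
    """Collapse literals so that similar queries group together (crude)."""
    sql = sql.strip().lower()
    sql = re.sub(r"'(?:[^'\\]|\\.)*'", "?", sql)      # single-quoted strings
    sql = re.sub(r"\b\d+(?:\.\d+)?\b", "?", sql)      # numeric literals
    sql = re.sub(r"\s+", " ", sql)                    # runs of whitespace
    return sql


def aggregate(rows):
    """Group (sql_text, seconds) pairs by fingerprint, worst total time first."""
    stats = defaultdict(lambda: {"count": 0, "total": 0.0, "max": 0.0})
    for sql_text, seconds in rows:
        entry = stats[fingerprint(sql_text)]
        entry["count"] += 1
        entry["total"] += seconds
        entry["max"] = max(entry["max"], seconds)
    return sorted(stats.items(), key=lambda item: item[1]["total"], reverse=True)


# Hypothetical rows, only to show the input shape.
example_rows = [
    ("SELECT * FROM users WHERE id = 42", 2.0),
    ("select * from users  where id = 7", 3.5),
    ("SELECT * FROM users WHERE name = 'bob'", 1.0),
]
for query, entry in aggregate(example_rows):
    print(entry["count"], entry["total"], entry["max"], query)
```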

> So, after many hours swapping emails with Amazon support, we've determined that we're actually spending a lot of time waiting on malloc mutexes. The internet says that using a non-default version of malloc will help with that - can I do that?

> Nope. You're stuck.

MySQL sucks. RDS provides no means to make it any better. Fortunately they do now provide PostgreSQL.

> Offsite backups that are in any form but MySQL dumps.

You can do that by replicating to an external MySQL server and doing whatever the heck you want with it.

> * Take advantage of new index types and compression support from TokuDB.

Yup. Until today it was also really hard to take advantage of different engines found in PostgreSQL. ;-) This is a totally different product.

In general, the things you are describing are features, not things that cause maintenance complexity. In fact, manipulating them is what causes maintenance complexity.

> Databases just aren't that hard to set up. Install packages, install config files, start the DB, restore from a backup file, restart the DB, and you're golden.

I had no idea PCI compliance could be that simple. ;-)

> This is particularly simple when you use an orchestration tool; I recommend Ansible personally.

Yes, orchestration tools, if set up properly, are exactly how you'd want to do this kind of thing. If you already have all that set up to manage your database, RDS is likely not going to help.

oceanplexian · on Nov 15, 2013

FYI RDS is no longer HIPAA compliant following the latest Omnibus legislation.

You need to be running dedicated instances inside of VPC with your own DB install.