At the end of the day it's an economics question. How much might you lose in a worst-case scenario if you failed to recover some vital data? How much extra will it cost to back up your data to a second place? If the answers to those questions are "everything" and "not a lot", then there is no reason not to back up your backups.
As a general rule important data should be backed up to (at least) two separate places, and in this scenario I'd consider S3 to be one place.
You can actually kind of do that. There's a system intended to allow random users credential-free upload access to your buckets via POST, but you can enforce policy controls like "the target key has to have xyz/ as a prefix" and "the size must be fewer than 100000 bytes". Here are the S3 docs for it: http://docs.amazonwebservices.com/AmazonS3/2006-03-01/index....
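For the curious, here's roughly what that looks like with today's tooling. This is only a sketch using boto3's generate_presigned_post (the linked docs describe the raw POST policy document itself); the bucket name, prefix, and size limit are just the placeholders from the comment above:

    import boto3

    # Sketch of a credential-free browser upload policy: untrusted users can
    # POST files, but only under the xyz/ prefix and only up to 100000 bytes.
    s3 = boto3.client("s3")

    post = s3.generate_presigned_post(
        Bucket="my-upload-bucket",                 # hypothetical bucket name
        Key="xyz/${filename}",                     # force uploads under xyz/
        Conditions=[
            ["starts-with", "$key", "xyz/"],       # target key must start with xyz/
            ["content-length-range", 0, 100000],   # at most 100000 bytes
        ],
        ExpiresIn=3600,                            # policy valid for one hour
    )

    # post["url"] and post["fields"] are what you hand to the uploader; they
    # can POST a file without ever seeing your AWS credentials.
    print(post["url"], post["fields"])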
I have never seen a documented case of S3 losing data. Please provide references.
I have recommended to some customers that they back up S3 data to the S3 service running in another region. The new export service also provides a way to get physical copies of your data but, depending on how much data you have, it might not be practical.
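For anyone who wants to try the region-to-region approach, here's a rough sketch with the modern boto3 SDK. Bucket names and regions are placeholders, and it assumes the destination bucket already exists and your credentials can read the source and write the copy:

    import boto3

    # Back one S3 bucket up to a bucket in another region via server-side copies.
    SRC_BUCKET, SRC_REGION = "my-data-us-east", "us-east-1"
    DST_BUCKET, DST_REGION = "my-data-eu-west", "eu-west-1"

    src_client = boto3.client("s3", region_name=SRC_REGION)
    dst = boto3.resource("s3", region_name=DST_REGION).Bucket(DST_BUCKET)

    # List every object in the source bucket and copy it across regions.
    paginator = src_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=SRC_BUCKET):
        for obj in page.get("Contents", []):
            dst.copy(
                {"Bucket": SRC_BUCKET, "Key": obj["Key"]},
                obj["Key"],
                SourceClient=src_client,  # read from the source region explicitly
            )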
Come on... one of these is from '06, and it looks like it was totally dealt with by Amazon customer service, and the other one is from '07 and is a complaint about EC2 when it was still in beta.
Would love to see REAL documented cases of data loss on Amazon, and anything in the last year would be great too.
Not saying it's infallible, and you should always back up your data (Amazon can't prevent things like natural disasters - this is the point of backing up in general - you can't predict), but if you're going to act like you gave good examples, at least give good examples.
Look, all I'm saying is it happened before, Amazon makes absolutely no guarantees and it could very well happen again.
To categorically deny this because you think the cases are not 'good enough' is to argue that only specific documentation of S3 losing data within the last couple of months, or the last year, would convince you that Amazon S3 can indeed lose data. Even if Amazon S3 had never lost data before, there would still be no reason to assume that it could not happen.
S3 is made from hardware and built by people. It can - and most likely will - fail again; it has already done so in the past. When the last case was is not really relevant, just like when the last earthquake was is not really relevant when you're living on a fault line.
Earthquakes - and data loss - are a fact of life in the IT business: you plan for them, or you weigh the economics of the risk and decide that you can re-create your data at a lower cost than it would cost to back it up over the mean time to failure.
Amazon will not be able to magically recreate your data, so if you have a business incentive to keep your data (such as a responsibility to third parties), then you should back it up.
It's that simple.
Oh, and regarding Amazon customer service, note that it took them 11 days to pinpoint the fault, and customer data actually was lost.
Check Allan's post at Jun 23, 2008 6:28 AM for a pretty good insight into how easy it is for S3 to break.
What also bothers me is that apparently all traffic for these customers was passing through the same SPOF (single point of failure), a single load balancer.
Another thing to take home from this is to ALWAYS supply an MD5 of your data and keep an MD5 of what you sent.
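In case it helps, a minimal sketch of doing that with boto3: compute the MD5 yourself, send it as Content-MD5 so S3 rejects the upload if the bytes got mangled in transit, and keep your own copy of the digest for later checks. The bucket and file names here are made up:

    import base64
    import hashlib

    import boto3

    # Always supply an MD5: S3 verifies the body against Content-MD5 and
    # errors out on a mismatch, and you keep the digest to re-check later.
    path = "important.dat"
    data = open(path, "rb").read()

    digest = hashlib.md5(data).digest()
    content_md5 = base64.b64encode(digest).decode("ascii")   # what S3 expects
    local_hex = digest.hex()                                 # what you keep on file

    s3 = boto3.client("s3")
    s3.put_object(
        Bucket="my-backup-bucket",
        Key=path,
        Body=data,
        ContentMD5=content_md5,   # upload fails if the body doesn't hash to this
    )

    # Record local_hex somewhere durable so you can compare it against the
    # object's ETag (for simple, non-multipart uploads) or a re-download later.
    print(path, local_hex)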
Gmail, another example of a large body of data that end users have some attachment to, has also occasionally lost data, see:
Sure, you could argue that Gmail is not S3, but that is not relevant; the things they have in common (type of architecture, kind of hardware, run by very fallible people) are what matter.
Not saying it's infallible, and you should always back up your data (Amazon can't prevent things like natural disasters - this is the point of backing up in general - you can't predict), but if you're going to act like you gave good examples, at least give good examples.
If that's what you call categorically denying that it can happen....
Again, please find a case in the last 2 years even.
I think we both agree you should back up your data; as an IT policy it's obviously wrong to ever think you're 100% safe, and if you use S3 you should still be redundant if you want to get closer to that 99.9% limit. But you'll never be 100% - that's life.
The only reason I defend S3 so heavily is that, compared to the other options you'd be using instead of (or better: in conjunction with) S3, it's probably among the safest as far as data loss goes.
So far I have never heard of S3 actually losing data (apart from users erasing it). They store everything 3 times. But I would have a look at the cost/MB. Documents are usually small and important (and thus worth backing up).
I have read that SmugMug, for example, stores its files just on S3, with no backup.
Very, very important caveat there. If you have three copies of the data but all of them are in the same S3 account, a prankster who steals your S3 creds can delete all of them in about ten seconds.
Or, if you make a typing mistake, you can do that to yourself. Boy oh boy, will that be an unhappy day.
Very true. One instance I can think of is using the Firefox add-on S3Fox for managing S3. My co-worker came close to deleting a prod bucket with 80K+ images.
You should certainly back up your data. While the engineers at Amazon are among the smartest, they are not infallible. Things do fail from time to time.
I lost 1TB of data several months ago due to some backend issues with EBS and S3. Fortunately for me, it was just a backup of a backup of a backup. ;-)
It's also worth pointing out that this might have been more of a failure of EBS (Elastic Block Store) than of native S3 buckets.
Ultimately, my EBS device became unusable by any operating system, and Amazon support stated that the data was lost due to several backend systems failing.
At $0.15/GB it's a no-brainer for the peace of mind. I once heard of a case (not with Amazon) where data was replicated in 3 locations, and data got corrupted in one location and was copied to the others, resulting in corrupted data in all 3.
FYI, you should take a look at Cloudloop (www.cloudloop.com). It has a nice command-line interface that lets you sync data across providers (S3, Rackspace, Nirvanix, Azure, etc.).