How would you feel if one day that data was lost? There are already documented ...

garnaat · on Sept 14, 2009

I have never seen a documented case of S3 losing data. Please provide references.

I have recommended to some customers that they back up S3 data to the S3 service running in another region. The new export service also provides a way to get physical copies of your data but, depending on how much data you have, it might not be practical.

jacquesm · on Sept 14, 2009

http://developer.amazonwebservices.com/connect/thread.jspa?t...

Do you need more, or is that enough?

We've been here before by the way:

http://news.ycombinator.com/item?id=528541

wdewind · on Sept 14, 2009

Come on... one of these is from '06, and it looks like it was totally dealt with by Amazon customer service, and the other one is from '07 and is a complaint about EC2 when it was still in beta.

Would love to see REAL documented cases of data loss on Amazon, and anything in the last year would be great too.

Not saying it's infallible, and you should always back up your data (Amazon can't prevent things like natural disasters - this is the point of backing up in general - you can't predict), but if you're going to act like you gave good examples, at least give good examples.

jacquesm · on Sept 14, 2009

Look, all I'm saying is that it has happened before, Amazon makes absolutely no guarantees, and it could very well happen again.

To categorically deny this because you think the cases are not 'good enough' is to argue that only specific documentation of S3 losing data in the last couple of months or the last year will convince you that Amazon S3 can indeed lose data. Even if Amazon S3 had never lost data before, there would still be no reason to assume that it could not happen.

S3 is made up of hardware and built by people. It can - and most likely will - fail again; it has already done so in the past. When the last case happened is not really relevant, just as the date of the last earthquake is not really relevant when you're living on a fault line.

Earthquakes - and data loss - are a fact of life in the IT business. Either you plan for them, or you weigh the economics of the risk and decide that you can re-create your data at a lower cost than it would cost you to back it up over the average time to failure.

Amazon will not be able to magically recreate your data, so if you have a business incentive to keep your data (such as a responsibility to third parties) then you should back it up.
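To make that trade-off concrete, here is a rough sketch of the comparison. Every number in it is made up for illustration, and the function name and parameters are hypothetical; a real estimate would also have to account for things like lost business while the data is being rebuilt.

```python
def cheaper_to_recreate(recreate_cost: float,
                        backup_cost_per_month: float,
                        months_to_failure: float) -> bool:
    """Return True if re-creating the data once costs less than
    paying for backups over the average time to failure."""
    total_backup_cost = backup_cost_per_month * months_to_failure
    return recreate_cost < total_backup_cost


# Hypothetical figures: backups cost 20 per month and a loss is
# expected roughly once every 60 months (1200 in total).
easy_to_rebuild = cheaper_to_recreate(500.0, 20.0, 60.0)
hard_to_rebuild = cheaper_to_recreate(5000.0, 20.0, 60.0)
print(easy_to_rebuild, hard_to_rebuild)
```

With these invented figures, data that costs 500 to rebuild is not worth backing up, while data that costs 5000 to rebuild is.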

It's that simple.

Oh, and regarding Amazon customer service, note that it took them 11 days to pinpoint the fault, and customer data actually was lost.

Check Allan's post of Jun 23, 2008 6:28 AM for a pretty good insight into how easy it is for S3 to break.

What also bothers me is that apparently all traffic for these customers was passing through the same SPOF, a single load balancer.

Another thing to take home from this is to ALWAYS supply an MD5 of your data and keep an MD5 of what you sent.
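A minimal sketch of what that could look like, using only the Python standard library. It assumes the upload goes over the S3 REST API, which accepts a base64-encoded MD5 in the Content-MD5 header, and that the service reports the hex MD5 as the ETag for a plain single-part PUT; check both against the current S3 documentation. The helper names are hypothetical, and the actual upload call is left out.

```python
import base64
import hashlib
from typing import Tuple


def md5_digests(data: bytes) -> Tuple[str, str]:
    """Return the MD5 of data as (hex string, base64 string).

    The hex form is what you keep in your own records; the base64
    form is the encoding the Content-MD5 header expects.
    """
    digest = hashlib.md5(data).digest()
    return digest.hex(), base64.b64encode(digest).decode("ascii")


def matches_local_md5(reported: str, local_hex: str) -> bool:
    """Compare a checksum reported by the service with the stored one.

    Surrounding quotes are stripped because ETag values are usually
    returned quoted.
    """
    return reported.strip().strip('"').lower() == local_hex.lower()


payload = b"hello world"
local_hex, content_md5 = md5_digests(payload)

# Send content_md5 with the upload and store local_hex next to the
# object key, so a later download can be checked against it.
print(local_hex, content_md5)
```

Sending the checksum lets the service reject a corrupted upload, and keeping your own copy means you can still detect corruption later without having to trust what the service reports.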

Gmail, another example of a large body of data that end users have some attachment to, has also occasionally lost data; see:

http://www.thebitguru.com/blog/view/252-Have you lost email on gmail

Sure, you could argue that Gmail is not S3, but that is not relevant; the things they have in common (type of architecture, kind of hardware, run by very fallible people) are what matter.

wdewind · on Sept 14, 2009

As I said:

Not saying it's infallible, and you should always back up your data (Amazon can't prevent things like natural disasters - this is the point of backing up in general - you can't predict), but if you're going to act like you gave good examples, at least give good examples.

If that's what you call categorically denying that it can happen....

Again, please find a case in the last 2 years even.

I think we both agree you should back up your data, and as an IT policy it's obviously incorrect to ever think you're 100% safe, and if you use S3 you should still be redundant if you want to get closer to that 99.9% limit. But you'll never be 100% - that's life.

The only reason I defend S3 so heavily is that compared to the other options you'd be using instead of (or better: in conjunction with) S3, it's probably among the safest, data-loss-wise.