garnaat's comments | Hacker News

FWIW, Xerox shipped an all-Python product in 1997 called DocuShare. It is still being sold, but it was rewritten in Java a few years later because they felt their customers wanted it to be more enterprisey. Today the use of Python would not be controversial at all, so we have come a long way.


With boto, I battled for years trying to avoid any dependencies. But that has a lot of negative side effects, too. One of the great things about Python is the amazing variety and quality of libraries available. We decided to embrace that with AWS CLI. We have 10 direct dependencies. Four of those are our own packages that we decided to split to allow maximum reuse. Then there are fundamental things like requests, six, docutils. The rest are things that, we think, improve the experience. Virtualenv is an awesome way to manage this. I highly recommend it.


The rsync command uses a combination of file modtimes and file sizes as its default algorithm. It's very fast and efficient. I agree, though, that like rsync, it would be good to add a --checksum option to the s3 sync command in AWS CLI. Feel free to create an issue on our GitHub site https://github.com/aws/aws-cli so we can track that.
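
Roughly, the default check looks like this - an illustrative Python sketch of the mtime/size heuristic, not the actual aws-cli code; the remote size and mtime would come from a prior listing of the bucket:

    import os

    def needs_sync(local_path, remote_size, remote_mtime):
        # Cheap rsync-style check: compare size and mtime only, no checksums.
        stat = os.stat(local_path)
        if stat.st_size != remote_size:
            return True   # sizes differ, so the copies can't match
        if stat.st_mtime > remote_mtime:
            return True   # local file is newer, re-upload it
        return False      # assume unchanged without reading the file

A --checksum option would trade that speed for reading and hashing every file, which is why it should probably be opt-in.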


I still think the Amazon S3 API is really nice.


"Wall of Shame"? Really?

As someone whose project appears on this list (and in the WRONG color), all I can say is that I don't think anyone is trying to dis Python 3.x. Support will come when a critical mass of developers are using 3.x. I know it's kind of a chicken-and-egg problem, but this seems to be saying that it's the package developers' fault, and I don't really think that's fair or true.


I think it's not really a chicken and egg problem though.

From a user's point of view, using Python 3 looks like giving up on a lot of libraries, which in turn means a lot more work to accomplish the same things. It's a big effort that does not benefit anyone. If switching to a new version of a language makes my life harder, then I will not switch, especially if I'm working on a startup or a project that is time critical.

On the other hand, this is work that the developers of the libraries will have to do anyway, sooner or later. Doing it now would benefit a lot of people who would like to use the latest version of Python but simply cannot afford to right now. So why not do it? In this case the effort would benefit a lot of people at once, which is the purpose of a library in the first place.


It's a chicken-and-egg problem in that developers are saying "I'm not porting because no one is using Python 3.x" and users are saying "I'm not moving to Python 3.x because none of my packages are available."

I can't just move to Python 3.x and abandon 2.x. And I have not been able to find a way to have boto support both with the same code base. So then it becomes a matter of maintaining multiple versions of boto. Just shoot me now. If the barriers weren't so high, more packages would be running on Python 3.x.
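
To give a sense of why the barriers are high: in a single 2.x/3.x code base, every string/bytes boundary (HTTP bodies, signatures, XML parsing) needs shims along these lines. This is illustrative only, using six - it is not actual boto code:

    # Illustrative compatibility shims, not boto code.
    import six

    def to_bytes(value, encoding='utf-8'):
        # Request bodies and signatures need bytes on both 2.x and 3.x.
        if isinstance(value, six.text_type):
            return value.encode(encoding)
        return value

    def to_text(value, encoding='utf-8'):
        # Headers and parsed XML want text strings on both versions.
        if isinstance(value, six.binary_type):
            return value.decode(encoding)
        return value

Multiply that by every call site that touches the wire and it stops being a weekend project.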


Yep, everyone has limited time and resources. Another problem is that many libraries have dependencies, and they can't start porting until the dependencies have been ported (or a replacement is found). Finally, library developers only want to develop for languages they like using, and who likes a language with no good libraries?


"On the other hand, this is a work that the developers of the libraries will have to do anyway, sooner or later. "

But... will they "have to"? If a library author is waiting for 'critical adoption', they may never upgrade to Python 3 themselves, and have no need to upgrade the library, and the circle continues.


It might mean they'll have to support two versions for a longer while though. Which also means making sure all community patches are provided for both branches, or manually integrating them.


Which package is in the wrong color?


If you are interested in REST APIs, the best book I've ever read is "RESTful Web Services" by Richardson & Ruby.


I suppose it's possible, but S3 is designed for nine 9's of durability. There are > 100 billion objects stored and I'm not aware of any being lost due to a fault on AWS's side.


Actually, I'm wrong. It's eleven 9's. http://bit.ly/ageV9D


There was a pretty lively exchange on twitter last night regarding this. I strongly disagree with the AWS forum poster. EBS does not suck. In fact, EBS and other services from AWS and Rackspace provide the building blocks to allow you to construct incredibly scalable, available systems.

However, you have to accept that when you use IaaS you are taking on some of the operational responsibility and you have to know what you are doing or find someone who does. If this user had been snapshotting regularly to S3, the worst thing they would have experienced is a couple of hours of downtime. All of their data would have been safe and easily recovered.

They didn't do that, and the worst-case scenario that AWS clearly describes in its docs (failure of MULTIPLE devices) happened. And it will happen again, someday. Accept that, and accept that failure is a feature when systems are designed properly.
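
For the record, taking a snapshot is close to a one-liner with boto. This is just a sketch: the volume id is a placeholder and AWS credentials are assumed to be configured in the environment.

    import boto

    # Connect to EC2 and snapshot an EBS volume; the snapshot itself is
    # stored durably by the EBS service (backed by S3).
    conn = boto.connect_ec2()
    snapshot = conn.create_snapshot('vol-12345678',
                                    description='nightly backup')
    print(snapshot.id)

Run something like that from cron and a worst-case EBS failure becomes an inconvenience instead of a disaster.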


Variety is good, but pushups are a fantastic core exercise. They really work out much more than your arms, and you can do them anywhere. Plus, these workouts do build up a certain amount of aerobic value because they involve multiple sets with short rests in between.

I'm up to 75! But best to also do some squats, some situps/crunches, etc.


I have never seen a documented case of S3 losing data. Please provide references.

I have recommended to some customers that they back up S3 data to the S3 service running in another region. The new export service also provides a way to get physical copies of your data but, depending on how much data you have, it might not be practical.
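
A sketch of what that cross-region copy looks like with boto - bucket names are placeholders and the backup bucket is assumed to already exist in the other region:

    import boto

    conn = boto.connect_s3()
    src = conn.get_bucket('my-primary-bucket')
    dst = conn.get_bucket('my-backup-bucket-eu')

    for key in src.list():
        # Server-side copy; the object data never passes through this machine.
        dst.copy_key(key.name, src.name, key.name)

For a large bucket you would want to parallelize this and only copy keys that have changed, but the basic operation really is that simple.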



Come on... one of these is from '06, and it looks like it was totally dealt with by Amazon customer service, and the other one is from '07 and is a complaint about EC2 when it was still in beta.

Would love to see REAL documented cases of data loss on Amazon, and anything in the last year would be great too.

Not saying it's infallible, and you should always back up your data (Amazon can't prevent things like natural disasters; that's the point of backing up in general: you can't predict them), but if you're going to act like you gave good examples, at least give good examples.


Look, all I'm saying is it happened before, Amazon makes absolutely no guarantees and it could very well happen again.

To categorically deny this because you think the cases are not 'good enough' is to argue that only specific documentation of S3 losing data within the last few months or a year would convince you that Amazon S3 can indeed lose data. Even if Amazon S3 had never lost data before, there would still be no reason to assume that it could not happen.

S3 is made from hardware and built by people. It can - and most likely will - fail again; it has already done so in the past. When the last case was is not really relevant, just like when the last earthquake was is not really relevant when you're living on a fault line.

Earthquakes - and data loss - are a fact of life in the IT business: you plan for them, or you weigh the economics of the risk and decide that you can re-create your data at a lower cost than it would take to back it up over the average time to failure.

Amazon will not be able to magically recreate your data so if you have a business incentive to keep your data (such as a responsibility to third parties) then you should back it up.

It's that simple.

Oh, and regarding Amazon customer service, note that it took them 11 days to pinpoint the fault, and customer data actually was lost.

Check Allan's post from Jun 23, 2008, 6:28 AM for a pretty good insight into how easy it is for S3 to break.

What also bothers me is that apparently all traffic for these customers was passing through the same SPOF (single point of failure): a single load balancer.

Another thing to take home from this is to ALWAYS supply an MD5 of your data and keep an MD5 of what you sent.
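
With boto that looks roughly like this (a sketch; the bucket and file names are placeholders):

    import boto

    conn = boto.connect_s3()
    bucket = conn.get_bucket('my-bucket')
    key = bucket.new_key('backups/data.tar.gz')

    with open('data.tar.gz', 'rb') as fp:
        # compute_md5 returns (hex_digest, base64_digest) and rewinds fp.
        md5 = key.compute_md5(fp)
        # Passing md5 makes S3 check the upload against the Content-MD5 header.
        key.set_contents_from_file(fp, md5=md5)

Keep md5[0] somewhere so you can verify later downloads yourself.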

Gmail, another example of a large body of data that end users have some attachment to, has also occasionally lost data; see:

http://www.thebitguru.com/blog/view/252-Have you lost email on gmail

Sure, you could argue that Gmail is not S3, but that is not relevant; the things they have in common (type of architecture, kind of hardware, run by very fallible people) are what matter.


As I said:

Not saying it's infallible, and you should always back up your data (Amazon can't prevent things like natural disasters; that's the point of backing up in general: you can't predict them), but if you're going to act like you gave good examples, at least give good examples.

If that's what you call categorically denying that it can happen....

Again, please find a case from even the last two years.

I think we both agree you should back up your data, and that as an IT policy it's obviously incorrect to ever think you're 100% safe; if you use S3 you should still be redundant if you want to get closer to that 99.9% limit. But you'll never be at 100% - that's life.

The only reason I defend S3 so heavily is that compared to the other options you'd be using instead of (or better: in conjunction with) S3, it's probably among the safest in terms of data loss.

