If you don't mind my asking, what percentage savings does de-duplication achieve across all of Dropbox? Some others here have wondered if it was premature optimization.
It wasn't a premature optimization. It was both a better experience for the user (it saves bandwidth and avoids reuploads) and simpler to implement (we can keep everything in one global bucket), given that we didn't want things like renames to trigger reuploads and had to use checksums as a result.
But you could prevent reuploads with per-user de-duplication, while avoiding the privacy issue of cross-user de-duplication.
I could see why this would be more work to implement (you have to key on user+contenthash), but it would still be interesting to know how much Dropbox and its users actually benefit from cross-user de-duplication.
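To make the difference concrete, here's a rough sketch of the two keying schemes. This is a toy in-memory model, not how Dropbox actually does it: `BlockStore`, `put`, `rename`, and the `per_user` flag are names I made up, SHA-256 is an assumed checksum, and a real client would send the hash first and only upload the bytes if the server didn't already have them.

```python
import hashlib


class BlockStore:
    """Toy content-addressed store (hypothetical, for illustration only).

    per_user=False keys blobs on the content hash alone (one global bucket).
    per_user=True keys blobs on (user, content hash).
    """

    def __init__(self, per_user: bool):
        self.per_user = per_user
        self.blobs = {}    # dedup key -> file contents
        self.names = {}    # (user, path) -> dedup key
        self.uploads = 0   # number of times contents actually had to be stored

    def _key(self, user: str, data: bytes):
        digest = hashlib.sha256(data).hexdigest()
        return (user, digest) if self.per_user else digest

    def put(self, user: str, path: str, data: bytes) -> bool:
        """Record a file; return True if the contents had to be uploaded."""
        key = self._key(user, data)
        needs_upload = key not in self.blobs
        if needs_upload:
            self.blobs[key] = data
            self.uploads += 1
        self.names[(user, path)] = key
        return needs_upload

    def rename(self, user: str, old: str, new: str) -> None:
        """A rename only touches the name table, never the contents."""
        self.names[(user, new)] = self.names.pop((user, old))


if __name__ == "__main__":
    payload = b"same installer bytes"
    for per_user in (False, True):
        store = BlockStore(per_user=per_user)
        store.put("alice", "game.bin", payload)
        bob_uploaded = store.put("bob", "game.bin", payload)
        store.rename("alice", "game.bin", "games/game.bin")
        print(per_user, bob_uploaded, store.uploads)
```

In both modes the rename is free and a user never reuploads their own duplicate; the only behavioural difference is whether a second user with the same bytes has to upload them again.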
I have a hard time understanding this line of argument.
Per-user deduplication would mean I need not upload the same file twice into my own account? What's the use of that?
I keep some of my 'paid for' software installers backed up in my Dropbox, and they tot up to ~1.5 GB (the Humble Indie Bundle games). When I started the upload, however, it took maybe 5 seconds because of cross-user deduplication, and I am super grateful to them for this feature.
I imagine this feature saves users tons of bandwidth, as most of the people I know use Dropbox for backing up important software, rare music and videos.
> I have a hard time understanding this line of argument.
It's not a line of argument. It's a line of inquiry. You've given anecdotal evidence that cross-user deduplication benefits you and people you know, but what about some actual numbers from Dropbox?
Producing actual numbers -- e.g., "cross-user deduplication saves our users 30% of their upload time and bandwidth, on average" -- seems like a great way for Dropbox to counter this issue.
> I imagine this feature saves users tons of bandwidth, as most of the people I know use Dropbox for backing up important software, rare music and videos.
We don't have to imagine! Let there be numbers!
Also -- "rare" music and videos that everyone's uploading duplicates of? ;)