> Each file is broken into chunks and encrypted by iCloud using AES-128 and a key derived from each chunk’s contents that utilizes SHA-256. The keys and the file’s metadata are stored by Apple in the user’s iCloud account. The encrypted chunks of the file are stored, without any user-identifying information, using third-party storage services, such as S3 and Google Cloud Platform.
From what I’ve heard, Apple’s services run on their own cloud platform called Pie [0]. It sounds like this platform probably abstracts away whatever storage service is used, allowing Apple to use whatever fits their requirements.
I think this is just coincidence though. Apple most likely performs their own layer of encryption on top of what Google already does.
The two layers protect against different attack scenarios: Google doesn't want plaintext data to show up on its disks; Apple doesn't want plaintext data to be accessible by anyone at Google (or any other third-party provider).
The encryption technique sounds like convergent encryption.
Convergent encryption is a fairly standard way of storing encrypted data that might be duplicated across users. It allows data to be deduped while still in an encrypted state.
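For the curious, here's a minimal sketch of the idea in Python, using the `cryptography` package. The 16-byte key split and the deterministic nonce are my assumptions for illustration, not Apple's documented scheme:

```python
# Minimal convergent-encryption sketch (illustrative only): the key is
# derived from the chunk's own contents, so identical chunks encrypt to
# identical ciphertexts and can be deduplicated while still encrypted.
import hashlib
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_chunk(chunk: bytes) -> tuple[bytes, bytes]:
    digest = hashlib.sha256(chunk).digest()
    key = digest[:16]       # AES-128 key, derived from the chunk's SHA-256
    nonce = digest[16:28]   # deterministic, content-derived nonce (an assumption)
    ciphertext = AESGCM(key).encrypt(nonce, chunk, None)
    return key, ciphertext  # the key goes into the user's account metadata

# Two users storing the same chunk produce byte-identical blobs:
assert encrypt_chunk(b"illegal.csv")[1] == encrypt_chunk(b"illegal.csv")[1]
```

Because the key and nonce come from the content itself, two uploads of the same chunk are byte-identical, which is exactly what makes the deduplication discussed below possible.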
Doesn’t this in some way undermine the encryption? For example, if multiple users had illegal.csv in their accounts, getting access to one user would allow you to prosecute any of them without decrypting their data, because their copies would all share the same SHA-256 hash (and thus the same ciphertext)?
It is (as the Wikipedia article points out). It's basically equivalent to storing passwords hashed without a salt: it makes it obvious when a password is reused, or in this case when the same file is stored twice.
That means that the provider always knows if a file is shared across multiple users (even if they don't know what the file is) and given a cleartext file they can always check if somebody has it stored on the service. It's not ideal at all if you want good privacy.
SpiderOak explicitly claims not to do that[1] in order to avoid these issues; however, I refuse to recommend them until they finally decide to release an open-source client and 2FA support, so caveat emptor.
Anyway, in this case it's irrelevant because Apple has access to the keys anyway, it's only supposed to prevent the third party (Google Cloud in this case) from having access to the files.
It should be pointed out that this "security flaw" is the only way these cloud file storage platforms are cost effective, since it allows the same anonymous blob stored by several dozen users to take up one unit of space instead of one unit per user.
This is particularly space-saving if a file type is chunked cleverly so that the static parts of the file's structure are stored separately from the dynamic parts (Microsoft's Office XML formats, for example, could definitely be split this way).
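To illustrate, here's a toy content-defined chunking sketch in Python. It's purely hypothetical (not Apple's or Microsoft's actual chunking), but it shows how cut points derived from the data itself let regions that two files share dedupe even when the rest of the files differ:

```python
# Toy content-defined chunking (purely hypothetical): a rolling window hash
# picks the cut points, so regions two files have in common produce the same
# chunks and dedupe even when the surrounding bytes differ.
import random

def chunks(data: bytes, window: int = 48, mask: int = 0xFF, min_size: int = 256) -> list[bytes]:
    out, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= window:
            rolling -= data[i - window]           # keep a sum over the last `window` bytes
        if i - start >= min_size and (rolling & mask) == mask:
            out.append(data[start:i + 1])         # boundary found: emit a chunk
            start = i + 1
    if start < len(data):
        out.append(data[start:])
    return out

random.seed(0)
static = bytes(random.randrange(256) for _ in range(20_000))   # shared "static" structure
file_a = static + b"dynamic part A" * 50
file_b = static + b"dynamic part B" * 50
shared = set(chunks(file_a)) & set(chunks(file_b))
print(f"{len(shared)} chunks are identical across the two different files")
```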
I don't believe that. Apple charges $10/mo for 1TB of data. That's $120/yr. You can get 1TB drives commercially for under $50 these days; you'd need a bit more than one drive per user for reliability & sharding, but still. Not de-duplicating the data would clearly be profitable anyway.
Hetzner charges less than half of that for their storage solution, and that's without de-duplication[1].
Does decreasing the security of their users & not offering end-to-end encryption save them even more money? Sure, but I don't see how it wouldn't still be profitable without it, seeing as you can easily buy non-de-duplicated cloud storage for way less from other providers.
Just look at the pricing for Amazon S3: it costs $20-30 per TB per month. That makes Apple's $10 a month for 1TB look like a fairly good deal.
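Quick back-of-the-envelope with the numbers quoted in this thread (list prices only, ignoring traffic, redundancy, backups and taxes):

```python
# Rough comparison using only the figures mentioned in this thread.
icloud_per_tb_month = 10          # Apple: $10/month for 1TB
s3_per_tb_month = (20, 30)        # S3: roughly $20-30 per TB per month
drive_per_tb = 50                 # consumer 1TB drive, one-time cost

print(f"iCloud: ${icloud_per_tb_month * 12}/year per TB")                         # $120
print(f"S3: ${s3_per_tb_month[0] * 12}-${s3_per_tb_month[1] * 12}/year per TB")   # $240-$360
print(f"A bare drive costs ~{drive_per_tb / icloud_per_tb_month:.0f} months of iCloud")  # ~5 months
```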
Don't forget that the price includes VAT and other taxes. General infrastructure costs, including traffic. Redundancy. Backups. I don't think they're turning much of a profit, but offer it because it makes their platform more attractive.
I'm sure Apple makes plenty of money from users that don't fill up their space to pay for the ones that do.
I have a 2TB family plan that I'm paying $10/mo for, a total of 135GB is used. Most of that usage is from iCloud Photo Library, and while I could use the 200GB plan I'd rather just pay the extra to have effectively unlimited space and not worry about needing to upgrade the plan, especially as I record video in 4K by default now.
I'm sure I'm not the only person who does this, $10/mo is probably nothing to the vast majority of people on HN compared to the hassle of worrying about whether we have enough storage or not.
iOS is pretty good about nudging you to the next tier as you approach filling your current one, IIRC. When I recently had to jump from the 200GB plan to 2TB, it was one tap on a notification to move my whole family to the next tier, so even for those of us who can easily afford it, it's arguably not that sensible a strategy to jump straight to 2TB, unless your goal is to subsidize Apple as much as you can.
HEVC/H.265 introduced in iOS 11 largely makes shooting 4K on an iOS device cost _roughly_ the same in storage terms as 1080p/H.264 did, so recording in 4K doesn't really move the storage needle here like it used to either, assuming of course you have an iPhone 7 or newer.
I read this comment and was confused because it is exactly what I would have written and thought I had a moment of amnesia where I forgot that I posted a response...
Same boat 100%... Family sharing, 2TB plan, have my wife and mother in the family. Combined we use less than 200GB, but I am happy to pay for more storage rather than deal with tech support calls from the two of them when we go over 200GB and they get error messages popping up. :P
Apple includes their own office suite for free with all Macs, so if you don’t need 100% Office compatibility, why give MS more money? I haven’t had or used Office in some years now and don’t miss it a bit.
Absolutely. I was just pointing out that the price is in the better end of the scale, especially to end consumers only buying a relatively small amount of storage.
It is from the standpoint of the average user who doesn't want the hassle of figuring out which backup provider to use or what software to install, and who wants data shared automatically between multiple devices natively.
As the most popular comment here said when Dropbox was announced, "1. For a Linux user, you can already build such a system yourself quite trivially by getting an FTP account, mounting it locally with curlftpfs, and then using SVN or CVS on the mounted filesystem. From Windows or Mac, this FTP account could be accessed through built-in software."
Personally, I'm using seafile on a server, it's cheaper than iCloud or Dropbox, with more storage, and more custom functionality. Many NAS nowadays support Nextcloud out of the box.
I worked out the math a while ago and it will take over 10 years for a home-built NAS to be more cost-effective than iCloud, and that's only if there are no hardware or disk failures, and doesn't include the additional hassle of being out of Apple's model when it comes to auto-backups and storage.
It depends on what you value. If you value keeping your data at home, and accept the trade off of higher cost and the need to maintain it yourself and use some sort of sync mechanism that isn't built into the OS, then you do that.
Yes, you are obviously not their target market. Most people don't know what a NAS is, or what FTP is. With iCloud they toggle a switch and it Happens Like Magic.
The average person can figure out a car, can figure out how to vote in complicated voting systems, and can figure out much more.
You don’t have to dumb down things unnecessarily. If a person can buy a smart home device, plug it in and configure it, or an Apple TV, or a computer, then they can also get one of the simpler NAS, and sync with that.
It was a naive and ridiculous comment then, and it's neither more informed nor less ridiculous now.
(Which is kind of a shame because a truly bulletproof, user-friendly NAS might be an interesting product. Who is going to be the Apple to Synology's Microsoft?)
I find this hard to believe. I get that it reduces space used _in theory_, but do you have any data on the % of uploads that are duplicated across clients?
It would have to be very, very large to justify the claim that it turns a previously non-cost-effective service into a cost-effective one, especially as prices have been dropping for some time and would hypothetically have simply not dropped as fast if this was the case.
* Offer to charge less for usage when a duplicate block is detected, but allow users to pay full price for privately salted storage. This is similar in principle to how Data Saver works on your mobile device (opting in to allow a man-in-the-middle to compress or downsample your data).
I thought the same reading this thread, but there are two huge problems with that:
a) Once you know the chunk size, you can determine based on pricing whether another customer has that data, which can have huge privacy implications.
E.g. let's say we both work at the same company and get the same salary statement PDF aside from the dollar number & your name (which I know).
I can simply brute-force it: craft files that vary that number, upload them to iCloud, and when one of them stops costing me storage, I know I've recovered what's in your file.
In any case, I'd be surprised if Apple's not already leaking this information due to caching in a way that could be revealed via timing attacks.
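A toy simulation of that attack, with a made-up dedup "cloud" standing in for the billing oracle (none of these APIs exist; it just shows why per-chunk pricing acts as an oracle):

```python
# Toy simulation: the "cloud" is just a set of content hashes; billed bytes
# drop to zero when an upload dedupes against a chunk someone else already
# stored. Every function here is made up for illustration.
import hashlib

stored: set[str] = set()

def upload(data: bytes) -> int:
    """Returns bytes billed: 0 means the chunk already existed (deduped)."""
    h = hashlib.sha256(data).hexdigest()
    billed = 0 if h in stored else len(data)
    stored.add(h)
    return billed

def statement(name: str, salary: int) -> bytes:      # stand-in for the salary PDF
    return f"Pay statement for {name}: ${salary}".encode()

upload(statement("Alice", 87_000))                   # the victim's file is already in the cloud

# The attacker brute-forces the single unknown field:
found = next(s for s in range(30_000, 300_000, 1_000)
             if upload(statement("Alice", s)) == 0)
print(found)   # 87000
```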
b) It'll lead to hugely erratic pricing for consumers. E.g. let's say you download 100TB of movies from BitTorrent; now you pay almost nothing to store it, but if everyone else deletes their copies, your price will go up.
Apple could mitigate that by never raising the price on a given chunk, but that just leaves them paying for it, and it's easily abused. Open two accounts, upload the same data, then delete it from one account, pay 1/2 for storage.
That’s a great solution, but at what point is it easier to just store your data yourself? If someone is worried about their provider knowing that one user has the same file as another, that person shouldn't really be trusting Google or Apple with anything.
> in this case it's irrelevant because Apple has access to the keys anyway
Not completely, I would think. Google may be able to discover whether any user stores file F in iCloud by creating an iCloud account for themselves and uploading the file to it. If that doesn’t create a new file, it already was there before. Depending on what exactly Apple stores on iCloud, they may even be able to detect how many users or (unlikely) even which users store the file.
I don’t see how they could use it, and don’t think they would use it, but Google also has a large data set of email messages and attachments sent between iCloud and gmail accounts that they could somehow use to correlate activity between their gmail servers and the “iCloud on Google cloud” servers.
The encryption is there to protect data from snooping by the cloud provider Apple hosts the chunks on, not to provide end-to-end privacy/anonymity to users from courts or Apple itself. Apple has complied and WILL comply with subpoenas to provide the contents of an iCloud account (sans Keychain, which Apple has literally no way of decrypting themselves), so you should never assume data in iCloud is safe from a DA or prosecutor.
It only leaks this information if you use the same key derivation for all users, and it's not clear that Apple is doing that. There is still some benefit to convergent encryption with a different key per user (e.g. you dedup across multiple backups from the same user).
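Here's a sketch of what that could look like (my construction, not anything Apple has documented): keying the content hash with a per-user secret still dedupes a user's own repeated chunks, but different users no longer produce matching ciphertexts.

```python
# Sketch of convergent encryption with a per-user key: the content hash is
# keyed with a per-user secret, so one user's duplicate chunks still dedupe,
# but two users storing the same chunk produce different ciphertexts.
import hashlib, hmac
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_chunk(user_secret: bytes, chunk: bytes) -> bytes:
    digest = hmac.new(user_secret, chunk, hashlib.sha256).digest()
    key, nonce = digest[:16], digest[16:28]   # AES-128 key + nonce, both user-bound
    return AESGCM(key).encrypt(nonce, chunk, None)

same_user = encrypt_chunk(b"alice-secret", b"backup chunk") == encrypt_chunk(b"alice-secret", b"backup chunk")
cross_user = encrypt_chunk(b"alice-secret", b"backup chunk") == encrypt_chunk(b"bob-secret", b"backup chunk")
print(same_user, cross_user)   # True False
```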
This is why I like to GPG encrypt my stuff before I upload it to Dropbox. This way, it's pretty much guaranteed not to register as something easy to correlate against, and I can use Dropbox as just a free file store.
Data is broken into chunks, then encrypted, then stored. Given that the AWS keys vary by user and it's not done file-by-file, any de-duping would be entirely coincidental and wouldn't leak anything.
I used Jclouds [1] a few years back to do this. While it's very powerful, it's also fairly complex (abstracts not just storage but also compute/etc). At the time, I remember wanting a simpler abstraction layer over it, and ended up building my own. Maybe something like that exists now, I'm not sure.
The devil is in the details of how each provider offers vendor-specific things you want to take advantage of. For example, Reduced Redundancy Storage is, IIRC, an Amazon-specific offering, and if others offer it, it's probably under different SLA terms/measurements. This rapidly breaks many generic abstractions; maybe that's why everyone ends up writing their own little shim layer for their situation.
In some sense it reminds me a bit of building database-connection-pools in the 90's, before they were really standardized everyone rolled their own and learned all the awful lessons about reference counting along the way. Then along came ODBC, then JDBC, and things were so much easier because you only had to deal with one API, and the databases would conform to their side of it. So I think, isn't that what OpenStack (or something?) is supposed to be for cloud services? But whoa, the depth and complexity of these services far exceeds that of a 90's database. It will take a while -- but over time and with patterns of common use well established, a stable base of standard APIs will abstract away most differences, making things so much nicer. I can dream.
The S3 API signature seems to be the de facto abstraction. Network Appliance (NetApp) supports it for their on-premises hardware. MinIO supports it in their open source object storage software. The API signature also seems to support competition via tags[1] and other features. I'm not feeling a lot of lock-in specific to S3.
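As an illustration, the same boto3 client works against AWS or a local MinIO server just by swapping the endpoint. The bucket name below is made up, the credentials are MinIO's defaults, and it assumes a MinIO server is running locally:

```python
# One S3 client, multiple backends: point endpoint_url at MinIO (or omit it
# for real AWS S3) and the rest of the code is unchanged.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:9000",     # local MinIO; omit for AWS
    aws_access_key_id="minioadmin",
    aws_secret_access_key="minioadmin",
)
s3.create_bucket(Bucket="chunks")
s3.put_object(Bucket="chunks", Key="chunk-01", Body=b"encrypted chunk bytes",
              Tagging="tier=standard")         # object tags, as mentioned above
print(s3.get_object(Bucket="chunks", Key="chunk-01")["Body"].read())
```

That portability, rather than any formal standard, is what keeps the lock-in low.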
Often the reason to open source is to get developer mindshare (tensorflow) or in the hope of gaining traction to compete against existing closed source tech (opencompute).
The problem there being that small companies get fooled into thinking they need it and spend a bunch of time abstracting away (and complicating!) a service layer they'll likely never change.
Every fortune 50 company has a similar system at this point; you tie it in with your enterprise architecture platform so you can go directly from design to infrastructure using tested patterns.
Pretty much table stakes in large enterprise at this point; and as usual IBM / SAP / Oracle / SoftwareAG cabal have bought up everything that matters.
[0] - https://9to5mac.com/2016/10/06/report-unified-cloud-services...