> Each file is broken into chunks and encrypted by iCloud using AES-128 and a key derived from each chunk’s contents that utilizes SHA-256. The keys and the file’s metadata are stored by Apple in the user’s iCloud account. The encrypted chunks of the file are stored, without any user-identifying information, using third-party storage services, such as S3 and Google Cloud Platform.
From what I’ve heard, Apple’s services run on their own cloud platform called Pie [0]. It sounds like this platform probably abstracts away whatever storage service is used, allowing Apple to use whatever fits their requirements.
I think this is just coincidence though. Apple most likely performs their own layer of encryption on top of what Google already does.
The two layers protect against different attack scenarios: Google doesn't want plaintext data to show up on its disks; Apple doesn't want plaintext data to be accessible by anyone at Google (or any other third-party provider).
The encryption technique sounds like convergent encryption.
Convergent encryption is a fairly standard way of storing encrypted data that might be duplicated across users. It allows data to be deduped while still in an encrypted state.
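For the curious, here's a minimal sketch of the idea in Python, using the `cryptography` package. The 16-byte key split and the deterministic nonce are my assumptions for illustration, not Apple's documented scheme:

```python
# Minimal convergent-encryption sketch (illustrative only): the key is
# derived from the chunk's own contents, so identical chunks encrypt to
# identical ciphertexts and can be deduplicated while still encrypted.
import hashlib
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_chunk(chunk: bytes) -> tuple[bytes, bytes]:
    digest = hashlib.sha256(chunk).digest()
    key = digest[:16]       # AES-128 key, derived from the chunk's SHA-256
    nonce = digest[16:28]   # deterministic, content-derived nonce (an assumption)
    ciphertext = AESGCM(key).encrypt(nonce, chunk, None)
    return key, ciphertext  # the key goes into the user's account metadata

# Two users storing the same chunk produce byte-identical blobs:
assert encrypt_chunk(b"illegal.csv")[1] == encrypt_chunk(b"illegal.csv")[1]
```

Because the key and nonce come from the content itself, two uploads of the same chunk are byte-identical, which is exactly what makes the deduplication discussed below possible.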
Doesn’t this in some way undermine the encryption? For example, if multiple users had illegal.csv in their accounts, getting access to one user would allow you to prosecute any of them without decrypting their data, because their copies would all share the same SHA-256 hash (and thus the same ciphertext)?
It is (as the Wikipedia article points out). It's basically equivalent to storing passwords hashed without a salt: it makes it obvious when a password is reused, or in this case when the same file is stored twice.
That means that the provider always knows if a file is shared across multiple users (even if they don't know what the file is) and given a cleartext file they can always check if somebody has it stored on the service. It's not ideal at all if you want good privacy.
SpiderOak explicitly claims not to do that[1] in order to avoid these issues; however, I refuse to recommend them until they finally decide to release an open-source client and 2FA support, so caveat emptor.
Anyway, in this case it's irrelevant because Apple has access to the keys anyway, it's only supposed to prevent the third party (Google Cloud in this case) from having access to the files.
It should be pointed out that this "security flaw" is the only way these cloud file storage platforms are cost effective, since it allows the same anonymous blob stored by several dozen users to take up one unit of space instead of one unit per user.
This is particularly space-saving if a file type is chunked cleverly so that the static parts of the file's structure are stored separately from the dynamic parts (Microsoft's Office XML formats, for example, could definitely be split this way).
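To illustrate, here's a toy content-defined chunking sketch in Python. It's purely hypothetical (not Apple's or Microsoft's actual chunking), but it shows how cut points derived from the data itself let regions that two files share dedupe even when the rest of the files differ:

```python
# Toy content-defined chunking (purely hypothetical): a rolling window hash
# picks the cut points, so regions two files have in common produce the same
# chunks and dedupe even when the surrounding bytes differ.
import random

def chunks(data: bytes, window: int = 48, mask: int = 0xFF, min_size: int = 256) -> list[bytes]:
    out, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= window:
            rolling -= data[i - window]           # keep a sum over the last `window` bytes
        if i - start >= min_size and (rolling & mask) == mask:
            out.append(data[start:i + 1])         # boundary found: emit a chunk
            start = i + 1
    if start < len(data):
        out.append(data[start:])
    return out

random.seed(0)
static = bytes(random.randrange(256) for _ in range(20_000))   # shared "static" structure
file_a = static + b"dynamic part A" * 50
file_b = static + b"dynamic part B" * 50
shared = set(chunks(file_a)) & set(chunks(file_b))
print(f"{len(shared)} chunks are identical across the two different files")
```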
I don't believe that. Apple charges $10/mo for 1TB of data. That's $120/yr. You can get 1TB drives commercially for under $50 these days; you'd need a bit more than one drive per user for reliability & sharding, but still. Not de-duplicating the data would clearly be profitable anyway.
Hetzner charges less than half of that for their storage solution, and that's without de-duplication[1].
Does decreasing the security of their users & not offering end-to-end encryption save them even more money? Sure, but I don't see how it wouldn't still be profitable without it, seeing as you can easily buy non-de-duplicated cloud storage for way less from other providers.
Just look at the pricing for Amazon S3: it costs $20-30 per TB per month. That makes Apple's $10 a month for 1TB look like a fairly good deal.
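Quick back-of-the-envelope with the numbers quoted in this thread (list prices only, ignoring traffic, redundancy, backups and taxes):

```python
# Rough comparison using only the figures mentioned in this thread.
icloud_per_tb_month = 10          # Apple: $10/month for 1TB
s3_per_tb_month = (20, 30)        # S3: roughly $20-30 per TB per month
drive_per_tb = 50                 # consumer 1TB drive, one-time cost

print(f"iCloud: ${icloud_per_tb_month * 12}/year per TB")                         # $120
print(f"S3: ${s3_per_tb_month[0] * 12}-${s3_per_tb_month[1] * 12}/year per TB")   # $240-$360
print(f"A bare drive costs ~{drive_per_tb / icloud_per_tb_month:.0f} months of iCloud")  # ~5 months
```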
Don't forget that the price includes VAT and other taxes. General infrastructure costs, including traffic. Redundancy. Backups. I don't think they're turning much of a profit, but offer it because it makes their platform more attractive.
I'm sure Apple makes plenty of money from users that don't fill up their space to pay for the ones that do.
I have a 2TB family plan that I'm paying $10/mo for, a total of 135GB is used. Most of that usage is from iCloud Photo Library, and while I could use the 200GB plan I'd rather just pay the extra to have effectively unlimited space and not worry about needing to upgrade the plan, especially as I record video in 4K by default now.
I'm sure I'm not the only person who does this, $10/mo is probably nothing to the vast majority of people on HN compared to the hassle of worrying about whether we have enough storage or not.
iOS is pretty good about nudging you to the next tier as you approach filling your current one, IIRC. When I recently had to jump from the 200GB plan to 2TB, it was one tap on a notification to move my whole family to the next tier, so even for those of us who can easily afford it, it's arguably not that sensible a strategy to jump straight to 2TB, unless your goal is to subsidize Apple as much as you can.
HEVC/H.265 introduced in iOS 11 largely makes shooting 4K on an iOS device cost _roughly_ the same in storage terms as 1080p/H.264 did, so recording in 4K doesn't really move the storage needle here like it used to either, assuming of course you have an iPhone 7 or newer.
I read this comment and was confused because it is exactly what I would have written and thought I had a moment of amnesia where I forgot that I posted a response...
Same boat 100%... Family sharing, 2TB plan, have my wife and mother in the family. Combined we use less than 200GB, but I am happy to pay for more storage rather than deal with tech support calls from the two of them when we go over 200GB and they get error messages popping up. :P
Apple includes their own office suite for free with all Macs, so if you don’t need 100% Office compatibility, why give MS more money? I haven’t had or used Office in some years now and don’t miss it a bit.
Absolutely. I was just pointing out that the price is in the better end of the scale, especially to end consumers only buying a relatively small amount of storage.
It is from the standpoint of the average user who doesn't want the hassle of figuring out which backup provider to use or what software to install, and who wants data shared automatically between multiple devices natively.
As the most popular comment here said when Dropbox was announced, "1. For a Linux user, you can already build such a system yourself quite trivially by getting an FTP account, mounting it locally with curlftpfs, and then using SVN or CVS on the mounted filesystem. From Windows or Mac, this FTP account could be accessed through built-in software."
Personally, I'm using seafile on a server, it's cheaper than iCloud or Dropbox, with more storage, and more custom functionality. Many NAS nowadays support Nextcloud out of the box.
I worked out the math a while ago and it will take over 10 years for a home-built NAS to be more cost-effective than iCloud, and that's only if there are no hardware or disk failures, and doesn't include the additional hassle of being out of Apple's model when it comes to auto-backups and storage.
It depends on what you value. If you value keeping your data at home, and accept the trade off of higher cost and the need to maintain it yourself and use some sort of sync mechanism that isn't built into the OS, then you do that.
Yes, you are obviously not their target market. Most people don't know what a NAS is, or what FTP is. With iCloud they toggle a switch and it Happens Like Magic.
The average person can figure out a car, can figure out how to vote in complicated voting systems, and can figure out much more.
You don’t have to dumb down things unnecessarily. If a person can buy a smart home device, plug it in and configure it, or an Apple TV, or a computer, then they can also get one of the simpler NAS, and sync with that.
It was a naive and ridiculous comment then, and it's neither more informed nor less ridiculous now.
(Which is kind of a shame because a truly bulletproof, user-friendly NAS might be an interesting product. Who is going to be the Apple to Synology's Microsoft?)
I find this hard to believe. I get that it reduces space used _in theory_, but do you have any data on the % of uploads that are duplicated across clients?
It would have to be very, very large to justify the claim that it turns a previously non-cost-effective service into a cost-effective one, especially as prices have been dropping for some time and would hypothetically have simply not dropped as fast if this was the case.
* Offer to charge less for usage when a duplicate block is detected, but allow users to pay full price for privately salted storage. This is similar in principle to how Data Saver works on your mobile device (opting in to allow a man-in-the-middle to compress or downsample your data).
I thought the same reading this thread, but there are two huge problems with that:
a) Once you know the chunk size, you can determine based on pricing whether another customer has that data, which can have huge privacy implications.
E.g. let's say we both work at the same company and get the same salary statement PDF aside from the dollar number & your name (which I know).
I can simply brute-force it: craft files that vary that number, upload them to iCloud, and when one of them stops costing me storage, I know I've recovered what's in your file.
In any case, I'd be surprised if Apple's not already leaking this information due to caching in a way that could be revealed via timing attacks.
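A toy simulation of that attack, with a made-up dedup "cloud" standing in for the billing oracle (none of these APIs exist; it just shows why per-chunk pricing acts as an oracle):

```python
# Toy simulation: the "cloud" is just a set of content hashes; billed bytes
# drop to zero when an upload dedupes against a chunk someone else already
# stored. Every function here is made up for illustration.
import hashlib

stored: set[str] = set()

def upload(data: bytes) -> int:
    """Returns bytes billed: 0 means the chunk already existed (deduped)."""
    h = hashlib.sha256(data).hexdigest()
    billed = 0 if h in stored else len(data)
    stored.add(h)
    return billed

def statement(name: str, salary: int) -> bytes:      # stand-in for the salary PDF
    return f"Pay statement for {name}: ${salary}".encode()

upload(statement("Alice", 87_000))                   # the victim's file is already in the cloud

# The attacker brute-forces the single unknown field:
found = next(s for s in range(30_000, 300_000, 1_000)
             if upload(statement("Alice", s)) == 0)
print(found)   # 87000
```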
b) It'll lead to hugely erratic pricing for consumers. E.g. let's say you download 100TB of movies from BitTorrent; now you pay almost nothing to store it, but if everyone else deletes their copies, your price will go up.
Apple could mitigate that by never raising the price on a given chunk, but that just leaves them paying for it, and it's easily abused. Open two accounts, upload the same data, then delete it from one account, pay 1/2 for storage.
That’s a great solution, but at what point is it easier to just store your data yourself? If someone is worried about their provider knowing that one user has the same file as another, that person shouldn't really be trusting Google or Apple with anything.
> in this case it's irrelevant because Apple has access to the keys anyway
Not completely, I would think. Google may be able to discover whether any user stores file F in iCloud by creating an iCloud account for themselves and uploading the file to it. If that doesn’t create a new file, it already was there before. Depending on what exactly Apple stores on iCloud, they may even be able to detect how many users or (unlikely) even which users store the file.
I don’t see how they could use it, and don’t think they would use it, but Google also has a large data set of email messages and attachments sent between iCloud and gmail accounts that they could somehow use to correlate activity between their gmail servers and the “iCloud on Google cloud” servers.
The encryption is there to protect data from snooping by the cloud provider Apple hosts the chunks on, not to provide end-to-end privacy/anonymity to users from courts or Apple itself. Apple has complied and WILL comply with subpoenas to provide the contents of an iCloud account (sans Keychain, which Apple has literally no way of decrypting themselves), so you should never assume data in iCloud is safe from a DA or prosecutor.
It only leaks this information if you use the same key derivation for all users, and it's not clear that Apple is doing that. There is still some benefit to convergent encryption with a different key per user (e.g. you dedup across multiple backups from the same user).
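Here's a sketch of what that could look like (my construction, not anything Apple has documented): keying the content hash with a per-user secret still dedupes a user's own repeated chunks, but different users no longer produce matching ciphertexts.

```python
# Sketch of convergent encryption with a per-user key: the content hash is
# keyed with a per-user secret, so one user's duplicate chunks still dedupe,
# but two users storing the same chunk produce different ciphertexts.
import hashlib, hmac
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_chunk(user_secret: bytes, chunk: bytes) -> bytes:
    digest = hmac.new(user_secret, chunk, hashlib.sha256).digest()
    key, nonce = digest[:16], digest[16:28]   # AES-128 key + nonce, both user-bound
    return AESGCM(key).encrypt(nonce, chunk, None)

same_user = encrypt_chunk(b"alice-secret", b"backup chunk") == encrypt_chunk(b"alice-secret", b"backup chunk")
cross_user = encrypt_chunk(b"alice-secret", b"backup chunk") == encrypt_chunk(b"bob-secret", b"backup chunk")
print(same_user, cross_user)   # True False
```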
This is why I like to GPG encrypt my stuff before I upload it to Dropbox. This way, it's pretty much guaranteed not to register as something easy to correlate against, and I can use Dropbox as just a free file store.
Data is broken into chunks, then encrypted, then stored. Given that the AWS keys vary by user and it's not done file-by-file, any de-duping would be entirely coincidental and wouldn't leak anything.
I used Jclouds [1] a few years back to do this. While it's very powerful, it's also fairly complex (abstracts not just storage but also compute/etc). At the time, I remember wanting a simpler abstraction layer over it, and ended up building my own. Maybe something like that exists now, I'm not sure.
The devil is in the details of how each provider offers vendor-specific things you want to take advantage of. For example, Reduced Redundancy Storage is, IIRC, an Amazon-specific offering, and if others offer it, it's probably under different SLA terms/measurements. This rapidly breaks many generic abstractions; maybe that's why everyone ends up writing their own little shim layer for their situation.
In some sense it reminds me a bit of building database-connection-pools in the 90's, before they were really standardized everyone rolled their own and learned all the awful lessons about reference counting along the way. Then along came ODBC, then JDBC, and things were so much easier because you only had to deal with one API, and the databases would conform to their side of it. So I think, isn't that what OpenStack (or something?) is supposed to be for cloud services? But whoa, the depth and complexity of these services far exceeds that of a 90's database. It will take a while -- but over time and with patterns of common use well established, a stable base of standard APIs will abstract away most differences, making things so much nicer. I can dream.
The S3 API signature seems to be the de facto abstraction. Network Appliance (NetApp) supports it for their on-premises hardware. MinIO supports it in their open source object storage software. The API signature also seems to support competition via tags[1] and other features. I'm not feeling a lot of lock-in specific to S3.
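As an illustration, the same boto3 client works against AWS or a local MinIO server just by swapping the endpoint. The bucket name below is made up, the credentials are MinIO's defaults, and it assumes a MinIO server is running locally:

```python
# One S3 client, multiple backends: point endpoint_url at MinIO (or omit it
# for real AWS S3) and the rest of the code is unchanged.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:9000",     # local MinIO; omit for AWS
    aws_access_key_id="minioadmin",
    aws_secret_access_key="minioadmin",
)
s3.create_bucket(Bucket="chunks")
s3.put_object(Bucket="chunks", Key="chunk-01", Body=b"encrypted chunk bytes",
              Tagging="tier=standard")         # object tags, as mentioned above
print(s3.get_object(Bucket="chunks", Key="chunk-01")["Body"].read())
```

That portability, rather than any formal standard, is what keeps the lock-in low.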
Often the reason to open source is to get developer mindshare (tensorflow) or in the hope of gaining traction to compete against existing closed source tech (opencompute).
The problem there being that small companies get fooled into thinking they need it and spend a bunch of time abstracting away (and complicating!) a service layer they'll likely never change.
Every fortune 50 company has a similar system at this point; you tie it in with your enterprise architecture platform so you can go directly from design to infrastructure using tested patterns.
Pretty much table stakes in large enterprise at this point; and as usual IBM / SAP / Oracle / SoftwareAG cabal have bought up everything that matters.
[0] - https://9to5mac.com/2016/10/06/report-unified-cloud-services...