> Each file is broken into chunks and encrypted by iCloud using AES-128 and a key derived from each chunk’s contents that utilizes SHA-256. The keys and the file’s metadata are stored by Apple in the user’s iCloud account. The encrypted chunks of the file are stored, without any user-identifying information, using third-party storage services, such as S3 and Google Cloud Platform.
From what I’ve heard, Apple’s services run on their own cloud platform called Pie [0]. It sounds like this platform probably abstracts away whatever storage service is used, allowing Apple to use whatever fits their requirements.
I think this is just coincidence though. Apple most likely performs their own layer of encryption on top of what Google already does.
They protect against two different attack scenarios: Google doesn't want plaintext data to show up on disks. Apple doesn't want plaintext data to be accessible by anyone at Google (or any other 3rd party provider).
The encryption technique sounds like convergent encryption.
Convergent encryption is a fairly standard way of storing encrypted data that might be duplicated across users. It allows for deduping of data while it's in an encrypted state.
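Roughly, it works like this. A minimal sketch in Python, assuming the "cryptography" package; Apple's guide only specifies AES-128 with a SHA-256-derived key, so the exact nonce handling here is my own illustrative assumption:

    import hashlib
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    def convergent_encrypt(chunk: bytes) -> tuple[bytes, bytes]:
        # Key and nonce are both derived from the chunk's contents, so
        # identical chunks always produce identical ciphertext.
        key = hashlib.sha256(chunk).digest()[:16]   # AES-128 key from content
        nonce = hashlib.sha256(b"nonce:" + chunk).digest()[:12]
        return key, AESGCM(key).encrypt(nonce, chunk, None)

    # Two users storing the same chunk yield byte-identical blobs:
    _, blob1 = convergent_encrypt(b"same file contents")
    _, blob2 = convergent_encrypt(b"same file contents")
    assert blob1 == blob2   # so the provider can store it once (dedup)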
Doesn’t this in some way betray the encryption? For example, if multiple users had illegal.csv in their accounts, getting access to one user would let you prosecute all of them without decrypting their data, because their files would all share the same SHA-256 hash?
It is (as the Wikipedia article points out). It's basically equivalent to storing passwords hashed without a salt: it makes it obvious when a password is reused, or in this case when the same file is stored twice.
That means that the provider always knows if a file is shared across multiple users (even if they don't know what the file is) and given a cleartext file they can always check if somebody has it stored on the service. It's not ideal at all if you want good privacy.
SpiderOak explicitly claims not to do that[1] to avoid these issues; however, I refuse to recommend them until they finally decide to release an open source client and 2FA support, so caveat emptor.
Anyway, in this case it's irrelevant because Apple has access to the keys anyway, it's only supposed to prevent the third party (Google Cloud in this case) from having access to the files.
It should be pointed out that this "security flaw" is the only way these cloud file storage platforms are cost effective, since it allows the same anonymous blob stored by several dozen users to take up one unit of space instead of one unit per user.
This is particularly space-saving if a file type is chunked cleverly so that the static parts of the file's structure are stored apart from the dynamic parts (Microsoft's Office XML formats, for example, could definitely be split this way).
I don't believe that. Apple charges $10/mo for 1TB of data. That's $120/yr. You can get 1TB drives commercially for under $50 these days, you'd need a bit more than 1 drive per user for reliability & sharding, but still. Not de-duplicating the data would clearly be profitable anyway.
Hetzner charges less than half of that for their storage solution, and that's without de-duplication[1].
Does decreasing the security of their users & not offering end-to-end encryption save them even more money? Sure, but I don't see how it couldn't be profitable without it, seeing as you can easily buy non-de-duplicated cloud storage for way less from other providers.
Just look at the pricing for Amazon S3. It costs $20-30 per TB per month. That makes Apple's $10 a month for 1TB look like a fairly good deal.
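For what it's worth, the back-of-envelope arithmetic being argued about here, using only numbers quoted in this thread (all approximate list prices, ignoring discounts, redundancy, and traffic):

    s3_usd_per_tb_month = 23      # ~$0.023/GB-month S3 Standard list price
    icloud_usd_per_tb_month = 10  # the $10/mo-for-1TB tier mentioned above
    drive_usd_per_tb = 50         # commodity 1TB drive, one-time cost

    # A raw drive pays for itself against the iCloud price in months,
    # before counting redundancy, power, staff, and datacenter costs:
    print(drive_usd_per_tb / icloud_usd_per_tb_month)   # 5.0 months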
Don't forget that the price includes VAT and other taxes. General infrastructure costs, including traffic. Redundancy. Backups. I don't think they're turning much of a profit, but offer it because it makes their platform more attractive.
I'm sure Apple makes plenty of money from users that don't fill up their space to pay for the ones that do.
I have a 2TB family plan that I'm paying $10/mo for, a total of 135GB is used. Most of that usage is from iCloud Photo Library, and while I could use the 200GB plan I'd rather just pay the extra to have effectively unlimited space and not worry about needing to upgrade the plan, especially as I record video in 4K by default now.
I'm sure I'm not the only person who does this, $10/mo is probably nothing to the vast majority of people on HN compared to the hassle of worrying about whether we have enough storage or not.
iOS is pretty good about nudging you to the next tier as you approach filling your current one, IIRC. When I recently had to jump from the 200GB plan to 2TB it was one tap on a notification to move to the next tier for my whole family, so even for those of us who can easily afford it, it's arguably not that sensible a strategy to jump straight to 2TB, unless one's goal is to subsidize Apple as much as possible.
HEVC/H.265 introduced in iOS 11 largely makes shooting 4K on an iOS device cost _roughly_ the same in storage terms as 1080p/H.264 did, so recording in 4K doesn't really move the storage needle here like it used to either, assuming of course you have an iPhone 7 or newer.
I read this comment and was confused because it is exactly what I would have written and thought I had a moment of amnesia where I forgot that I posted a response...
Same boat 100%... Family Sharing, 2TB plan, with my wife and mother in the family. Combined we use less than 200GB, but I am happy to pay for more storage rather than deal with tech support calls from the two of them when we go over 200GB and they get error messages popping up. :P
Apple includes their own office suite for free with all Macs, so if you don't need 100% Office compatibility, why give MS more money? I haven't had or used Office in some years now and don't miss it a bit.
Absolutely. I was just pointing out that the price is in the better end of the scale, especially to end consumers only buying a relatively small amount of storage.
It is from the standpoint of the average user who doesn't want the hassle of figuring out which backup provider to use or what software to install, or who wants to share data automatically between multiple devices natively.
As the most popular comment here said when Dropbox was announced, "1. For a Linux user, you can already build such a system yourself quite trivially by getting an FTP account, mounting it locally with curlftpfs, and then using SVN or CVS on the mounted filesystem. From Windows or Mac, this FTP account could be accessed through built-in software."
Personally, I'm using Seafile on a server; it's cheaper than iCloud or Dropbox, with more storage and more custom functionality. Many NASes nowadays support Nextcloud out of the box.
I worked out the math a while ago and it will take over 10 years for a home-built NAS to be more cost-effective than iCloud, and that's only if there are no hardware or disk failures, and doesn't include the additional hassle of being out of Apple's model when it comes to auto-backups and storage.
It depends on what you value. If you value keeping your data at home, and accept the trade off of higher cost and the need to maintain it yourself and use some sort of sync mechanism that isn't built into the OS, then you do that.
Yes, you are obviously not their target market. Most people don't know what a NAS is, or what FTP is. With iCloud they toggle a switch and it Happens Like Magic.
The average person can figure out a car, can figure out how to vote in complicated voting systems, and can figure out much more.
You don’t have to dumb things down unnecessarily. If a person can buy a smart home device, plug it in and configure it, or an Apple TV, or a computer, then they can also get one of the simpler NASes, and sync with that.
It was a naive and ridiculous comment then, and it's neither more informed nor less ridiculous now.
(Which is kind of a shame because a truly bulletproof, user-friendly NAS might be an interesting product. Who is going to be the Apple to Synology's Microsoft?)
I find this hard to believe. I get that it reduces space used _in theory_, but do you have any data on the % of uploads that are duplicated across clients?
It would have to be very, very large to justify the claim that it turns a previously non-cost-effective service into a cost-effective one, especially as prices have been dropping for some time and would hypothetically have simply not dropped as fast if this was the case.
* Offer to charge less for usage when a duplicate block is detected, but allow users to pay full price for privately-salted storage. This is similar in principle to how Data Saver works on your mobile device (opt-in to allow a man-in-the-middle to compress or downsample your data)
I thought the same reading this thread, but there are two huge problems with that:
a) Once you know the chunk size, you can determine based on pricing whether another customer has that data, which can have huge privacy implications.
E.g. let's say we both work at the same company and get the same salary statement PDF aside from the dollar number & your name (which I know).
I can simply brute-force craft files that change that number around and upload them to iCloud; when one of them stops adding to my storage bill, I know I've recreated what's on your drive.
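To make (a) concrete, here's a hypothetical sketch of that brute force; upload and billed_bytes are made-up stand-ins for whatever interface the attacker can observe, not any real iCloud API:

    def crack_field(template: bytes, candidates, upload, billed_bytes):
        # Hypothetical dedup side channel: if uploading a guess doesn't
        # increase billed storage, some other account already stored it.
        for value in candidates:
            guess = template.replace(b"{SALARY}", value)
            before = billed_bytes()
            upload(guess)
            if billed_bytes() == before:
                return value   # recreated the victim's exact file
        return None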
In any case, I'd be surprised if Apple's not already leaking this information due to caching in a way that could be revealed via timing attacks.
b) It'll lead to hugely erratic pricing for consumers. E.g. let's say you download 100TB of movies from BitTorrent, now you pay almost nothing for it, but if everyone else deletes their copies pricing for you will go up.
Apple could mitigate that by never raising the price on a given chunk, but that just leaves them paying for it, and it's easily abused. Open two accounts, upload the same data, then delete it from one account, pay 1/2 for storage.
That’s a great solution, but at what point is it easier to just store your data yourself? If someone worried about their provider knowing some user has the same file as another user, that person shouldn’t really be trusting Google or Apple with anything.
> in this case it's irrelevant because Apple has access to the keys anyway
Not completely, I would think. Google may be able to discover whether any user stores file F in iCloud by creating an iCloud account for themselves and uploading the file to it. If that doesn’t create a new file, it already was there before. Depending on what exactly Apple stores on iCloud, they may even be able to detect how many users or (unlikely) even which users store the file.
I don’t see how they could use it, and don’t think they would use it, but Google also has a large data set of email messages and attachments sent between iCloud and gmail accounts that they could somehow use to correlate activity between their gmail servers and the “iCloud on Google cloud” servers.
The encryption is to protect data from snooping by the cloud provider Apple hosts the chunks on, not to provide end-to-end privacy/anonymity to users from courts or Apple itself. Apple has complied and WILL comply with subpoenas to gain access to the contents of an iCloud account (sans Keychain, where Apple has literally no way of decrypting the data themselves), so you should never assume data in iCloud is safe from a DA or prosecutor.
It leaks this information if you use the same key for all users. It's not clear that Apple is doing this. There is some benefit to convergent encryption with a different key per user (e.g. you dedup across multiple backups from the same user).
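A sketch of that per-user variant (the HMAC construction is my illustration, not anything from Apple's documentation): deriving the chunk key from the content plus a per-user secret keeps dedup within one account but breaks it across accounts.

    import hashlib
    import hmac

    def per_user_chunk_key(user_secret: bytes, chunk: bytes) -> bytes:
        # Deterministic per (user, chunk): dedups one user's own backups,
        # but two users' copies of the same chunk get different keys.
        content_hash = hashlib.sha256(chunk).digest()
        return hmac.new(user_secret, content_hash, hashlib.sha256).digest()[:16]

    assert per_user_chunk_key(b"alice-secret", b"x") != \
           per_user_chunk_key(b"bob-secret", b"x")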
This is why I like to GPG encrypt my stuff before I upload it to Dropbox. This way, it's pretty much guaranteed not to register as something easy to correlate against, and I can use Dropbox as just a free file store.
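If anyone wants to script that habit, a minimal sketch in Python that shells out to the gpg CLI (assumes gpg is installed; the recipient address is a placeholder):

    import subprocess

    def gpg_encrypt(path: str, recipient: str = "me@example.com") -> str:
        # Encrypt to the recipient's public key; upload the resulting
        # .gpg file to Dropbox instead of the original.
        out = path + ".gpg"
        subprocess.run(
            ["gpg", "--encrypt", "--recipient", recipient, "--output", out, path],
            check=True,
        )
        return out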
Data is broken into chunks, then encrypted, then stored. Given that the AWS keys vary by user, and it's not done file-by-file, any de-duping would be entirely coincidental and not leak anything.
I used Jclouds [1] a few years back to do this. While it's very powerful, it's also fairly complex (abstracts not just storage but also compute/etc). At the time, I remember wanting a simpler abstraction layer over it, and ended up building my own. Maybe something like that exists now, I'm not sure.
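The simple version of such a layer can be tiny; a sketch of the shape (not Jclouds' actual API):

    from typing import Protocol

    class BlobStore(Protocol):
        # One interface; each provider (S3, GCS, local disk...) gets an adapter.
        def put(self, key: str, data: bytes) -> None: ...
        def get(self, key: str) -> bytes: ...
        def delete(self, key: str) -> None: ...

    class MemoryStore:
        """In-memory adapter, handy for tests."""
        def __init__(self) -> None:
            self._blobs: dict[str, bytes] = {}
        def put(self, key: str, data: bytes) -> None:
            self._blobs[key] = data
        def get(self, key: str) -> bytes:
            return self._blobs[key]
        def delete(self, key: str) -> None:
            del self._blobs[key]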
The devil is in the details: each provider offers vendor-specific features, and you want to take advantage of them. For example, Reduced Redundancy Storage is, IIRC, an Amazon-specific offering; if others offer it, it's probably under different SLA terms/measurements. This rapidly breaks many generic abstractions; maybe that's why everyone ends up writing their own little shim layer for their situation.
In some sense it reminds me a bit of building database-connection-pools in the 90's, before they were really standardized everyone rolled their own and learned all the awful lessons about reference counting along the way. Then along came ODBC, then JDBC, and things were so much easier because you only had to deal with one API, and the databases would conform to their side of it. So I think, isn't that what OpenStack (or something?) is supposed to be for cloud services? But whoa, the depth and complexity of these services far exceeds that of a 90's database. It will take a while -- but over time and with patterns of common use well established, a stable base of standard APIs will abstract away most differences, making things so much nicer. I can dream.
The S3 API signature seems to be the de facto abstraction. NetApp supports it for their (on-premises) hardware. Minio supports it in their open source object storage software. The API signature seems to support competition via tags[1] and other features. Not feeling a lot of lock-in specific to S3.
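As an illustration of that portability, the same boto3 code can target AWS or an S3-compatible store like Minio just by changing the endpoint (endpoint and credentials below are placeholders):

    import boto3

    s3 = boto3.client(
        "s3",
        endpoint_url="http://localhost:9000",   # e.g. a local Minio server
        aws_access_key_id="ACCESS_KEY",
        aws_secret_access_key="SECRET_KEY",
    )
    # Identical calls work against AWS S3 if endpoint_url is omitted.
    s3.put_object(Bucket="chunks", Key="blob-001", Body=b"encrypted chunk")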
Often the reason to open source is to get developer mindshare (tensorflow) or in the hope of gaining traction to compete against existing closed source tech (opencompute).
The problem there being that small companies get fooled into thinking they need it and spend a bunch of time abstracting away (and complicating!) a service layer they'll likely never change.
Every fortune 50 company has a similar system at this point; you tie it in with your enterprise architecture platform so you can go directly from design to infrastructure using tested patterns.
Pretty much table stakes in large enterprise at this point; and as usual IBM / SAP / Oracle / SoftwareAG cabal have bought up everything that matters.
Apple runs absolutely ludicrous amounts of its own storage. The idea that they use GCP for 100% of anything is insane.
Saying they “use GCP for iCloud” sounds intentionally misleading... there’s a missing qualifier there. But I guess “uses GCP for some stuff in iCloud” isn’t nearly as click-worthy.
Considering that the original document linked to in the article also explicitly mentions S3, I don't think anyone is led to believe they use GCP for everything.
Well, considering the number of people in just this HN thread who didn't read the original document linked, it's almost guaranteed people will take that impression from the headline and the poorly worded article they'll read on Gizmodo or whatever.
Apple definitely does not use GCP for 100% of anything, but it could be done. A migration like that couldn't happen overnight anyway, as I'm sure you know.
Google hit its first 1EB of raw disk space a long time ago. Or you could look at power consumption as a proxy for capacity.
Power consumption is probably not a good proxy for disk capacity. Dense disk capacity has way lower power per rack than compute. Unless you know the overall compute/storage ratio, it's very hard to figure out storage capacity. The reverse might work if compute power consumption overwhelms storage power consumption. On the other hand, it might be easier to infer storage than compute from rack positions, or square feet of DC space. Only so many spinning disks fit per cubic meter.
Also, there's a re:Invent talk from several years ago with "8 exabytes" in the title, so surely Google and Amazon both have many exabytes by this point.
(it has other useful information that might not have been mentioned anywhere before, like the crazy Colossus on Colossus... or D, the GFS chunkserver replacement)
You can use that to revisit Randall Munroe's estimates: https://what-if.xkcd.com/63/ You can infer from some of the comments that e.g. very dense disk capacity is not a good idea. There's more, of course, but I can't go into details (ex-Googler).
The 'insane' part is that Apple is supposed to be the hardware manufacturer and should therefore have lower costs than a supplier that is not primarily a hardware manufacturer. Clearly there is more to the storage game than hardware. However, if Apple can make their own chips, their own computers, their own operating system, and even their own cloud storage, it is odd that they rent someone else's x86-64 rigs.
Apple took a long time to get this cloud thing right and had to play catch-up. No wonder they flew a helicopter over Google's nearby data center when they were working on their first in NC.
That's actually exactly the opposite of what Netflix (I think) was saying. Netflix's point was that cloud providers are very good as long as you are a regular user (which can still mean massive volume).
A snowflake customer requires too many custom features, and that leads to less competitive prices. When you reach the point where you're paying Amazon/Google/Apple as a solution provider (think SAP, IBM), you need to bite the bullet and build the expertise in-house to avoid unhealthy ties between your business and your provider's.
I'm familiar with the amounts of storage involved (in terms of cabinets and number of arrays) for various reasons, and I assumed it was the iCloud storage because of how vast it is. On the other hand, with the high-end gear they're using, the cost/GB is likely insane and more than Google or AWS charges them. I can't imagine the big cloud providers are buying top-end arrays, but I could be wrong.
>> The encrypted chunks of the file are stored, without any user-identifying information, using third-party storage services, such as S3 and Google Cloud Platform.
There must be a misunderstanding. I was asking for supporting evidence that Apple ran its own storage. This is evidence that Apple is using storage services from other companies.
Since Steve Jobs returned to Apple and Tim Cook took over operations, Apple has always run an "asset light" approach. The fewer assets, the better. They value flexibility over the small benefits of owning those assets. Hence why they were very late to building their own DCs; they also view the cash they hold as another asset they don't want too much of. [1]
Apple also views data storage as another asset that depreciates quickly and offers no strategic advantage to own outright.
Ever since Apple merged and moved all(?) of its cloud operations onto Mesos, things have been great. The last time they talked about it, they said it was the world's largest Mesos cluster in operation. I suspect it is even bigger by now.
But how do we reconcile Apple's strategy with the fact that Amazon did invest in an "asset heavy" approach? I have been under the impression that Amazon makes a substantial percentage of its profits from S3 and other cloud services.
I am not saying I agree with Apple's asset-light approach, just pointing it out. I would actually want them to have more DCs, and more Apple Stores around the world, all bought outright rather than rented.
But then their Apple TV content is the complete opposite of their asset-light approach, as they've decided to spend more money to create their own TV assets.
I thought this was the case. I was sharing mobile data over wifi to several iPhone users with some basic data use monitoring so that I could firewall services that used unreasonable amounts of data.
One of them, despite having all the automatic update and backup features we could find turned off, regularly attempted to upload gigabytes of data to multiple IP addresses that resolved to Google Cloud. Since there were fairly few apps installed (none of them suspicious) and a large number of photos and videos on the device, my conclusion is that it was a spurious iCloud photo backup.
Unfortunately, iOS does not appear to provide a way to see what's using data on Wifi, only mobile, nor to designate a Wifi network as metered.
Considering the technology Apple uses for spanning cloud providers, this seems unlikely. I would imagine they would proxy all of this data and you wouldn't see direct access to Google Cloud.
It could just be using iMessage. When you send photo and video attachments using iMessage, they get stored on iCloud for some period of time so that other users can view or download them.
How is this news? Years ago I observed system processes like nsurlsessiond and cloudd connecting to AWS, GCP, as well as Apple’s own infrastructure for my iCloud storage. It’s really obvious for someone who’s using Little Snitch when those processes suddenly start connecting to a new/different hostname.
Interesting that Dropbox saved a lot of money by moving off AWS while Apple relies on Google for exactly the same thing. I would have thought Apple would build and host an in-house solution; really curious what the reasoning is to go with GCP.
That would be because Dropbox is a SaaS software business where the TCO of its infrastructure is absolutely critical to its P&L. Every dollar shaved off its infrastructure per user is a huge deal for Dropbox.
Apple uses software / cloud services as a way of selling expensive hardware, and the margin (if any) made on its cloud services is a rounding error on the overall P&L.
EDIT: Generally, the 'Apple insources everything' meme is a massive oversimplification. Apple insources when there is a competitive advantage to doing so. Apple insources its chip development because it is able to get better thread performance on its phones when the software is aligned tightly to the chipset. Apple insources Cobalt because you need Cobalt to make smartphones and if others find it difficult to source Cobalt, they're going to find it hard to make smartphones, driving up their unit costs. It gets no such advantage from insourcing compute bar a slightly lower TCO, which isn't going to be a huge deal to them anyway.
HN dogma says "start with doing unscalable things". Fix what becomes a real problem, in the order that they do. With Apple's money I imagine paying a bit for GCP remains a minor hassle for a long time.
The difference is that data storage is Dropbox's business, while Apple's business is selling devices. Yes, they do have a variety of income sources, but they are all built around and integrated into selling iThings and MacThings. They go with what they are expert in. Reliable storage, available around the world, requires unique and difficult-to-learn skills that don't map well to OS design and PC hardware architecture.
On the note of data storage, it's frustrating that I can have photos and videos -- that I intend to keep for the next 50+ years -- spread out between 2 to 4 services (iCloud, Dropbox, OneDrive, Google Photos) just by owning a couple phones and a computer or two, which I imagine is pretty typical. I know there's a start-up idea in here somewhere, because I personally would pay a premium to consolidate all those into one organized location, but I don't exactly know what that is. I know Dropbox has an option to import data from other services, but in my experience it hasn't been as "smart", automated, and thorough as I'd like it to be.
Once the baby videos started rolling in all the cloud solutions started getting expensive real fast. I had to stoop down to running my own file server. Buy a pair of drives every now and then to expand, and nightly rsync to keep things backed up.
The product I would like is a photo album with indexing and web interface components, and plugins to make sense of different phone platforms. Nannies love to text videos, and getting those off the phone has been a pain; I end up saving the whole iOS backup and scraping any media-looking files from it.
I'm also the same way. I try to keep everything, because I can, and I'm also now storing baby videos.
I have a 2TB iCloud plan for my iPhone X. The phone is set to "Optimize Storage", so it will de-scale old "local" pictures once I hit ~256GB.
In the background, I also have the free and unlimited version of Google Photos running. It automatically uploads everything, including videos. Videos will be scaled to 16 MP / 1080p, but I'm not complaining since it's free.
I haven't hit 2TB yet, but so far it works. I also have a 20TB Synology NAS setup, but have little desire to run my own photo storage/hosting. The Photos app on my MacBook Pro also is set to download everything via iCloud Photo Library, so that's also backed up to the NAS via Time Machine.
Running my own server and writing a script to scrape files, or even doing it manually, is too much work for me; I just don't have the time to dedicate to something like that. That's why I'd pay a premium for a service that automates this for me.
Up to two TB now with a 3-year-old kid. Part of the problem is that video resolutions grew a lot, plus having to snapshot the whole phone whenever it fills up. (Though I did some deduping and the redundancy is not as high as I thought...)
I doubt my usage is an outlier. But I suspect that I keep more video than most people just because I can.
I do the same thing, although my data growth seems to be a bit higher. My kid is less than a year old and I'm on pace to generate about 2-2.5 TiB for the first year. (Mostly 4k video.)
At some point I might start thinking of re-encoding the originals, but so far storage is cheap to have locally, and I'm going to see how long I can keep this up.
Why did owning multiple devices result in your photos being spread across multiple services? Other than iCloud they're all cross platform. They all offer easy ways to download your entire library, is there a problem with downloading everything and reuploading to a different service? Also if you really want to keep them for 50+ years I think you'd be crazy not to have at least one on-site copy and one off-site copy.
>They all offer easy ways to download your entire library,
Frequently there are proprietary things that aren't included in the library. For example, if you become dependent on Google Photos ability to recognize faces and objects, you tend to stop bothering to tag photos manually so when you export everything all you have is a big pile of unorganized photos.
Personally, I'd prefer to have things a little spread out. If you end up with account problems at one provider you still have other copies. (I'm thinking of scenarios like "mistaken DMCA takedown" or "account mistakenly flagged as fraudulent" or weird things like that.)
It appears they derive the chunk encryption key from the chunk content, so dedupe will be possible if they don't randomize the IV or nonce ([0], page 53). The guide also mentions convergent encryption, which would confirm this. Personally, I find it sketchy privacy-wise.
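The IV/nonce detail is exactly what decides whether dedup is possible; a quick sketch, assuming the Python "cryptography" package:

    import hashlib
    import os
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    chunk = b"same chunk contents"
    key = hashlib.sha256(chunk).digest()[:16]   # content-derived AES-128 key

    # With a random nonce, the same chunk encrypts differently each time,
    # so the provider cannot dedup ciphertext even with convergent keys:
    c1 = AESGCM(key).encrypt(os.urandom(12), chunk, None)
    c2 = AESGCM(key).encrypt(os.urandom(12), chunk, None)
    assert c1 != c2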
These types of busy-work machinations around encryption are often performed by cloud providers. The purpose is apparently to be able to say they use ciphers while also keeping the keys so they can decrypt the data.
CloudKit and iCloud Drive use Account Keys, but it isn't clear whether those are key encryption keys or data encryption keys. It also isn't clear whether they are protected with information held only by the client, or whether the cloud can freely read them. The differences are massive with regard to privacy, and this document really doesn't have the technical information needed to make an informed decision.
Does GCP's GCS really auto-dedup? I haven't seen anything in their documentation that suggests they do, and deduplication is generally known to have unavoidable performance costs, especially in a very distributed environment.
I am not sure what you mean by pennies; they charge the same as AWS, and Dropbox just showed that doing everything in-house saves them millions of $$.
What is the measure you are using for Dropbox being one of the largest companies in the world?
In the beginning of 2017, Dropbox claimed a $1 billion run rate; however, at the end of 2017, Dropbox had a net loss of $110 million. When your entire year is a loss of $110 million, saving $75 million is very important to the company's survival.
We can't compare companies with each other since we have no idea what kind of deals they have negotiated with the cloud vendors. On Apple scale, you most certainly are not paying the list prices. Quite certainly the same thing for Dropbox, but Dropbox might have had worse negotiation position. Initially, they started small and then were quite locked into AWS. Maybe AWS did not believe they would actually be able to move away.
Since Google is working to challenge AWS, it probably makes sense for them to strike deals with companies like Apple with even small profit margins if this increases their efficiency and allows them to then make more profit from other customers.
And then we also have the option that Apple simply can't build own data centers fast enough to accommodate all their needs. Encrypted storage blocks are a safe thing to distribute, so rather distribute those than some Siri voice recognition stuff.
Dropbox and Apple are completely different businesses with different business models. If Dropbox suffers downtime, they lose revenue. If Apple suffers downtime, they can point at a cloud provider, while they keep selling their products.
(In case they someday want to drop the business. They don't have to worry about shutting off their own capital investment - all they have to do is tell Google to turn it off).
It's not an all-or-nothing proposition. Apple probably uses DCs for the majority of their storage, with the option to "burst" to AWS and/or Google, while they continue to build additional in-house capacity.
I've seen (via Little Snitch), at different times, Apple Photos downloading data from Amazon, AT&T cloud (?), and Azure; at least that's what I allowed it to use up until now. I haven't seen Google yet; probably it's a recent development.
I thought that's what their DCs were for. It definitely seems a bit out of character for a company that designs its own silicon to outsource something so comparatively trivial.
Outsource the easy things so you can focus on the hard things - custom silicon provides a lot more unique value for Apple customers than commodity cloud storage.
It makes sense for Amazon, Google, and Microsoft to build out datacenters because they're both using them for their core business and selling compute and storage resources to third-parties.
Unless Apple intends to get into the cloud computing business, eating the cost of tooling up a datacenter puts a big construction and maintenance cost on the "Liabilities" side of the T-sheet that they can avoid via a smaller rental cost.
I know how to change the oil on my car, and it would quickly become cheaper for me to get the ramps and the oil catch pan and such instead of paying somebody else to do it, but I'm still never going to do it. I don't want to keep that stuff in my house, and I don't mind the extra few dollars I pay for somebody else to handle it. The fixed costs to save some money in the long run are irrelevant because I don't want to be handling any of it to begin with.
Now, if I had a fleet of cars, the calculus may change on that.
Yeah, sorry; I said "real estate" when what I should probably have said was "Physical ownership and plant maintenance."
Owning and operating a datacenter is a whole kettle of fish that they don't want to get into if they don't need to. The real estate taxes (wherever they set it up) will matter, but the larger costs are likely to be in day-to-day operation, upgrading and maintaining hardware (and the entire process for that), dealing with actual natural disasters like a flood (or a bird flying into a transformer house and blowing power to N% of the datacenter, which means they need a backup generator, which means they need to test and maintain the backup generator, etc., etc.).
There's definitely a break-point where it's cheaper to pay someone else (in Apple's case, multiple someone-else's) to deal with that hassle.
It takes about 3 years to build a data center. Also, depending on the type of campus you want to build, the availability of power and contiguous land is fairly limited. Additionally, most web-scale DCs are "passively" cooled and require the local environment to support this.
We are in a construction boom nationally and it's currently difficult to get enough people to construct DCs. The ability to scale out is limited by construction.
Apple is building DCs, they are late to the game, but they probably can't scale their infra fast enough and need to use cloud services as a way to keep scaling without impacting their customers.
That definitely depends on where, and on which kind of data center. If you're willing to take a derelict warehouse and put in containers like Google did, all you need to do is provide power and fiber (which should be plentiful in any industrial setting) and you're set in a matter of weeks to months.
If you're aiming higher, as in designing a DC, buying the ground, building the building itself and then installing all the stuff needed, you're in for much more money and time. Depending on local politics and laws as well as power/fiber infrastructure, you can cut some corners, but the worst case (uncooperative politicians who need to be brought in line by the courts, the nearest 110kV transformer being at capacity, no fiber and no empty conduits in the ground, which means digging yourself) is the benchmark there.
Google has many large facilities that are definitely not shipping containers. You can do this for caching sites, but it doesn't really work all that well from a cooling-efficiency POV. In 2009, when they were building shipping containers, their PUE was as high as 1.22, which is essentially not much better than traditional raised floors and air conditioners. Their modern DCs are not built like this. Instead they have hot aisles and cold aisles. They call their enclosed hot aisles "hot huts".
So you essentially trade off time to market for PUE.
In a large compute region the higher PUE of the shipping container approach is going to limit how many servers you can put in that location because power to a particular site is often a limiting factor. Also not everywhere do you have access to water from the Columbia river.
> "...the next 110kV transformer being at capacity"
They will most likely work with the local utility to build an entire substation for the DC, so they won't have to deal with sharing capacity on an existing sub's XFMR.
I'd imagine that the lead time on getting the substation equipment is probably the longest of all the equipment in a DC. Chillers, UPSs, generators, and breakers are probably not in as high demand and don't take as long to manufacture as an MV XFMR.
Technically, almost no one builds their own datacenters at this point. The new stuff is built by wholesalers like Dupont Fabros, and companies like Apple sign 20 year leases to be the sole tenant. Same goes for Facebook.
The tenant dictates the location and design, but DFT builds and runs it. 15 years ago, everyone large wanted into the datacenter game. Then they realized it sucks to own a lot of land all over and employing mainly security guards and janitors there isn't worth it. Wholesale DC to the rescue.
And yeah, I know that Digital Realty bought Dupont.
Yeah, no. With very few exceptions FB builds and owns their own datacenters.
Basically, anything you read about in the news is entirely self-owned and operated. Yes, even employing security guards. E.g. this http://money.cnn.com/2017/08/15/technology/facebook-ohio-dat...
Same goes for Google and Amazon. I'm sure each of these companies may lease space from a colo for their edge computing needs, but they are primarily using DCs they built, own, operate, and staff.
They lease in carrier hotels from people like Equinix, Interxion, and Coresite for edge connections and peering.
They also buy on the wholesale market taking over entire buildings and campuses. DFT had a great business model. Leasing out entire campuses to hyperscale players on 30 year leases before even breaking ground. Pre-determined interest rates means you know exactly how much you can charge per CRSF and make a good living. Honestly, I was pretty pissed when DRT bought DFT.
No, yeah. Dupont said that 15% of their revenues came from Facebook. It was listed as a risk in their annual report.
FB et al. do still build their own DCs, especially where municipalities drop their pants and offer huge tax breaks. But there's still a tremendous amount of "single tenant" purchasing done at the wholesale level by the hyperscale players.
I guess my point is that even at the top tier, people value flexibility.
Couldn't tell this driving around Loudoun County, VA where I live. Multiple large scale data center developments on going and it seems a new one gets kicked off every couple of weeks. A new 750,000 sq ft data center complex was just approved at the end of January and initial delivery is expected this fall. Such a quick delivery seems to indicate there is plenty of construction resources available.
Based on reports from people I know: Apple already has a great deal of stuff in the NoVA datacenter cluster near Dulles and will most likely not spend any money to put more there. They need to be geographically distributed for latency and reliability reasons.
There is a middle ground between the two extremes of building your own datacenter from bare land, and using GCP or Azure or similar to host everything. Using WA and Oregon states as an example there's plenty of datacenter companies that will rent you "powered shell" space with cooling to do whatever you want with.
There are plenty of datacenter-owning real estate investment trusts that build huge facilities expressly for the purpose of turning around and immediately subleasing space to large third parties. With facilities that are already built and online this can reduce your 3-year timespan to a couple of months from contract execution to thousands of live servers, if the people doing the contracts and technical specs are sharp.
For a company the size of Apple, if they do have intentions to 100% own their datacenters, this sort of thing would be a stopgap solution. But it's a possibility.
Google, Facebook, Apple, Salesforce are listed as customers of large wholesalers like DFT. Like you mentioned, it's easier to get someone else to build and run the building (to your specs). Plus there is some anonymity too.
It adds 20 and 30 year lease stability to DRT's portfolio. DFT still operates under the DFT name for now. Just like telx which was also a DRT acquisition.
Reliably storing a lot of data for a long time with low latency and high bandwidth worldwide requires fairly specialized hardware and talent. If you're not willing/able to invest in the proper hardware (which is very different from regular compute work), it's definitely better to hand it off to a 3rd party.
Looking at Backblaze's posts about designing rack-mounted dense storage units and their reliability tests is very informative on how specialized it really can be.
And yet you'd think they have the numbers to warrant investing in it; Dropbox did, and I read it saved them $75 million since moving to their own datacenters.
With the pile of cash they're sitting on, it may be worth their while to pay more for storage rather than having to try to staff up their own infrastructure. They've plenty of money - more than they can figure out what to do with, it seems - but hiring's harder.
$75 million isn't a lot, at Apple's scale. Especially since the entry cost could easily scale up into the billions (world-wide, high bandwidth, lots of storage).
This is a swag, but I imagine more content is stored on Dropbox than iCloud, potentially lowering the savings even more.
Nah it's all fairly off the shelf at this point. You can run Ceph on top of server designs from Quanta and already be at near parity with S3 under your own roof. You just have to want to do it; something I bet Apple doesn't care about yet
Backblaze only has one data center. The latency is high if you are too far away from it. Which is fine for a backup solution. If that data center gets hit by a meteor -- you're toast. Again that's fine for a backup, especially if you actually implement the 3-2-1 back up strategy that they are always recommending.
I don't think Apple has the kind of technical know-how that Google has. Google develops their own tools to support their infrastructure; Apple is narrowly focused on a few product/service areas, namely mobile and desktop, and buys off-the-shelf ware for everything else, from storage (EMC Isilon), to OS (Red Hat Linux), to orchestration/management (Mesos), etc., etc...
Apple can hire the engineers to build an object storage system, but to build one that can scale takes at least a couple years. If Dropbox can move away from S3 with much less capital, so can Apple. Apple doesn't want to invest in that effort at all.
So does this mean Kubernetes will win? :) Why does it matter if Apple uses GCP? I think it's a terrific product and wish I could use it more. I also don't think Apple and Google are really competitors in the 'cloud' space. It'd be stupid to go to AWS if you were as large as Apple, as companies like Apple can actually have leverage, I imagine.
So from the anecdotes in this thread, along with some people suggesting Apple use AWS/GCP/Azure for things....
I'm wondering if this is a capacity thing, or maybe a locality thing? It would make sense to use the cloud providers if you don't have a data centre in a region near the user, I guess.
Apple probably needs a datacenter of its own anyway. Just think about all of the data they use internally for projects, and other sensitive stuff they would never allow to leave the network. So building a state of the art DC with low PUE makes sense.
However, farming out some of the costs to other cloud providers seems like a good strategy to eliminate single point of failure, or avoid all data being lost if somehow one provider loses data. And maybe then they can focus on adding compute units rather than storage, and backup for the storage units.
In short, despite Apple's user base, I still don't think it is on the scale of AWS or Google.
Yeah, the linked article feels like someone at CNBC is paid to come up with anything that will get clicks (I wrote that sentence kind of jokingly, but thinking about it now, of course they have exactly that) and it worked great this time. Well played, CNBC. You win this round.
Now hopefully there's a Rust vs Go blog posted or something gets merged into systemd today so we can get back to business as usual here ;)
I would guess that nothing happened to them, and that they have never been used to store encrypted iCloud data. Apple has an enormous amount of data that needs to be stored beyond iCloud: iTunes, App Stores, Apple Music, etc. I'm sure they store much of that data in-house.
There is also the possibility that Apple eventually plans to move iCloud to their own storage solution, but hasn't scaled up to it yet.
They don't have to rely purely on one solution. It would make sense to host certain parts of their infrastructure on GCP if enough of their customers are also located on GCP.
There should be! Any serious consumer of cloud storage has an abstraction layer that lets you store new data in arbitrary locations, on site or across public clouds. Companies frequently write their own middleware connectors.
It is a bit cheesy, yeah? But I do like the simplistic, big buttons with clear pictures. I don't want to have to take an extra second to decipher a material-design pictograph that sacrifices some aspect of understanding for visual aesthetic. Waze gets their users on the value the app brings to people, not how good it looks. There aren't many other players in the field of crowd-sourced traffic data like Waze shows (at least in the US) so I don't see them working on updating the UI until it actually matters.
It's no secret if you use Little Snitch (or an equivalent network monitor); it's interesting to see iCloud services connecting to AWS etc. Not really surprising considering Spotify etc. all do the same, but one would imagine Apple could run their own data infrastructure by now...
This is really not big news. If I were Apple, I would also sort cloud storage vendors by price (including building the whole infra in-house), throw away the non-secure or risky ones, and pick the cheapest.
I am new to deploying services on the cloud, but I am wondering why Apple mixes cloud services between Amazon, Google, and Azure? Is there an advantage over using a single provider? Maybe risk management?
I don't think this is true long term. The cheapness of cloud applies when and where deploying otherwise complicated systems - e.g. a Hadoop cluster - can be done in a matter of clicks, but for something simple - file storage - I find it hard to believe that it's cheaper. However, in Apple's case, geo-distribution and locality are probably very important, which adds to the cost for sure; I'd still be surprised if, long term, owned/rented hardware weren't cheaper than SaaS.
Especially for object storage. The scale where deploying your own object storage network gets cheaper than S3 is surprisingly small. Even handling your own equivalent of "intra-zonal redundancy" is easy. The main concerns arise in "how do you CDN that data to your customers" and "how do you gain redundancy beyond intra-zonal"; that's where S3/GCS gets more interesting.
Who is saying they aren't going down this avenue? It would make sense to me that they would slowly increase their own infrastructure, testing its reliability, and over time decrease their reliance on third parties.
Apple has enough money to tread carefully and roll things out slowly.
A friend messaged me this morning saying he logged into his work Office 365 account and noticed a lot of the URLs (e.g. for attachments and such) point to AWS services like S3. If Microsoft themselves use AWS instead of Azure... wtf?
I actually interviewed for the O365 team and yes, they said they don't trust Azure at all and host everything they can on their own private cloud. Didn't hear anything about S3 but they really didn't trust Azure, wouldn't be surprised.
Though I think their average quality has gone down lately (from the days of Excel pcode to the current days of Office 365), I've been seeing PowerShell exceptions thrown directly in the UI in the Office 365 Admin...
>For years the document contained language indicating that iCloud services were relying on remote data storage systems from Amazon Web Services, as well as Microsoft's Azure.
>But in the latest version, the Microsoft Azure reference is gone, and in its place is Google Cloud Platform.
To me, this seems to imply that they still use Amazon.
If you read the linked PDF document, it mentions they use AWS as well:
> The encrypted chunks of the file are stored, without any user-identifying information, using third-party storage services, such as S3 and Google Cloud Platform.
I wonder if this raises questions about privacy. If you're ultimately going to end up in Google's ecosystem in a way anyway, why would you want middleware?
As far as cloud providers go, I really like GCP compared to the others: it's clean, consistent, predictable, and easy to use.
Speaking personally, in my employment AWS always sent sales folks and Google always sent engineers; this makes me more comfortable too.
Hosting encrypted data in the Google Cloud is not the same as having the Google ecosystem where you willingly let Google harvest your email/history/location/etc.
Apple does then get Google's security. They have found most of the big vulnerabilities, including Shellshock, Cloudbleed, Heartbleed, and Broadpwn, among others.
[0] - https://9to5mac.com/2016/10/06/report-unified-cloud-services...