ZFS on Linux v0.7.0 released (zfsonlinux.org)
174 points by cvwright on July 27, 2017 | 88 comments



I use ZoL for my home server (5x3TB RAIDZ) and couldn't be happier; I've never had an issue either. The features really spoiled me compared to other filesystems/volume managers. The only issue that irks me is that I can't resize a pool after it was created, can't add new disks, can't remove one (without risking parity).

Recently I tried running a simple BTRFS SSD mirror for just a single KVM virtual machine. I thought compression would be neat since they are just two cheap 120GB SSDs, and I wanted to have a spare if one of them gave up the ghost.

At first everything was great, but then I ran a btrfs scrub and it found ~4000 uncorrectable errors; my VM was broken beyond repair and had to be restored from backup. The SSDs' SMART data was fine, there weren't any loose cables, everything worked great, no reported (ECC) memory errors... I have no proof, but it seems that BTRFS just decided to destroy itself.

I have since moved my SSDs to a ZoL mirror and (after running scrubs every two days for two weeks) had no further corruption, silent or otherwise. To me, this means that btrfs just isn't stable enough for production use - while ZoL is.


> The only issue that irks me is that I can't resize a pool after it was created, can't add new disks, can't remove one (without risking parity).

This is what eliminates it from home use for me.

It means to expand the pool the only way is to copy everything off (so you need enough spare storage to hold your entire pool), rebuild, then copy it back. The alternative is don't expand, and just add additional pools, but that starts losing benefits quickly (not having to think about where there's free space when adding something, not having to look in multiple locations to find something).

Can anyone who has been using ZFS long-term at home comment? How do you add more space?


I have a pool of two raidz2s of four disks each, one twice the size of the other; every so often I replace one set of disks with disks that are 4x the size (i.e. I started with 4x250gb + 4x500gb, after a few years I replaced the 250gb disks with 1tb disks, and right now I have 4x2tb and 4x4tb - and I get to store half as much data as the total capacity, so 12tb at the moment). If a disk dies close to the time I was thinking of upgrading then I'll replace that disk with one of the "new" size (but can't use the extra space until I do the rest of the replacement).

It works pretty well - by the time I'm buying disks that are 4x as large I don't mind throwing the old disks away. I've definitely avoided data loss in scenarios where I'd previously lost data under linux md (which lacks checksums and handles disks with isolated UREs very poorly).


Uh? Let's say you have a simple mirror of 2 vdevs, you go like this:

zpool attach <poolname> <first existing small vdev> <first larger new vdev>

zpool attach <poolname> <second existing small vdev> <second larger new vdev>

[...wait for the resilvering of the new vdevs...] You now have a 4-way mirror with 2 small and 2 large vdevs. Detach the small ones:

zpool detach <poolname> <first old vdev>

zpool detach <poolname> <second old vdev>

Now you have a pool made only of large vdevs. You just give the pool permission to expand and occupy all this new space:

zpool set autoexpand=on <poolname>

Done. Did it oodles of times on Solaris with SAN storage, but did it at home too, and in the weirdest ways (no SATA ports available? No problem: attach the new drive via USB3, then when finished take it out of the enclosure and install it internally, displacing the old drive), sometimes even rather unsafely (creating a pre-broken raidz with 4 good disks plus a sparse file, migrating data off a different pool, then decommissioning that old pool and using one of its disks to replace the sparse file).
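(For what it's worth, zpool replace should collapse the attach/detach into a single step per device, dropping the old device automatically once the resilver finishes - roughly, with the same placeholders as above:

zpool set autoexpand=on <poolname>

zpool replace <poolname> <first existing small vdev> <first larger new vdev>

zpool replace <poolname> <second existing small vdev> <second larger new vdev>

Same end result, just less typing.)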


The commonly wished-for feature is not to entirely replace disks as you describe, but to expand it bit by bit. To be able to have e.g. a z2 setup spanning 4 disks of equal size and then add a 5th one later to get one disk's worth of extra capacity.


If you mean turning a 4-disk raidz2 into a 5-disk raidz2, indeed that's unfortunately not yet possible (unless you had built it with an extra vdev sparse file and ran it in this degraded state - effectively you had a raidz and not a z2, and it only works once).

If you want, though, you can add capacity by attaching a second raidz2 to an existing pool - this forces you to add 4 new drives instead of one, but it works:

zpool add <poolname> raidz2 <first new vdev> <second new vdev> <third new vdev> <fourth new vdev>

You now have a concat of 2 raidz2's, in a single pool.

Nowhere near as elegant as merging a new slice into an existing z2, I concur. Does the job though. Oh, and if you are insane you could probably even add a single device, resulting in a concat of a z2 + an unmirrored single device. I don't think it will stop you from doing that.


> Does the job though.

Nope.

It will result in very uneven utilization, which leads to poor performance. Not recommended. It is also extremely wasteful. The whole point of the sought-after feature is to minimize waste and cost. So no, I'd strongly disagree that it does the job.


I'm not trying to defend ZFS' lack of the ability to add a vdev to an existing RAIDZ/Z2. That would be extremely nice to have and has tons of legitimate uses, for many of which there is no acceptable substitute as you point out.

That said, even if that function existed, expecting it to magically rebalance the whole pool to take into account the existing data is rather unrealistic (it's basically a full rebuild in that case, the most terrifying operation you can perform on any pool/array already containing data).

While I agree with the performance aspects (your data still comes from either the first volume or the second, so you're only using the performance of half your spindles - unless you simultaneously access files in both halves), I wouldn't call it wasteful: you are still losing the same % of your total capacity. e.g.:

6x2TB RaidZ2 = 8TB usable (4TB lost to parity)

let's add a second Z2 6x6TB volume:

6x6TB RaidZ2 = 24TB usable (12 TB lost to parity)

Total: 32 TB usable, 16 lost to parity.

What if we had built it from scratch using 6x8TB drives?

Total: 32 TB usable, 16 lost to parity.

Same.


> That said, even if that function existed, expecting it to magically rebalance the whole pool to take into account the existing data is rather unrealistic

That is exactly what is required (well, you re-balance the vdev, not the pool) and is what is proposed and has been in the works for basically forever. Sun never prioritized this because it has few uses within the enterprise (but it is extremely valuable for home users).

Your example is quite misleading. Here is a much more common and realistic scenario.

6x2TB RaidZ2 = 8 usable, 4 lost to parity.

Now, imagine that you want to expand the pool by 4 TB. Being able to add to a raidz vdev would result in this:

8x2TB RaidZ2 = 12 Usable, 4 lost to parity. (ridiculously cheap, no further waste at all)

Current scenario:

6x2TB RaidZ2 + 4x2TB RaidZ2 = 12 Usable, 8 lost to parity. (expensive, lower performance, more power, more noise, requires more harddrive slots (this is often a very important aspect for home-setups))

The waste is outrageous. And the consequences of this waste affect every aspect of designing a home setup with ZFS (so as to avoid the above scenario), which is vastly different from how you would design a system with, for instance, raid6 with expansion in mind.


The comparison is a bit off.

If you build a pool out of a 3x8TB RAIDZ2 plus another 3x8TB RAIDZ2, you lose (2+2)x8TB = 32TB and have only 16TB usable.

6x2+6x6 is a mixed pool so it's hard to compare to actual disks.

An alternative, better comparison would be a 6x8TB pool plus another 6x8TB pool.

This time you lose 32TB to parity and have 8x8 = 64TB usable. If you rebuild it as a single 12x8TB pool, you lose 16TB to parity and have 80TB usable.


Expanding bit by bit is extremely fragile, arcanely complex and error prone. Not worth the amount of financial savings, especially when the cost of time and effort is added on top of that.


It might be what you say, but in many cases it might also be a necessity. I built my first array with 10x1TB disks (starting with 3) for my home server when I was a student/got my first job and could only afford one disk every now and then. Linux's expandable mdraid worked like a charm multiple times. I probably wouldn't have bothered (and would have lost data or limited my hoarding) if I had had to save for a year just for a fileserver. I am not talking about precious data, but data doesn't have to be irreplaceable for you to want to protect it.


It's much easier to grow if you use a stripe of mirrors instead of raidz. You waste more space, but gain more flexibility and performance. More details here: http://jrs-s.net/2015/02/06/zfs-you-should-use-mirror-vdevs-...

For example, you can start with 1 mirrored vdev, then add another mirrored vdev. You can upgrade vdevs separately, so you'll need to replace just 2 disks to grow your pool.

The only thing you have to keep in mind is that data is striped across vdevs only when written, so if you add another vdev, you won't get performance gains for data which was written just to one vdev.
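A rough sketch of that growth path (pool and device names are just examples):

zpool create tank mirror /dev/sda /dev/sdb

zpool add tank mirror /dev/sdc /dev/sdd

The second command adds another mirrored vdev, and ZFS starts striping new writes across both.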


It's also important to keep in mind that a stripe of mirrors, aka RAID10, has a failure probability several orders of magnitude higher than RAIDZ2 (or higher parity levels) on most common disk setups.


You also have to factor in the speed of a resilver and account for the chance of a failure during this at-risk window.

A mirror can resilver much faster, and it doesn't significantly affect the vdev performance while doing so. A RAIDZ resilver after a disk replacement can take a significant amount of time, and degrade performance seriously as it thrashes every single disk in the vdev.

Allan Jude and Michael Lucas' books on ZFS have tables describing the tradeoffs of the different possible vdev layouts, and they are worth a read for anyone setting up ZFS storage.


The danger of losing data on a RAID10 resilver is much higher than people expect. A 6x8TB RAID10 has a 47.3% chance of encountering a URE during a rebuild, which means data loss.

A 6x8TB RAID6 has a 0.0002% chance of a URE.

(Assuming a URE rate of 10^-14 per bit read; in reality this rate is often lower.)
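For the curious, the 47.3% falls out of a simple back-of-the-envelope calculation (assuming that 10^-14 per-bit rate, and that the mirror rebuild has to read the full 8TB from the one surviving copy):

bits read during the rebuild = 8TB = 8x10^12 bytes = 6.4x10^13 bits

P(no URE) = (1 - 10^-14)^(6.4x10^13) ≈ e^-0.64 ≈ 0.527

P(at least one URE) ≈ 1 - 0.527 = 47.3%

RAIDZ2/RAID6 gets its tiny number because a single URE hit during a rebuild can still be reconstructed from the second parity; you need two failures on the same stripe to actually lose data.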


There is another way to grow a pool: replace each disk with a larger one, resilvering after each replacement.

But yeah, sizing your pool very generously when first building it is a good idea. I found that 6x 4TB drives in a raidz2 pool was within my budget, and it will take me a long time to fill 16TB.


What does the gp mean by saying there's no way to resize a pool? You can resize a pool by adding disks like you say, so what's the alternative? Resizing by adding only one disk? But it's RAID-Z; where will the parity go? Does anyone support this use case?


SnapRAID and BTRFS both support adding disks.

On BTRFS you simply use the new space or rebalance the pool to use the new disk properly.

On SnapRAID the next scan will add the disk to the parity drive contents.

For low-cost home usage, it is much, much more cost-effective to buy single disks and start with small pools than to buy large pools up front or even replace entire pools.
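For reference, on SnapRAID that is roughly one added line in snapraid.conf (paths and names here are just examples):

data d3 /mnt/disk3/

followed by a

snapraid sync

to fold the new disk into the parity.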


Since I presume you have verifiable backups of your pool, couldn't you just remake the pool with the increased number of disks and copy back to it?


Recovering from my current backup solution is expensive, the additional cost is not worth it.

Remaking the entire pool is also a hassle and incurs unnecessary downtime.

Additionally, not all data is backed up, and I would lose that. It's not important data, so it's okay to lose it in a house fire, but not just for resizing the pool.

Lastly, this operation would likely take a long time, days probably. I'd rather just be able to ram in another disk and be done with it.


I had assumed you would have a second array as backup for the current pool ensuring zero data loss and easy backup. This would seem to be optimal. Remote backup is obviously a good thing to have too.


Such a solution is extremely expensive and inefficient for a home setup.


All things considered, your house will probably never experience a major disaster, and remote storage has got to be many times more expensive.


Remote storage is on B2, several terabytes; storing it is not very expensive, restoring it however is.


Btrfs supports it. The parity stays exactly where it is, but new files will use the new device. You can (and far too often have to, when btrfs decides there is no free space left) also perform a 'balance' operation, which rewrites all the chunks on the disks, optionally with different parity options.
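Roughly (device and mount point are made up):

btrfs device add /dev/sdd /mnt/pool

btrfs balance start /mnt/pool

The first makes the new space available immediately; the balance spreads the existing data across all the devices.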


Hmm, I don't understand how that's possible on RAID-Z? You can only have as much space per disk as the parity, no? I.e. you can't have three 500 GB disks for a total of 1 TB of space, and replace one with a 1 TB disk and get more space, can you?


The idea is to add disks rather than replace. So to go from 3x500GB to 4x500GB.


Oh, I see. In that case, it's odd to me that ZFS doesn't support that, hmm.


Well, it was designed by a company targeting enterprise customers. I guess this feature would be secondary for that market.


That doesn't make much sense; how can you change parity on the fly? I suspect it does the same thing as ZFS: it just adds another vdev.


For each chunk of on-disk data, the fs stores which devices it is stored on and the used parity configuration. You can take one such chunk of data, and clone it into a new chunk with identical contents but a different parity configuration in the free space of the devices that are part of the file system. (Just like you'd allocate new chunks for storing new files in the same parity configuration). Once that copy is created, all references to the old chunk are changed to point to the new chunk and so the old chunk is now free space. Repeat this process for all chunks in the file system, and the whole file system is converted to use a different parity configuration.
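In btrfs that chunk-by-chunk rewrite is exposed through the convert filters to balance, e.g. something along the lines of (profiles and mount point invented for the example):

btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/pool

which rewrites data and metadata chunks to the raid1 profile using exactly the clone-then-repoint process described above.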


> Can anyone who has been using ZFS long-term at home comment? How do you add more space?

My solution is to use RAID1 exclusively. That means that I can keep attaching pairs of devices if I run out of diskspace. I can never get them out again, however :-)


Whenever I run out of space, I hop on newegg, and search for drives twice my current size. I buy the first one I find immediately.

Once I get it, I hot swap a drive, and start searching for sales on more drives.

Rinse, repeat, all 5 disks swapped, more space appears \o/


I have run ZFS at home since 2011 and migrated to larger drives one time: from 6x3TB to 6x4TB. The only downside is that the slabs are smaller compared to a native 6x4TB array.


That's not the problem here.

The problem is having a 6x3TB pool and turning it into a 7x3TB pool, which is arguably much much cheaper than buying 6x4TB.


Can't you just use ZFS without RAIDZ and still have your data protected from corruption/drive failures? I think storage is hard and I never understood the advantage of RAID (at least for home usage). It really only looks like an inflexible option, with too much risk to me.

What's the benefit of RAIDZ over say, you choose to have X copies distributed over your disk(s)?

Answer: zfs: copies=n is not a substitute for device redundancy! source: http://jrs-s.net/2016/05/02/zfs-copies-equals-n/

Here's a discussion about it: https://www.reddit.com/r/DataHoarder/comments/4hbn8v/raidz_v...

Anyone who wants real security relies on off-site backups, isn't that right? And aren't RAID(Z)s slow to recover as well? (serious questions, I'm a zfs noob)

RAIDZ: I don't know what stripe-set configuration is good for me and don't want to waste time comparing RAID controllers or whether I even need one. Then configuring the beast on the hardware and software side just seems to be too tedious. Why not just (de)attach another disk and let zfs expand/shrink my total disk space without losing consistency?

Startup idea: Someone clever should find a flexible storage solution that uses aufs, unionfs etc. to give you the flexibility we need.


On ZFS, a pool is resized by replacing all the drives in the pool. When autoexpand=on and the last drive is replaced, the pool will have expanded capacity. And drives are cheap.

When another RAID device is added to the pool instead, it is used as a stripe. Different types of RAID can be combined in this way, for example one can add a mirrored stripe to an already existing RAID-Z. Whether this is desired or not is a different discussion, since the point here is that both scenarios are possible.

Unlike other volume managers, ZFS expands on the disk boundary, rather than physical or logical elements; it's a larger boundary, but since no arcane or complex procedures or knowledge are required and drives are cheap, it's very practical, as well as elegant.
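Concretely, growing a raidz pool by swapping in bigger drives looks something like this (names are placeholders):

zpool set autoexpand=on <poolname>

zpool replace <poolname> <old disk 1> <new bigger disk 1>

[wait for the resilver, then repeat for each remaining disk]

Once the last disk has been replaced the extra capacity shows up; with autoexpand off you can also trigger the expansion per device with zpool online -e.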


> The only issue that irks me is that I can't resize a pool after it was created, can't add new disks, can't remove one (without risking parity).

Not sure why ZFS would not auto-resize for you. It's the reason I have been using it for several years in my home NAS under a Linux server. In my case I just replace all the disks in a RAIDZ with bigger ones and it automatically resizes up.


Which forces you to waste $/GB and $$ in general on multiple higher capacity disks as opposed to just buying a single new drive.

Not that my 10 TB RAIDZ is running out of space any time soon, but as soon as Btrfs gets their shit together I'm switching. At the rate of a few blu rays a year it'll fill up sooner or later.


Wow, just wow. These perf improvements are great. ASIC accelerated filesystems is just crazy

### Performance

* ARC Buffer Data (ABD) - Allocates ARC data buffers using scatter lists of pages instead of virtual memory. This approach minimizes fragmentation on the system allowing for a more efficient use of memory. The reduced demand for virtual memory also improves stability and performance on 32-bit architectures.

* Compressed ARC - Cached file data is compressed by default in memory and uncompressed on demand. This allows for a larger effective cache which improves overall performance.

* Vectorized RAIDZ - Hardware optimized RAIDZ which reduces CPU usage. Supported SIMD instructions: sse2, ssse3, avx2, avx512f, and avx512bw, neon, neonx2

* Vectorized checksums - Hardware optimized Fletcher-4 checksums which reduce CPU usage. Supported SIMD instructions: sse2, ssse3, avx2, avx512f, neon

* GZIP compression offloading - Hardware optimized GZIP compression offloading with QAT accelerator.

* Metadata performance - Overall improved metadata performance. Optimizations include a multi-threaded allocator, batched quota updates, improved prefetching, and streamlined call paths.

* Faster RAIDZ resilver - When resilvering, RAIDZ intelligently skips sections of the device which don't need to be rebuilt.


> ASIC accelerated filesystems is just crazy

What?


I guess this refers specifically to the following bullets:

* Vectorized RAIDZ - Hardware optimized RAIDZ which reduces CPU usage. Supported SIMD instructions: sse2, ssse3, avx2, avx512f, and avx512bw, neon, neonx2

* Vectorized checksums - Hardware optimized Fletcher-4 checksums which reduce CPU usage. Supported SIMD instructions: sse2, ssse3, avx2, avx512f, neon

* GZIP compression offloading - Hardware optimized GZIP compression offloading with QAT accelerator.

All use vector and/or SIMD instructions on the CPU, though QuickAssist (QAT above) is also available as a dedicated add-in card.


Yeah, I don't equate SIMD instructions to being ASICs.


Yeah an ASIC is an Application-specific integrated circuit. SIMD instructions are not application-specific.


Thanks for correcting me. I was wrong. But it is nice to see acceleration in the FS space using SIMD intrinsics on the chips we already own.


Yeah, QuickAssist cards are what I was referencing.


You should read his comment, he's talking about all the ASICs involved in the underlying filesystem.


let me rephrase gbrown_'s question: which part of my CPU is specific to ZFS? Because that's what ASIC means. The acronym stands for Application-specific integrated circuit.


Edit: not ASIC, just the SIMD stuff is neat. My bad.


Big fan of ZFS and while there was an appearance of a lull while ZoL was stuck in 0.6.x for so long, this is a very welcome update :)

Warning to all that use ZFS to host the /boot file system, however: GRUB doesn't presently support all the features added in this release. Be careful about the features you enable!
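If /boot does have to live on ZFS, one conservative approach is to create that pool with every feature disabled and then enable only the ones your GRUB build is known to understand, e.g. something along the lines of (pool name, device and feature are only illustrative, check your GRUB version):

zpool create -d -o feature@lz4_compress=enabled bpool /dev/sda2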


This is why I still suggest keeping /boot on ext of some sort and then just backing it up to zfs. Though now I'm wondering if you could put a raid1 on a zvol that matches the real disk and get benefits that way.....


Nice - zfs allow is now working. That's big news since it means unprivileged users can snapshot.
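e.g. something along the lines of (user and dataset names invented):

zfs allow alice snapshot,send tank/home/alice

after which alice can create and send snapshots of her own dataset without root.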


With both Linux and FreeBSD being open source, I am curious what it is that prevents ZFS from being a first-class, stable component in Linux right now, just as it already is in FreeBSD.


ZFS's license (CDDL) is generally considered incompatible with Linux's (GPLv2), whereas it's compatible with the BSD license. See https://sfconservancy.org/blog/2016/feb/25/zfs-and-linux/, which also discusses the case of Ubuntu moving towards including it by default anyhow.


Maybe it's fixed with this version, but we've seen kernel pauses on the order of minutes under high memory / OOM situations, with ZoL trying to get memory.


What was your setup like? I've encountered similar - the cause ended up being a mix of xattrs (I don't use them so I just disabled the feature) and needing to tune zfs to be less aggressive handling my consumer-grade disks.


Can you expand on what you did to tune things?


I run a 10 disk raidz2 array on a box with 4G RAM. If I'm running many other things on that box, memory gets very tight, and it swaps occasionally too. I've not seen kernel pauses on FS accesses though.


My understanding is that a potential licensing incompatibility is what has kept ZFS from being (tightly) integrated into linux:

https://en.wikipedia.org/wiki/ZFS#Linux


It's a crying shame that DKMS is so horribly broken on RHEL/CentOS/SL. Nothing like having it blithely make weak-updates symlinks to the ZFS and SPL modules that were compiled for your oldest still-installed kernel, only for those symlinks to end up pointing nowhere after that old kernel gets deleted. Oops! Your zpools now won't import until you clean up the mess. Been there, done that too many times.


Yeah, I have to play the game of deleting the zfs and spl packages, deleting weak-modules and reinstalling on the new kernel. It's expected, but silly.
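For the dkms side of it, that's roughly (module versions are placeholders; use whatever dkms status reports):

dkms remove -m zfs -v <version> --all

dkms remove -m spl -v <version> --all

dkms install -m spl -v <version> -k $(uname -r)

dkms install -m zfs -v <version> -k $(uname -r)

plus cleaning out the stale symlinks under /lib/modules/*/weak-updates before the reinstall.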


Has ZoL made a unified page cache yet? This is a major reason I won't use it.


No. You are still reliant on the ARC, which is kept on a slab outside of Linux's traditional page cache.


Encryption didn't make it in? That's too bad!


Yeah, I've been watching the pull request eagerly, but it's not quite there :(

https://github.com/zfsonlinux/zfs/pull/5769

Judging by the comments from today though it sounds like it will be merged very shortly after this release. So that's good news :D

I'm also wanting to look at this "OPAL" native drive encryption stuff, which requires, among other things, a UEFI plugin and some other bits. That would be a useful solution as well, though what's here in ZFS is much more versatile, for example doing per-user stuff, and also encrypted send without decrypting.

Lots of good changes in 0.7 otherwise!


Wait, has encryption hit open-source ZFS at all? Can I use it on FreeBSD?


Don't know if you can use it on FreeBSD yet, but tcaputi added it to Linux here: https://github.com/zfsonlinux/zfs/pull/5769 Matt Ahrens approved the changes, and hopefully it'll get rolled in ZoL soon.

Full disclosure, I work with Tom at Datto.


Yes, Brian's plan was to fold encryption into ZFS after 0.7.0. It was simply too large of a change to do in one shot.


Thank you and congrats on release - beer time

(running release candidate on laptop via Arch without issues and current stable on many servers)


Wow. What a massive release.


For as much propaganda as I see about ZFS, I've never once encountered it in the wild.

Most of the companies here I've never even heard of and the ones I have heard of aren't companies I would take into account when choosing a file system.

http://open-zfs.org/wiki/Companies

In all my years I've never encountered or needed this file system, and every time it's mentioned it sounds like it's more trouble to run it than it's worth.


Are you talking about ZFS on linux or just ZFS itself? It's pretty wild to say "it's more trouble to run it than it's worth" just about ZFS when you run it on FreeBSD or Solaris.


Maybe you just happened to work in industries that don't need it? I've had 5 software development jobs, two of which used ZFS. At one of them, it was an integral part of their product: at the time their product was launched, there was no other filesystem that did what they needed. I did not seek out jobs related to ZFS.


If I have adopted ZFS at a community college, with the low resources I have, I find it hard to believe it's more trouble than it's worth. In fact, it has saved me time, money (no more expensive RAID card), and effort.

I have more trouble with FreeBSD than ZFS, but even with those troubles ZFS hasn’t let me down.


I use it because it's a lot less trouble than what I did before. Separate md, lvm and filesystem means 3 different sets of commands; now I don't worry about partitions, mount points or anything like that, I just run one big zfs and throw all my disks in there. It stores all my data, and there's a single command interface which I only use for replacing disks (with zero data loss).


Well, I can speak from experience that a well-known UK broadcaster uses it heavily as part of a video storage network system (specifically ZoL, but ZFS is also used with Solaris).


Part of the challenge with ZFS is that it's amazing and (almost) nobody uses it. It creates a weird chicken-and-egg scenario with regards to familiarity in production. If Google can get away without using it then why do you need to have it? The reality is: you don't. It's another incredible technical achievement that was amazing for its time, but the world of containers, speed of systems, and abstraction of physical machines has made it more or less a moot point for everyone other than SAN administrators or hardcore COW optimization junkies.


" If Google can get away without using it then why do you need to have it?"

I think that comment has to go on the Best of 'Shit HN Says' corkboard ere long. I mean, when I am faced w/ a technical challenge at work, or at home...why would my first thought NOT be 'What would Google do in this situation?'


Do you have the capital that Google has?

Do you have the expertise that Google has?

Do you have the scale that Google has?

Do you have the problems that Google have?

I had an employee once who, when tasked with designing an API for use by a partner company, copied MS conventions a-la "CreateWindow". When I asked about adding more functionality he said "I'll just do CreateWindowEx" and "IWebBrowser2". His claim was "If MS is doing that and MS is successful, this must be a good way to do it".

After I pointed out that, despite several attempts, there is no MS-compatible API (this was in 2001; I think reactos and wine had both been started, and there had been other attempts at WinAPI emulation, but none was worth anything), and that we had 2 people working on it, not 400, he agreed that perhaps the Unix API is easier to maintain, document and reimplement -- as is evident from the many (re)implementations available.

For Microsoft, a labour-intensive-to-maintain-and-labour-intensive-to-document API is a benefit, because they can afford it and they are already holding the dominant market position (making it hard to be compatible with them). For a lean team, maintaining this kind of API is a penalty.

Similarly in this case, what's good for Google and what's good for you are not necessarily equivalent.

Google solves speed, redundancy and reliability by making and managing copies. ZFS gives a single machine the most reliable file system available today. But Google cares not about any single machine.

Do you have 100 copies of every important datum? If you don't, then you shouldn't copy Google.


I assume the thinking goes like this:

ZFS was designed for enterprises. Enterprise software has requirements of high stability, scale etc. Google is probably on the extreme end of these problems. So if Google considered and rejected ZFS it probably means it was not very good.

What probably really happened was that Google was already too invested in Google File System and therefore did not give other file systems a fair chance


Google's big enough that it, most likely, has something that the public ceph project is a pale imitation of.

Actually, they're big enough that something between what ceph can do and what backblaze does is probably their /archive/ backend.

They probably have all of the fast stuff on 'disposable' temporary copies in RAM or SSDs.


GFS does not really solve the same problem as ZFS, I think.


ZFS works with containers and works well with docker.

ZFS is really really amazing, beyond the SAN. It's fantastic for both desktops and servers.

The lack of adoption is problematic, but I use it for all sorts of things.

Like mirroring snapshots between prod and qa, including the database and assets. I can test complicated software upgrades or database migrations quickly and repeatedly. This is not something docker can do by itself.

Much of the stuff I do with ZFS I can now do with btrfs, but ZFS is just so much nicer to work with.
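A rough sketch of that prod-to-qa mirroring (host, pool and dataset names are made up):

zfs snapshot -r tank/db@pre-upgrade

zfs send -R tank/db@pre-upgrade | ssh qa-host zfs receive -F tank/db

and on the qa side you can zfs rollback to the snapshot and rerun the migration as many times as you like.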


Last week, I created about 500,000 separate zfs pools to back the Docker containers we are running on cocalc. I love ZFS.


What do containers, speed, and abstraction have to do with maintaining the integrity of data? ZFS is the most popular filesystem for NASes for a good reason... because it checksums data and can tell you about (and correct) failures.


You'd be surprised how many people do actually use it for dedicated storage applications. With the major backer for OmniOS pulling out, some (including me) have been moving over to ZoL as a preference.

It's a real shame not to have it more integrated in less dedicated applications, though. It's really hard to go back once you are used to zfs send/recv, transparent compression, etc.


To spin up a full system container in a second or two requires a filesystem like ZFS. It's not a moot point, it's a requirement even if you are not aware it is under the hood. The first thing you do when setting up LXD for your containers is feed it a ZFS pool.
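With recent LXD that's roughly (pool/dataset names invented):

lxc storage create default zfs source=tank/lxd

and new containers then get their root filesystems as ZFS clones of the image dataset, which is what makes the second-or-two spin-up possible.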





