For one, you get no performance benefit from non-COW over COW unless you actually update in place - and updating in place is what every 'fast and easy' filesystem has to do: FAT (including exFAT), ext3, ext4, etc.
The failure modes are well documented - and in many cases got worse as these filesystems tried to work around the performance cost of journaling. Journaling doesn't fully resolve the issue either, because they can't journal all the data they'd need to without making the performance problems worse. See 'Delayed allocation and data loss' at https://en.m.wikipedia.org/wiki/Ext4 for one example.
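To make that linked example concrete, here is a minimal sketch (mine, not from the article) of the write-temp/fsync/rename dance applications ended up needing so that a crash during ext4's delayed-allocation window leaves either the old or the new file contents, never a truncated one. The path names are hypothetical.

```python
import os

def atomic_replace(path: str, data: bytes) -> None:
    """Replace `path` with `data` so a crash never leaves a zero-length file."""
    tmp = path + ".tmp"
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data)
        os.fsync(fd)       # force data out of the delayed-allocation window
    finally:
        os.close(fd)
    os.replace(tmp, path)  # atomic rename over the old file

    # Optionally fsync the directory so the rename itself is durable.
    dfd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dfd)
    finally:
        os.close(dfd)
```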
This isn't a solved (or likely reasonably solvable) problem with non-COW filesystems, which is one of the reasons why all newer filesystems are COW. The other is that the latency hit from chasing down COW delta blocks isn't a big issue anymore, thanks to SSDs and enough RAM for decent caches and pre-allocation buffers.
Also, unlike modify-in-place, COW doesn't need to allocate (or re-read/re-checksum) the entire prior block when something changes. Due to alignment, doing SOME of that usually makes sense, but it's highly configurable.
It only needs to add new metadata with updated mapping information for the changed range of the file, then checksum and write out the newly updated data (and only that), give or take alignment. It acts like a patch. That's literally the whole point of COW filesystems.
Update-in-place has an already-allocated block it has to deal with in real time: either the new data consumes less space in its already-allocated area (leaving tiny fragments), or it has to allocate a new block and toss the old one. That gives worse real-time performance than a COW system, because it's doing the new block allocation (which covers more space than a COW write, unless the COW write replaces the entire block's contents!) plus going back and removing the old block.
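A toy model of the difference being described (Python, greatly simplified; the names are mine and not any real filesystem's API). The COW path writes only the changed data to free space and publishes a new mapping, while the in-place path has to reconcile with the already-allocated block during the write itself:

```python
# Toy model: the "disk" is a dict of block_id -> bytes, and a file is a
# list of block_ids (its block map). Checksums and alignment are ignored.
from itertools import count

disk: dict[int, bytes] = {}
_next_block = count(1)

def cow_write(block_map: list[int], index: int, data: bytes) -> list[int]:
    """COW: write the new data to a fresh block and return a NEW block map.
    The old block stays on disk untouched until nothing references it."""
    new_block = next(_next_block)
    disk[new_block] = data                 # only the changed data is written
    new_map = list(block_map)              # 'patch' the mapping metadata
    new_map[index] = new_block
    return new_map

def inplace_write(block_map: list[int], index: int, data: bytes) -> None:
    """Update-in-place: the already-allocated block must be dealt with now,
    either by overwriting it or by allocating a replacement and freeing it."""
    old_block = block_map[index]
    if len(data) <= len(disk[old_block]):
        disk[old_block] = data             # overwrite (may leave slack space)
    else:
        new_block = next(_next_block)
        disk[new_block] = data             # allocate a replacement...
        block_map[index] = new_block
        del disk[old_block]                # ...and free the old block, in-line
```

The only point of the sketch is where the old block gets handled: in the COW path it is deferred, in the in-place path it is part of the write.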
ZFS recordsize, for instance, is just the maximum size of one of these patch 'blocks'. The actual records are only as large as the actual write data plus filesystem overhead.
ZFS then goes back and removes old records only when nothing references them anymore, which typically happens asynchronously in the background and doesn't need to happen as part of the write itself.
This makes it easier to free up entire regions of pool space, and fragmentation becomes much less of an issue.
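A sketch of that deferred-free idea (again a toy of mine; real ZFS tracks this with block birth transaction groups and per-snapshot deadlists rather than per-block reference counts):

```python
from collections import defaultdict, deque

refcount: dict[int, int] = defaultdict(int)   # block_id -> number of referents
free_queue: deque[int] = deque()              # blocks waiting to be reclaimed

def reference(block_id: int) -> None:
    """Called when a file, clone, or snapshot starts referencing a block."""
    refcount[block_id] += 1

def release(block_id: int) -> None:
    """Called when a referent goes away. Nothing is reclaimed synchronously;
    the write path never waits on this."""
    refcount[block_id] -= 1
    if refcount[block_id] <= 0:
        free_queue.append(block_id)

def background_reclaim(disk: dict[int, bytes]) -> None:
    """Runs asynchronously (e.g. per transaction group), freeing whole
    regions at once instead of punching holes during every write."""
    while free_queue:
        del disk[free_queue.popleft()]
```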
>For one, you get no performance benefit from non-COW over COW unless you actually update in place - and updating in place is what every 'fast and easy' filesystem has to do: FAT (including exFAT), ext3, ext4, etc.
That is just a matter of priorities then. And just because you might opt to not update in place in some situations doesn't mean that you can never do it.
I'm not sure what you mean by "Delayed allocation and data loss"; I don't find it relevant to this discussion at all, since that isn't about filesystem corruption but application data corruption. And COW also suffers from this - unless you have NILFS-style automatic continuous snapshots. With COW you probably have a much greater chance of recovering the data with forensic tools (also discussed in this thread regarding ZFS), but that comes with huge downsides and is hardly a relevant argument for COW in the vast majority of use cases anyway.
ZFS's minimum block size corresponds to the disk sector size, so for most practical purposes it is the same as your typical non-COW filesystem there. Writing 1 byte requires you to read 4 KiB, update it in memory, recalculate the checksum, and then write it out again.
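As a concrete illustration of that read-modify-write cost (a sketch, not ZFS code; a 4 KiB sector is assumed and the two I/O callbacks are hypothetical):

```python
from typing import Callable
import zlib

SECTOR = 4096  # bytes; assumes a 4 KiB-sector disk

def write_one_byte(
    read_block: Callable[[int], bytes],              # hypothetical: read one block
    write_block: Callable[[int, bytes, int], None],  # hypothetical: write block + checksum
    block_id: int,
    offset: int,
    value: int,
) -> None:
    """Even a 1-byte change means: read the whole sector-sized block, patch it
    in memory, recompute the checksum over the full 4 KiB, and write the whole
    block back (to a new location on COW, in place otherwise)."""
    block = bytearray(read_block(block_id))        # read 4 KiB
    assert len(block) == SECTOR
    block[offset] = value                          # update 1 byte in memory
    checksum = zlib.crc32(block)                   # checksum covers the entire block
    write_block(block_id, bytes(block), checksum)  # write 4 KiB + checksum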
How you remove old records shouldn't depend on COW, should it?
My only statement was that checksums aren't in any way dependent on COW.
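To illustrate that point, here is a minimal sketch of an update-in-place block store that still keeps per-block checksums in its metadata (a toy of mine, not any particular filesystem's design; this is roughly the territory of ext4's metadata_csum or dm-integrity, which checksum without being COW):

```python
import zlib

class ChecksummedStore:
    """Update-in-place block store with per-block CRCs kept in metadata.
    No copy-on-write anywhere; checksums alone detect bit rot on read."""
    def __init__(self) -> None:
        self.blocks: dict[int, bytes] = {}
        self.csums: dict[int, int] = {}

    def write(self, block_id: int, data: bytes) -> None:
        self.blocks[block_id] = data              # overwrite in place
        self.csums[block_id] = zlib.crc32(data)   # update stored checksum

    def read(self, block_id: int) -> bytes:
        data = self.blocks[block_id]
        if zlib.crc32(data) != self.csums[block_id]:
            raise IOError(f"checksum mismatch in block {block_id}")
        return data
```

The trade-off an in-place design has to handle is that a crash mid-overwrite can leave a block and its stored checksum out of sync (journaling or write-intent logging covers that), but the checksumming mechanism itself doesn't require COW.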
The discussion about compression is beside the point, as compression is a common feature of non-COW filesystems anyway.
I haven't seen a proper argument for the corruption claims. And getting corrupted data if you interrupt a write is not a huge deal - mind you, a corrupted write, not a corrupted filesystem. The data was toast anyway. A typical COW filesystem would at best save you one "block" of data, which is hardly worth celebrating. Your application will not care whether you wrote 557 out of 1000 blocks or 556 out of 1000 blocks; your document is trashed either way. You need to restore from backup (or from a previous snapshot, which of course is the typical killer feature of COW).
There are also several ways to solve the corruption issue. ReFS, for instance, has data checksums and metadata checksums but only does copy-on-write for the metadata. (edit: I was wrong about this - it uses COW for data too if data checksumming is enabled)
Yes, COW is popular, and for good reasons. So is checksumming. It isn't surprising that modern filesystems employ both, especially since the costs of both have become less and less relevant over time.
They don't work the way you are asserting.