> I don't know if the authors are here, but if they are - would you comment on fragmentation and the dangers of growing a filesystem past 95-98% full ?
Fragmentation isn't an issue in TFS at all, because it is a cluster-based file system. Essentially, that means files aren't stored contiguously, but in small chunks. Allocation is done entirely on the basis of unrolled freelists.
This does cause a slight space overhead (only slight, since it comes from the file's metadata being stored in full form), but it completely eliminates any fragmentation.
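To make the allocation scheme a bit more concrete, here's a rough in-memory sketch in Rust of how an unrolled freelist allocator can work. The names, node capacity, and layout here are illustrative assumptions, not TFS's actual on-disk format:

```rust
// Sketch of an unrolled freelist for cluster allocation (illustrative only).
// Each node is itself a free cluster and carries the addresses of many more
// free clusters, so most allocations and frees only touch the head node.

const ENTRIES_PER_NODE: usize = 62; // assumed capacity; depends on cluster size

struct Node {
    addr: u64,               // cluster address of the node itself
    free: Vec<u64>,          // other free clusters (at most ENTRIES_PER_NODE)
    next: Option<Box<Node>>, // next node in the freelist
}

struct Freelist {
    head: Option<Box<Node>>,
}

impl Freelist {
    /// Pop a free cluster. Usually this just shrinks the head node's array.
    fn allocate(&mut self) -> Option<u64> {
        let head = self.head.as_mut()?;
        if let Some(cluster) = head.free.pop() {
            return Some(cluster);
        }
        // Head node is exhausted: hand out its own cluster and advance.
        let exhausted = self.head.take()?;
        self.head = exhausted.next;
        Some(exhausted.addr)
    }

    /// Return a cluster. If the head node is full (or missing), the freed
    /// cluster itself becomes a new head node.
    fn free(&mut self, cluster: u64) {
        if let Some(head) = self.head.as_mut() {
            if head.free.len() < ENTRIES_PER_NODE {
                head.free.push(cluster);
                return;
            }
        }
        let next = self.head.take();
        self.head = Some(Box::new(Node { addr: cluster, free: Vec::new(), next }));
    }
}
```

Because free clusters are batched into a handful of nodes, allocating and freeing are O(1) pops and pushes on the head node, regardless of where the clusters physically sit on disk.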
I only have a basic understanding of harddisks/filesystems, but won't that slow down reading/writing on harddisks since the chunks won't be in order and close together?
> A good design would look at the state of the art and use the best techniques available. If the aim was research, then try one new thing, not a thousand.
That's what it does: It takes from many sources (although mainly ZFS).
TFS was created to speed up development. The issue is that following the design specs makes the implementation much slower and prevents "natural", incremental development (you cannot build it like a tower; you need every component before anything is complete). It was started[1] and got far enough to read images, but implementing it took ages, so we decided to put it off for now.
This doesn't quite seem to follow? ZFS's pool model has supported feature flags for a very long time - isn't the issue more that, to do the things ZFS does, you need to implement all the other components? And since you're planning to do a lot of what ZFS does...
The first section in the README is called "Design goals", with 13 items. None of them is "data integrity", and none of them even talks about validating the data or handling any failures aside from power loss.
By contrast, in the canonical slide deck on ZFS[1], the first slide talks about "provable end-to-end data integrity". In the paper[2], "design principles" section 2.6 is "error detection and correction".
I'm glad to hear that's also a focus for TFS. With ZFS, the emphasis on data integrity resulted in significant architectural choices -- I'm not sure it's something that can just be bolted on later. As a reader, I wouldn't have assumed TFS had the same emphasis. I think it's pretty valuable to spell this out early and clearly, with details, because it's actually quite a differentiator compared with most other systems.
> The first section in the README is called "Design goals", with 13 items. None of them is "data integrity", and none of them even talks about validating the data or handling any failures aside from power loss.
Firstly, it's ChaCha20. I don't think anybody in their right mind would advocate a 2-round ChaCha. Secondly, there _are_ stream cipher constructions that achieve the design requirements for something like TFS.
On the contrary, it makes it suitable for file systems! File systems are not block devices, they are data structures on top of block devices — it's the job of these data structures to keep stuff, such as data, inodes, and... keys, and IVs, and MACs, checksums, etc.
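As a rough sketch of what that buys you (field names and sizes are my own illustration, not TFS's or bcachefs's actual metadata format), the filesystem can keep the cryptographic bookkeeping right in the pointer to each extent:

```rust
/// Hypothetical on-disk reference to one encrypted, checksummed extent.
/// Because the filesystem owns this layout, the nonce and MAC live right
/// next to the address, which sector-level schemes like XTS have no room for.
#[repr(C)]
struct ExtentPointer {
    addr: u64,       // physical cluster address of the extent
    len: u32,        // extent length in clusters
    nonce: [u8; 12], // per-extent nonce, never reused under the same key
    mac: [u8; 16],   // Poly1305 tag authenticating the ciphertext
    checksum: u64,   // checksum of the plaintext, e.g. for scrubbing
}
```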
"If you’re encrypting a filesystem and not disk blocks, still don’t use XTS! Filesystems have format-awareness and flexibility. Filesystems can do a much better job of encrypting a disk than simulated hardware encryption can."
Edit 2: also check out bcachefs encryption design doc: http://bcachefs.org/Encryption/ (also not perfect, but uses proper AEAD — ChaCha20-Poly1305. I sent some questions and suggestions to the author, but received no reply :/)
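For reference, encrypting a cluster with ChaCha20-Poly1305 via the `chacha20poly1305` crate (0.10-series API) looks roughly like this; the key, nonce, and data are placeholders:

```rust
use chacha20poly1305::aead::{Aead, KeyInit};
use chacha20poly1305::{ChaCha20Poly1305, Key, Nonce};

fn main() {
    // Placeholder key/nonce; a real filesystem derives the key from the
    // passphrase and stores a fresh nonce per extent in its metadata.
    let key = Key::from_slice(&[0x42; 32]);
    let cipher = ChaCha20Poly1305::new(key);
    let nonce = Nonce::from_slice(&[0x24; 12]); // must never repeat under one key

    // encrypt() appends the 16-byte Poly1305 tag to the ciphertext, so a
    // flipped bit anywhere makes decrypt() fail instead of returning garbage.
    let ciphertext = cipher.encrypt(nonce, b"cluster contents".as_ref()).unwrap();
    let plaintext = cipher.decrypt(nonce, ciphertext.as_ref()).unwrap();
    assert_eq!(plaintext, b"cluster contents".to_vec());
}
```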
SHA256 is very slow, and that's no surprise. It's cryptographic after all.
Here's a small list of use cases for non-cryptographic hash functions:
- Checksums and error correction codes, as long as there is no way to maliciously use this.
- Hash tables. These always use non-cryptographic hash functions.
- Bloom filters.
- Heuristic fingerprinting. They're not strong enough to be used as real data fingerprints, but they can be used to decide whether two buffers are "probably equal" or "certainly not equal" (see the sketch after this list).
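A tiny sketch of that last point, using the `seahash` crate's one-shot `seahash::hash` function: matching hashes only suggest equality, but differing hashes prove inequality.

```rust
// Heuristic fingerprinting with a non-cryptographic hash: cheap to compute,
// but an attacker (or plain bad luck) can produce collisions, so equal
// hashes are only a hint, while unequal hashes are a proof of difference.
fn probably_equal(a: &[u8], b: &[u8]) -> bool {
    seahash::hash(a) == seahash::hash(b)
}

fn main() {
    let x = vec![1u8; 4096];
    let y = vec![2u8; 4096];
    assert!(probably_equal(&x, &x.clone())); // same bytes, same hash
    assert!(!probably_equal(&x, &y));        // different hash => certainly not equal
}
```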
Hash tables are the main one. Cryptographic hash functions are almost never used in them. SipHash is a popular choice, but it is not a cryptographic hash function. That is a common misunderstanding: it's a MAC.
There's a huge difference between cryptographic and non-cryptographic hash functions. Note that BLAKE2 has variable output lengths, whereas SeaHash is fixed to 64 bits (although I suppose it's not too hard to make a version with a bigger output), so collisions will naturally happen.
If you need fingerprints, don't use SeaHash, but if you are looking to insert into e.g. a hash table, you shouldn't use BLAKE2. It's awfully slow for that.
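For the hash-table case, here's a minimal sketch of swapping Rust's default SipHash-based hasher for SeaHash, assuming the `seahash` crate's `SeaHasher` (which implements `Hasher` and `Default`) together with `BuildHasherDefault`; the map contents are just an example:

```rust
use std::collections::HashMap;
use std::hash::BuildHasherDefault;

use seahash::SeaHasher;

// A HashMap that hashes keys with SeaHash instead of the default SipHash.
// Fine for trusted keys; with attacker-controlled keys, a keyed hash like
// SipHash is the safer default because it resists collision flooding.
type SeaHashMap<K, V> = HashMap<K, V, BuildHasherDefault<SeaHasher>>;

fn main() {
    let mut clusters: SeaHashMap<u64, &str> = SeaHashMap::default();
    clusters.insert(42, "metadata");
    clusters.insert(1337, "data");
    assert_eq!(clusters.get(&42), Some(&"metadata"));
}
```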