
> In such cases, the serialized binary are mostly in 200~1000 bytes. Not big enough for zstd to work

You're not referring to the same dictionary that I am. Look at --train in [1].

If you have a training corpus of representative data, you can generate a dictionary that you preshare on both sides, which performs much better on very small binaries (including 200-1k bytes). It's the same kind of technique, but zstd's mechanism is fully general, whereas purpose-built non-entropy-based dictionaries are more limited & less likely to outperform it.
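To make the preshared-dictionary idea concrete, here's a rough sketch. It is not zstd: it uses the preset-dictionary support in Python's standard zlib module as a stand-in, and the "dictionary" is just a concatenation of made-up sample messages, whereas zstd --train would select the useful substrings from a real corpus. The sample payloads and the helper names (compress, decompress, SHARED_DICT) are assumptions for illustration only.

```python
import zlib
from typing import Optional

# Hypothetical sample messages standing in for a representative corpus.
SAMPLES = [
    b'{"type":"order","user_id":1001,"status":"pending","currency":"USD"}',
    b'{"type":"order","user_id":1002,"status":"shipped","currency":"EUR"}',
    b'{"type":"refund","user_id":1003,"status":"pending","currency":"USD"}',
]

# Naive dictionary: the samples concatenated. A trained zstd dictionary
# would be built by `zstd --train`; this is only a zlib stand-in.
SHARED_DICT = b"".join(SAMPLES)


def compress(payload: bytes, zdict: Optional[bytes] = None) -> bytes:
    """Compress payload, optionally priming the compressor with zdict."""
    if zdict is None:
        comp = zlib.compressobj(level=9)
    else:
        comp = zlib.compressobj(level=9, zdict=zdict)
    return comp.compress(payload) + comp.flush()


def decompress(blob: bytes, zdict: Optional[bytes] = None) -> bytes:
    """Decompress blob; zdict must match the one used to compress."""
    if zdict is None:
        decomp = zlib.decompressobj()
    else:
        decomp = zlib.decompressobj(zdict=zdict)
    return decomp.decompress(blob) + decomp.flush()


if __name__ == "__main__":
    message = b'{"type":"order","user_id":1004,"status":"pending","currency":"EUR"}'
    plain = compress(message)
    primed = compress(message, SHARED_DICT)
    print(len(message), len(plain), len(primed))
```

Both sides have to hold the exact same dictionary bytes, which is the "preshare" part. With the real tool the equivalent would be training once with zstd --train and passing the result with -D on both ends.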

If you want maximum flexibility (i.e. you don't know the universe of representative messages ahead of time, or you want maximum compression performance), you can gather this corpus transparently as messages are generated, then generate a dictionary & attach it as sideband metadata to a message. You'll probably need to defer decoding when a message references a dictionary that hasn't been received yet (i.e. the transport delivers messages in a different order than they were generated). There are other techniques you can apply, but the general rule is that a custom encoding scheme is unlikely to outperform zstd + a representative training corpus. If it does, you'd need to actually show this with measurements rather than argue from first principles.
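The deferred-decoding part could look roughly like this on the receiving side. Again this is only a sketch under assumptions: the dictionary id scheme, the DeferredDecoder class, and its method names are all hypothetical, and zlib's preset dictionary stands in for a zstd dictionary.

```python
import zlib
from collections import defaultdict


def compress_with(zdict: bytes, payload: bytes) -> bytes:
    """Sender side: compress payload against a given dictionary (zlib stand-in)."""
    comp = zlib.compressobj(zdict=zdict)
    return comp.compress(payload) + comp.flush()


class DeferredDecoder:
    """Hypothetical receiver that parks messages until their dictionary arrives."""

    def __init__(self):
        self._dicts = {}                    # dict_id -> dictionary bytes
        self._pending = defaultdict(list)   # dict_id -> blobs waiting on it

    def on_dictionary(self, dict_id, zdict):
        """Register a dictionary and return any messages that were waiting on it."""
        self._dicts[dict_id] = zdict
        waiting = self._pending.pop(dict_id, [])
        return [self._decode(dict_id, blob) for blob in waiting]

    def on_message(self, dict_id, blob):
        """Decode now if the dictionary is known, else buffer and return None."""
        if dict_id not in self._dicts:
            self._pending[dict_id].append(blob)
            return None
        return self._decode(dict_id, blob)

    def _decode(self, dict_id, blob):
        decomp = zlib.decompressobj(zdict=self._dicts[dict_id])
        return decomp.decompress(blob) + decomp.flush()
```

A real implementation would also need a bound on the pending buffer and some policy for dictionaries that never arrive; both are left out here.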

[1] https://github.com/facebook/zstd/blob/dev/programs/zstd.1.md



Sadly, we don't have such a training corpus of representative data. Fury is just a serialization framework, so we can't assume any particular string distribution. I thought about scanning the code of Apache OFBiz and using the domain objects in that project as the default corpus for a static Huffman/zstd dictionary, but OFBiz may still not be representative. As for your second suggestion, I couldn't agree more. Fury has actually already implemented this; we call it meta share mode. Fury writes such meta only once per channel and resends it if the channel gets disconnected. But this is not easy to use and imposes overhead on users, so we proposed meta encoding here. Anyway, your suggestions are very insightful. Thanks very much.


Wow. Didn't know about Train and custom dictionaries. Very cool.



