
VByte is a weird baseline to compare to: it is a byte-aligned encoding scheme, so it trades off some space efficiency for speed of decoding. The proposed method is bit-aligned, so it should be compared against bit-aligned encoding schemes.
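For concreteness, VByte (a.k.a. varint) packs 7 payload bits per byte with a continuation flag in the high bit — a minimal sketch of the common LEB128-style variant (some VByte implementations flip the flag convention):

```python
def vbyte_encode(n: int) -> bytes:
    """Classic varint: 7 payload bits per byte, high bit set on every
    byte except the last (some variants invert this convention)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # final byte: high bit clear
            return bytes(out)

def vbyte_decode(data: bytes) -> int:
    n = 0
    for shift, byte in enumerate(data):
        n |= (byte & 0x7F) << (7 * shift)
    return n
```

Any value below 128 takes one byte, 300 takes two, and so on — byte alignment keeps decoding cheap at the cost of a 1-bit-in-8 overhead.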

For large integers, it is hard to beat Elias Delta coding [1], which asymptotically uses log(n) + o(log(n)) bits, and in practice it breaks even with most other encoding schemes quite early on.
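A sketch of the standard gamma/delta construction in Python (delta gamma-codes the bit length of n, then appends n's bits without the leading 1):

```python
def elias_gamma(n: int) -> str:
    # n >= 1: (bitlen - 1) zeros, then n in binary;
    # codeword length = 2*floor(log2 n) + 1.
    assert n >= 1
    b = bin(n)[2:]
    return "0" * (len(b) - 1) + b

def elias_delta(n: int) -> str:
    # n >= 1: gamma-code the bit length of n, then n's bits minus the
    # leading 1; length ~ log2 n + 2*log2 log2 n, i.e. log(n) + o(log(n)).
    assert n >= 1
    b = bin(n)[2:]
    return elias_gamma(len(b)) + b[1:]

def elias_delta_decode(bits: str, pos: int = 0) -> tuple[int, int]:
    """Decode one delta codeword starting at pos; return (value, next pos).
    Works anywhere in a concatenated stream -- the code is prefix-free."""
    z = 0
    while bits[pos + z] == "0":
        z += 1
    length = int(bits[pos + z: pos + 2 * z + 1], 2)  # gamma-decoded bit length
    pos += 2 * z + 1
    n = int("1" + bits[pos: pos + length - 1], 2)
    pos += length - 1
    return n, pos
```

Because the code is prefix-free, codewords can be concatenated with no separators at all and decoded left to right.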

More importantly, Elias Gamma and Delta are complete, meaning that there is no redundancy (another way to look at it is that any sequence over the alphabet is decodable). VByte is complete over the byte alphabet. Any complete code is optimal for some integer distribution (see "implied probability" on the Wikipedia page).

So if your integer distribution is heavier on the large integers, there are already plenty of complete encodings to pick from, and almost certainly one that fits the distribution well.

The scheme proposed here, instead, is not complete, as mentioned in the README ("this method does introduce some redundancy, particularly when sequences must be padded to prevent unintentional "101" patterns"), so it cannot be optimal for any distribution.

The Wikipedia page on the Kraft–McMillan inequality [2] explains this in more detail.
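Completeness is easy to check numerically: a code is complete exactly when its Kraft sum ∑ 2^(-len(code(n))) equals 1. For Elias gamma the codeword length is 2⌊log₂ n⌋ + 1, so each "level" k contributes 2^k codewords of length 2k+1, i.e. 2^-(k+1) to the sum — a quick check using `bit_length` for the exact length:

```python
def gamma_len(n: int) -> int:
    # Elias gamma codeword length: 2*floor(log2 n) + 1.
    return 2 * (n.bit_length() - 1) + 1

# Partial Kraft sum over n = 1 .. 2^20 - 1. Levels 0..19 are complete,
# each contributing 2^-(k+1), so the sum is exactly 1 - 2^-20.
kraft = sum(2.0 ** -gamma_len(n) for n in range(1, 2 ** 20))
```

The partial sums approach 1, confirming gamma is complete; a redundant (incomplete) code's Kraft sum stays strictly below 1 no matter how far you sum.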

[1] https://en.wikipedia.org/wiki/Elias_delta_coding

[2] https://en.wikipedia.org/wiki/Kraft%E2%80%93McMillan_inequal...



Yeah I think the paper could be better, too. Thank you for your suggestions — and for the information, it’s very interesting!

Although, the padding requirement for integers whose bits end in "10" can actually be dismissed: you join on just "101". Then to decode, you first split on "10101", re-attach the removed "10" to the piece on its left, and then split the resulting pieces on "101" — so no padding is needed.
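If I follow, and assuming (as I believe the irradix representation guarantees — both are assumptions on my part) that chunks never contain "101" internally and never start with "0", that decode trick round-trips. A quick sketch:

```python
def join_chunks(chunks: list[str]) -> str:
    # Assumption: each chunk starts with "1" and contains no "101"
    # substring (properties I'm assuming irradix provides).
    return "101".join(chunks)

def split_chunks(stream: str) -> list[str]:
    # Each "10101" is a chunk-final "10" fused with a "101" separator,
    # so split those off first and re-attach "10" to the left piece.
    pieces = stream.split("10101")
    repaired = [p + "10" for p in pieces[:-1]] + [pieces[-1]]
    # The remaining separators are plain "101".
    out: list[str] = []
    for piece in repaired:
        out.extend(piece.split("101"))
    return out
```

For example, ["110", "1", "100", "11"] joins to "110101110110010111" (note the fused "10101" at the front) and splits back intact, with no padding bits.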

I guess you can consider that a spec bug in the draft? Hahahaha! :)

I don’t know what "complete" means here, and I don’t know whether this fix makes the scheme complete, but anyway it’s really interesting.

Sounds like it would be a good idea to add these codings to the benchmark!

There’s another potential criticism I was thinking about: what if we encoded the lengths with VByte and then concatenated the bits, just like we do with irradix to make the L1 variant? It’s not really fair to compare L1 with VByte when they’re operating in different modes like that. It’s possible that any advantage of our current scheme disappears if you do that; I don’t know.

I picked VByte, though, because it’s so simple and very common. So just a question for you: why aren’t the codings you mentioned used as frequently as VByte? Are they more complex or slower?



