Hacker News

I'd appreciate an analysis of how it compresses! The encoding looks highly compressible, so I'd expect it to be competitive with UTF-8 for English text, and it seems like it would beat UTF-8 for East Asian languages.



At first glance I'd say it doesn't compress well at all, especially if the compressor works on bytes.

Bit-streams of characters whose width isn't a multiple of 8 look random-ish when viewed as bytes, and a byte-oriented compressor can't exploit redundancy it can't see at byte boundaries.
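A minimal sketch of the alignment problem (the `pack21` packer here is a hypothetical UTF-21 encoder, not anything from the article): when you pack identical 21-bit code units back to back, the byte sequence only repeats every lcm(21, 8) = 168 bits, i.e. every 21 bytes, so a byte-level matcher sees scattered values instead of one repeated byte.

```python
def pack21(codepoints):
    """Pack code points into a contiguous stream of 21-bit units."""
    bits = nbits = 0
    out = bytearray()
    for cp in codepoints:
        bits = (bits << 21) | cp
        nbits += 21
        while nbits >= 8:
            nbits -= 8
            out.append((bits >> nbits) & 0xFF)
    if nbits:  # pad the final partial byte with zero bits
        out.append((bits << (8 - nbits)) & 0xFF)
    return bytes(out)

# 'e' repeated 64 times: in UTF-8 this is one byte value repeated 64 times,
# but in the 21-bit packing the byte pattern only repeats every 21 bytes.
packed = pack21([ord('e')] * 64)
print(sorted(set(packed[:21])))      # many distinct byte values per period
print(packed[:21] == packed[21:42])  # the period really is 21 bytes
```

Compare that with the UTF-8 case, where `'e' * 64` encodes to 64 copies of the same byte and any LZ-style compressor collapses it immediately.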


> Especially if the compression uses bytes

Arithmetic coding can work over any symbol alphabet and spend fractional bits per symbol, and it's been around since the seventies.


Arithmetic coding goes one token/symbol at a time, just like most kinds of compression. The fractional bits come after token selection, and aren't really relevant here.

You can split the input into tokens that aren't a multiple of 8 bits, sure. But that's its own decision. 7- or 21-bit tokens (or whatever) could be fed into a Huffman tree just as easily.
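A hedged sketch of that point: the entropy coder doesn't care how wide the tokens are, only their statistics. Here is a toy Huffman builder (names and structure are my own, not from the thread) fed 21-bit code points directly instead of bytes.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a Huffman code from an iterable of symbols.
    The symbols can be 7-bit, 21-bit, or any hashable tokens."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (frequency, tiebreak, {symbol: code-so-far}).
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        fa, _, a = heapq.heappop(heap)
        fb, _, b = heapq.heappop(heap)
        # Merge the two rarest subtrees, prefixing their codes with 0/1.
        merged = {s: "0" + c for s, c in a.items()}
        merged.update({s: "1" + c for s, c in b.items()})
        heapq.heappush(heap, (fa + fb, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

# 21-bit tokens (full code points) are fed in exactly as bytes would be:
tokens = [ord(c) for c in "compression is about symbol statistics"]
code = huffman_code(tokens)
```

Frequent symbols get shorter codes regardless of token width, which is the sense in which tokenization and entropy coding are independent decisions.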


Yes, but if you run a normal compression tool like gzip, RAR or 7z on it, it's still going to work on bytes.


Arithmetic coding is flexible about its output. Of course you can retokenize unusual input, but you can usually do that for any algorithm if you're willing to modify it. The point stands that UTF-21 cannot have a substantial advantage once you compress; it will usually come out worse.
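A quick experiment supporting this, under my own assumptions (the `pack21` helper below is a hypothetical UTF-21 packer, and zlib stands in for the byte-oriented compressors mentioned above): for English text the 21-bit packing starts out larger than UTF-8 and also compresses worse.

```python
import zlib

def pack21(codepoints):
    """Hypothetical UTF-21: pack code points as contiguous 21-bit units."""
    bits = nbits = 0
    out = bytearray()
    for cp in codepoints:
        bits = (bits << 21) | cp
        nbits += 21
        while nbits >= 8:
            nbits -= 8
            out.append((bits >> nbits) & 0xFF)
    if nbits:
        out.append((bits << (8 - nbits)) & 0xFF)
    return bytes(out)

text = ("Compression exploits redundancy, and byte-oriented compressors "
        "look for repeated byte strings. ") * 20
utf8 = text.encode("utf-8")
utf21 = pack21(map(ord, text))

# Raw and zlib-compressed sizes for each encoding.
print(len(utf8), len(zlib.compress(utf8, 9)))
print(len(utf21), len(zlib.compress(utf21, 9)))
```

On this kind of input UTF-8 is smaller before compression (one byte per ASCII character versus 21 bits) and the misaligned 21-bit stream gives the byte-level matcher far less to work with afterwards.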



