
> Base45 does not use all 45 characters

Huh? I don't necessarily care about an exact "base45", I care about QR code alphanumeric, which just so happens to be a (generic) base 45 character set. For QR code, two characters are encoded into 11 bits.
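To make the 11-bit pairing concrete, here is a minimal sketch of QR alphanumeric data encoding. It assumes the standard 45-character alphanumeric table and covers only the character data (no mode indicator, character count, or padding). The function name `encode_alphanumeric` is just an illustration.

```python
# QR alphanumeric character set; a character's index is its value (0..44).
ALPHANUMERIC = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"

def encode_alphanumeric(text: str) -> str:
    """Return the character data as a string of '0'/'1' bits.

    Each pair of characters becomes one 11-bit value (45 * first + second);
    a trailing unpaired character is written in 6 bits.
    """
    values = [ALPHANUMERIC.index(ch) for ch in text]
    chunks = []
    for i in range(0, len(values) - 1, 2):
        chunks.append(format(values[i] * 45 + values[i + 1], "011b"))
    if len(values) % 2:
        chunks.append(format(values[-1], "06b"))
    return "".join(chunks)

print(encode_alphanumeric("AB"))           # one pair, 11 bits
print(len(encode_alphanumeric("ABCDEF")))  # three pairs
```

Since 45 * 45 = 2025 fits under 2^11 = 2048, a pair always fits in 11 bits, and six characters come out to 33 bits.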

>in every slot.

I've worked with the QR code standards pretty seriously and I am unfamiliar with the term "slots" being used by the standards. This is why I suspect you're referring specifically to RFC base45 (although the term isn't used there either), which QR code doesn't care about. I also don't care about RFC Base 45 and would prefer a more bit space efficient method, such as the iterative divide by radix method, which I also call "natural base conversion".
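For anyone unfamiliar with the approach, a rough sketch of iterative divide by radix follows. It is my reading of the term, assuming the whole input is treated as one non-negative integer. The names `to_base` and `from_base` are hypothetical, and the alphabet shown is the QR alphanumeric set, not necessarily the BASE45 alphabet.

```python
# Assumed alphabet for illustration: the QR alphanumeric character set.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"

def to_base(n: int, alphabet: str = ALPHABET) -> str:
    """Convert a non-negative integer by repeatedly dividing by the radix."""
    if n < 0:
        raise ValueError("n must be non-negative")
    radix = len(alphabet)
    if n == 0:
        return alphabet[0]
    digits = []
    while n > 0:
        n, remainder = divmod(n, radix)
        digits.append(alphabet[remainder])  # least significant digit first
    return "".join(reversed(digits))

def from_base(text: str, alphabet: str = ALPHABET) -> int:
    """Inverse of to_base: accumulate digits, most significant first."""
    radix = len(alphabet)
    n = 0
    for ch in text:
        n = n * radix + alphabet.index(ch)
    return n

number = int.from_bytes(b"hello", "big")
encoded = to_base(number)
assert from_base(encoded) == number
```

One caveat with this sketch: leading zero bytes vanish when the input becomes an integer, so a real scheme would need its own rule for them.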

> base45 takes 32 source bits

For QR code alphanumeric, 6 characters use 33 bits, not 32.

> way to calculate efficiency

The way we calculate this, for example, 2025/2048, we've termed "bit space efficiency". I'm not sure how widely this term is used in the rest of the industry. Similarly, I thought I had read "the iterative divide by radix algorithm" in industry, but after searching it turns out to be a term novel to our work.
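As an illustration of the 2025/2048 figure, here is a small sketch that computes the fraction for a group of symbols. It assumes the group is stored in the smallest whole number of bits that can hold it; the function name `bit_space_efficiency` is made up for this example.

```python
from fractions import Fraction

def bit_space_efficiency(radix: int, symbols: int) -> tuple[Fraction, int]:
    """Return (used combinations / available combinations, bits used)
    for `symbols` characters of the given radix packed into whole bits."""
    combos = radix ** symbols
    bits = (combos - 1).bit_length()  # smallest bit count holding 0..combos-1
    return Fraction(combos, 2 ** bits), bits

print(bit_space_efficiency(45, 2))  # two alphanumeric characters in 11 bits
print(bit_space_efficiency(10, 3))  # three decimal digits in 10 bits
```

Note that `Fraction` reduces to lowest terms, so 1000/1024 is reported as 125/128.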

This is also similar to the way Shannon originally calculated entropy and appears to be a fundamental representation of information. Of course log is useful, but it often results in partial bits or rounding, 5.5 in the case of alphanumeric, which is somewhat absurd considering that the bit is the quantum of information, again as shown by Shannon. There is no such thing as a partial bit that can be communicated, since information is fundamental to communication, so the fractional representation we've found to be more informative and easier to work with.

Granted, in all of this, when I did the math (and I've done a lot of math on this particular issue), there appeared to be some very extreme edge cases in the final QR code where some arbitrary data encoded into QR numeric was slightly more efficient than alphanumeric, but overall alphanumeric was more efficient almost all the time. There are other considerations, like padding and escaping, that make exact calculation more difficult than it's worth. I just needed a "most of the time" calculation and that's where I stopped.

For more detail of my work, my BASE45 predates the RFC by 2 years in 2019, then I published a base 45 alphabet, BASE45, by March 1, 2020, a whole year before the RFC. A patent including BASE45 was submitted June 22, 2021: https://image-ppubs.uspto.gov/dirsearch-public/print/downloa...

Matter of fact, because of the issues and confusion surrounding base conversion, I wrote this tool in 2019:

https://convert.zamicol.com

It is the first arbitrary base conversion tool on the web. It also was essential for our work with QR code and other base conversion issues.




> Huh? I don't necessarily care about an exact "base45", I care about QR code alphanumeric

> I suspect you're referring specifically to RFC base45

> For more detail of my work, my BASE45 predates the RFC by 2 years in 2019

The RFC was linked in the comment I originally replied to. That is the same comment where you saw the term "base45", since I didn't repeat it in my original reply.

> The way we calculate this, for example, 2025/2048, we've termed "bit space efficiency". I'm not sure how widely this term is used in the rest of the industry.

It's not a good metric when the size can vary.

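Encoding 3 values in 2 bits (3/4) uses 75% of the bit space, and encoding 512 values in 10 bits (512/1024) uses 50% of the bit space. But if you give 20 bits to each, the first method fits ten groups and can encode 3^10 = 59049 combinations, while the second fits two groups and can encode 512^2 = 262144 combinations.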
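The arithmetic is easy to check with a short sketch. The helper name `combinations_in` is hypothetical, and it assumes the bit budget is filled with whole fixed-size groups.

```python
def combinations_in(total_bits: int, values_per_group: int, bits_per_group: int) -> int:
    """Count distinct messages when total_bits is split into whole groups,
    each group holding one of values_per_group values."""
    groups = total_bits // bits_per_group
    return values_per_group ** groups

# 3 values per 2-bit group: 75% of the bit space per group.
print(combinations_in(20, 3, 2))
# 512 values per 10-bit group: 50% of the bit space per group.
print(combinations_in(20, 512, 10))
```

The per-group ratio is higher for the first method, yet the second encodes more in the same 20 bits.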

> which is somewhat absurd considering that the bit is the quantum of information, again as shown by Shannon. There is no such thing as a partial bit that can be communicated, since information is fundamental to communication, so the fractional representation we've found to be more informative and easier to work with.

You can use any base and the math is roughly the same.

Distinguishing between two symbols is just the minimum. You can't transmit .3 bits but you can easily transmit 2.3 bits. If your receiver can distinguish between 5 symbols at full speed then 2.3 bits at a time is the most natural communication method.
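For reference, the 2.3 figure is just log2(5). A quick sketch, with `bits_per_symbol` as an illustrative name:

```python
import math

def bits_per_symbol(distinguishable_symbols: int) -> float:
    """Information carried by one symbol drawn from an alphabet of this size."""
    return math.log2(distinguishable_symbols)

for size in (2, 5, 45):
    print(size, round(bits_per_symbol(size), 2))
```

A 2-symbol alphabet gives exactly 1 bit per symbol, 5 symbols give about 2.32, and 45 symbols give about 5.49.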

> There are other considerations, like padding and escaping, that make exact calculation more difficult than it's worth. I just needed a "most of the time" calculation and that's where I stopped.

Yeah, that's fine. They're both efficient. My deciding factor is not the tiny difference in efficiency, it's the ill-behaved symbols in alphanumeric.



