You can switch modes. (Yes, that costs a dozen bits if you could otherwise have stayed in the same mode the entire time. Oh well, I'd say it's worth it to avoid base45.)
And base45 is less efficient than the raw efficiency of alphanumeric would suggest.
Alphanumeric is the most efficient QR code encoding mode.
(Just to make this clearer: QR byte encoding uses ISO/IEC 8859-1, where 65 characters are undefined, so 191/256, which is ~75%. If character encoding isn't an issue, then byte encoding is the most efficient, 256/256, 100%, but that's a very rare edge case. Also, last time I did the math on Kanji it was about 81% efficient. *I have not dug too deep into Kanji, and there may be a way to make it more efficient than I'm aware of. I've never considered it useful for my applications, so I have not looked.)
That is a semi-correct calculation of the wrong number. Base45 does not use all 45 characters in every slot. It goes 16 bits at a time, and 2^16/45^2 ≈ 32.4, so the character storing the upper bits only has 33 possible values (0 through 32).
The most straightforward way to measure efficiency is to see that base45 takes 32 source bits and encodes them into 33 bits, which is 32/33 ≈ 97%. The way you're calculating (2^32/2^33), that's only 50%.
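For reference, a minimal Python sketch of the RFC 9285 scheme as I understand it (the name `rfc_base45_encode` is just illustrative): each 2-byte group becomes three base-45 digits, least significant first, and a trailing odd byte becomes two. The last line shows why the most significant digit of a full group only ever takes 33 values.

```python
# Alphabet shared by RFC 9285 base45 and the QR alphanumeric mode.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"

def rfc_base45_encode(data: bytes) -> str:
    """Encode bytes two at a time, as described in RFC 9285."""
    out = []
    for i in range(0, len(data), 2):
        chunk = data[i:i + 2]
        if len(chunk) == 2:
            n = chunk[0] * 256 + chunk[1]   # 16-bit value, 0..65535
            digits = 3
        else:
            n = chunk[0]                     # trailing odd byte, 0..255
            digits = 2
        for _ in range(digits):
            n, r = divmod(n, 45)
            out.append(ALPHABET[r])          # least significant digit first
    return "".join(out)

print(rfc_base45_encode(b"AB"))  # BB8
# Largest top digit of a 16-bit group is 65535 // 2025 == 32, i.e. 33 values.
print(65535 // 45**2 + 1)        # 33
```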
But the better way to calculate efficiency is to take the log of everything (in other words, count how many bits are needed). Numeric is log(1000)/log(1024) which is 99.7%. Alphanum is 99.9%. Base45 is 97%.
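A quick Python check of those log-based figures, assuming the usual QR packing (3 digits in 10 bits, 2 alphanumeric characters in 11 bits) and RFC base45 carried in alphanumeric mode (4 bytes to 6 characters to 33 bits). `log_efficiency` is a hypothetical helper, not a library function.

```python
import math

def log_efficiency(values: int, bits: int) -> float:
    """Information actually carried (log2 of the value count) per bit spent."""
    return math.log2(values) / bits

numeric = log_efficiency(10**3, 10)        # 3 digits -> 10 bits
alphanumeric = log_efficiency(45**2, 11)   # 2 characters -> 11 bits
base45_in_qr = log_efficiency(2**32, 33)   # 4 bytes -> 6 chars -> 33 bits

for name, eff in [("numeric", numeric), ("alphanumeric", alphanumeric),
                  ("base45 in QR", base45_in_qr)]:
    print(f"{name}: {eff:.1%}")
# numeric: 99.7%
# alphanumeric: 99.9%
# base45 in QR: 97.0%

# The same base45 case measured as a raw ratio of code spaces:
print(2**32 / 2**33)  # 0.5
```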
And I don't know where that kanji number came from. It stores 13 bits at a time, mapping to 8192 Shift JIS code points, and the vast majority of them are valid. It's pretty efficient.
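A sketch of the Kanji-mode packing as I understand it from ISO/IEC 18004: subtract 0x8140 or 0xC140 depending on the range, then multiply the high byte by 0xC0 and add the low byte. The code space count below is an assumption on my part: it counts structurally valid double-byte codes in those ranges (trail byte 0x40..0xFC, excluding 0x7F), not just code points that are assigned characters in Shift JIS. `kanji_compact` and `kanji_codes` are illustrative names.

```python
import math

def kanji_compact(sjis: int) -> int:
    """Pack a double-byte Shift JIS code into the 13-bit QR Kanji value."""
    if 0x8140 <= sjis <= 0x9FFC:
        v = sjis - 0x8140
    elif 0xE040 <= sjis <= 0xEBBF:
        v = sjis - 0xC140
    else:
        raise ValueError(f"{sjis:#06x} is outside the QR Kanji ranges")
    return (v >> 8) * 0xC0 + (v & 0xFF)

def kanji_codes():
    """Structurally valid codes in the QR Kanji ranges (not all are assigned)."""
    for lead in list(range(0x81, 0xA0)) + list(range(0xE0, 0xEC)):
        for trail in range(0x40, 0xFD):
            code = (lead << 8) | trail
            if trail != 0x7F and (code <= 0x9FFC or 0xE040 <= code <= 0xEBBF):
                yield code

values = [kanji_compact(c) for c in kanji_codes()]
print(len(values), max(values))                  # 8023 8191
print(f"{len(values) / 2**13:.1%}")              # 97.9% of the 13-bit space
print(f"{math.log2(len(values)) / 13:.1%}")      # 99.8% by the log measure
```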
Huh? I don't necessarily care about an exact "base45"; I care about QR code alphanumeric, which just so happens to be a (generic) base 45 character set. For QR code, two characters are encoded into 11 bits.
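Here is how that 11-bit packing looks in a small Python sketch (the function name is hypothetical): each pair becomes 45 × first + second, which tops out at 2024 and so fits in 11 bits, and a lone final character gets 6 bits. Characters outside the 45-symbol set, including lowercase letters, raise `ValueError`.

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"

def qr_alphanumeric_groups(text: str) -> list[tuple[int, int]]:
    """Return (value, bit_width) groups for the QR alphanumeric data bits."""
    groups = []
    for i in range(0, len(text), 2):
        pair = text[i:i + 2]
        if len(pair) == 2:
            # 45 * first + second is at most 44 * 45 + 44 == 2024 < 2**11.
            value = 45 * ALPHABET.index(pair[0]) + ALPHABET.index(pair[1])
            groups.append((value, 11))
        else:
            groups.append((ALPHABET.index(pair), 6))  # lone final character
    return groups

groups = qr_alphanumeric_groups("AC-42")
print(groups)  # [(462, 11), (1849, 11), (2, 6)]
print("".join(format(v, f"0{w}b") for v, w in groups))
```

Summing the widths gives the data bit count, e.g. six characters come to 3 × 11 = 33 bits.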
> in every slot.
I've worked with the QR code standards pretty seriously, and I am unfamiliar with the standards using the term "slots". This is why I suspect you're referring specifically to RFC base45 (although the term isn't used there either), which QR code doesn't care about.
I also don't care about RFC Base 45 and would prefer to use a more bit space efficient method, such as using the iterative divide by radix method, which I also call "natural base conversion".
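A minimal sketch of what I take "iterative divide by radix" to mean, in Python: treat the whole message as one big integer and repeatedly `divmod` by 45. The fixed output width (smallest d with 45^d ≥ 256^len) is my own assumption for preserving leading zero bytes; because 256 > 45, every extra byte adds at least one digit, so the decoder can recover the byte count from the length alone. The names are hypothetical, and the big-integer arithmetic is quadratic, which is fine for QR-sized payloads.

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"

def digits_needed(num_bytes: int) -> int:
    """Smallest d with 45**d >= 256**num_bytes."""
    target, power, d = 256**num_bytes, 1, 0
    while power < target:
        power *= 45
        d += 1
    return d

def natural_encode(data: bytes) -> str:
    n = int.from_bytes(data, "big")
    out = []
    for _ in range(digits_needed(len(data))):  # fixed width keeps leading zeros
        n, r = divmod(n, 45)                    # iterative divide by radix
        out.append(ALPHABET[r])
    return "".join(reversed(out))               # most significant digit first

def natural_decode(text: str) -> bytes:
    num_bytes = 0
    while digits_needed(num_bytes) < len(text):
        num_bytes += 1
    if digits_needed(num_bytes) != len(text):
        raise ValueError("length does not correspond to any byte count")
    n = 0
    for ch in text:
        n = n * 45 + ALPHABET.index(ch)
    return n.to_bytes(num_bytes, "big")         # OverflowError if out of range
```

For 32 bytes this gives 47 characters, versus 48 for the 2-bytes-to-3-characters RFC grouping.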
> base45 takes 32 source bits
For QR code alphanumeric, 6 characters use 33 bits, not 32.
> way to calculate efficiency
The way we calculate this (for example, 2025/2048) we've termed "bit space efficiency". I'm not sure how widely the term has been adopted in the rest of the industry. On the same note, I thought I had read "the iterative divide by radix algorithm" somewhere in the industry, but after searching, it turns out to be a term novel to our work.
This is also similar to the way Shannon originally calculated entropy, and it appears to be a fundamental representation of information. Of course log is useful, but it often results in partial bits or rounding (5.5 bits per character in the case of alphanumeric), which is somewhat absurd considering that the bit is the quantum of information, again as shown by Shannon. There is no such thing as a partial bit that can be communicated, since information is fundamental to communication, so we've found the fractional representation to be more informative and easier to work with.
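As a sketch, "bit space efficiency" in Python is just the count of usable values over 2^bits. The figures plugged in are the ones from this thread (the 191 defined ISO/IEC 8859-1 characters come from the earlier comment); `bit_space_efficiency` is an illustrative name.

```python
def bit_space_efficiency(usable_values: int, bits: int) -> float:
    """Fraction of the 2**bits code space that is actually used."""
    return usable_values / 2**bits

modes = [
    ("numeric, 3 digits", 1000, 10),
    ("alphanumeric, 2 chars", 45**2, 11),
    ("byte, ISO/IEC 8859-1", 191, 8),
]
for name, values, bits in modes:
    print(f"{name}: {values}/{2**bits} = {bit_space_efficiency(values, bits):.1%}")
```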
Granted, in all of this, when I have done the math (and I've done a lot of math on this particular issue), there appeared to be some very extreme edge cases in the final QR code where some arbitrary data encoded in QR numeric was slightly more efficient than alphanumeric, but overall alphanumeric was more efficient almost all the time. There are other considerations, like padding and escaping, that make exact calculation more difficult than it's worth. I just needed a "most of the time" calculation, and that's where I stopped.
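A rough Python sketch for exploring those edge cases, under loudly simplified assumptions: the payload is an arbitrary integer of a given bit length converted to the fewest base-10 or base-45 symbols, and only the data bits are counted (no mode indicator, character-count indicator, or padding, which is exactly the messy part). All function names are hypothetical.

```python
def min_symbols(payload_bits: int, radix: int) -> int:
    """Fewest base-`radix` symbols that can represent every payload value."""
    target = 2**payload_bits
    n, capacity = 0, 1
    while capacity < target:
        capacity *= radix
        n += 1
    return n

def numeric_data_bits(digits: int) -> int:
    # 3 digits -> 10 bits; a leftover 1 or 2 digits -> 4 or 7 bits.
    return 10 * (digits // 3) + (0, 4, 7)[digits % 3]

def alphanumeric_data_bits(chars: int) -> int:
    # 2 characters -> 11 bits; a leftover character -> 6 bits.
    return 11 * (chars // 2) + 6 * (chars % 2)

def compare(payload_bits: int) -> tuple[int, int]:
    """(numeric data bits, alphanumeric data bits) for one payload size."""
    return (numeric_data_bits(min_symbols(payload_bits, 10)),
            alphanumeric_data_bits(min_symbols(payload_bits, 45)))

sizes = range(1, 257)
numeric_smaller = sum(1 for b in sizes if compare(b)[0] < compare(b)[1])
print(f"numeric smaller for {numeric_smaller} of {len(sizes)} payload sizes")
```

The outcome swings with payload size because of the leftover-group costs, which is why a "most of the time" answer is the practical one.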
For more detail on my work: my BASE45 dates to 2019, two years before the RFC, and I published a base 45 alphabet, BASE45, by March 1, 2020, a whole year before the RFC. A patent including BASE45 was submitted June 22, 2021: https://image-ppubs.uspto.gov/dirsearch-public/print/downloa...
As a matter of fact, because of the issues and confusion surrounding base conversion, I wrote this tool in 2019:
> Huh? I don't necessarily care about an exact "base45", I care about QR code alphanumeric
> I suspect your referring specifically to RFC base45
> For more detail of my work, my BASE45 predates the RFC by 2 years in 2019
The RFC was linked in the comment I originally replied to. That's the same comment where you saw the term "base45", because I didn't repeat it in my original reply.
> The way we calculate this, for example, 2025/2048, we've termed "bit space efficiency". I'm not sure how commonly adopted this term is used in the rest of the industry.
It's not a good metric when the size can vary.
3/4 uses 75% of the bit space, and 512/1024 uses 50% of the bit space. But if you give 20 bits to each, the first method can encode 59049 combinations and the second method can encode 262144 combinations.
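The same comparison in a few lines of Python (helper names are illustrative): the bit-space ratio ranks the 3/4 scheme higher, while the log measure and the actual combination count in 20 bits both favor 512/1024.

```python
import math

def bit_space_efficiency(values: int, bits: int) -> float:
    return values / 2**bits

def log_efficiency(values: int, bits: int) -> float:
    return math.log2(values) / bits

def combinations_in(total_bits: int, values: int, bits: int) -> int:
    """Distinct messages when total_bits is filled with whole groups."""
    return values ** (total_bits // bits)

for name, values, bits in [("3/4", 3, 2), ("512/1024", 512, 10)]:
    print(f"{name}: bit space {bit_space_efficiency(values, bits):.0%}, "
          f"log {log_efficiency(values, bits):.1%}, "
          f"20 bits -> {combinations_in(20, values, bits)} combinations")
```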
> which is somewhat absurd considering that the bit is the quantum of information, again as shown by Shannon. There is no such thing as a partial bit that can be communicated, since information is fundamental to communication, so the fractional representation we've found to be more informative and easier to work with.
You can use any base and the math is roughly the same.
Distinguishing between two symbols is just the minimum. You can't transmit 0.3 bits, but you can easily transmit 2.3 bits. If your receiver can distinguish between 5 symbols at full speed, then 2.3 bits at a time is the most natural communication method.
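To make the 2.3-bit point concrete, a small Python illustration: group 5-ary symbols and use exact integer math to see how many whole bits each group can carry. The per-symbol figure climbs toward log2(5) ≈ 2.32 as groups get longer.

```python
import math

bits_per_symbol = math.log2(5)   # about 2.3219 bits per 5-ary symbol
print(f"{bits_per_symbol:.4f}")

for n in (1, 2, 3, 10, 100):
    # floor(log2(5**n)) computed exactly, since 5**n is never a power of two
    whole_bits = (5**n).bit_length() - 1
    print(f"{n:>3} symbols -> {whole_bits} whole bits "
          f"({whole_bits / n:.3f} per symbol)")
```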
> There are other considerations, like padding and escaping, that makes exact calculation more difficult than it's worth. I just needed to "most of the time" calculation and that's where I stopped.
Yeah, that's fine. They're both efficient. My deciding factor is not the tiny difference in efficiency; it's the ill-behaved symbols in alphanumeric.
Also, when I do the math, alphanumeric is the most efficient QR mode, although just barely.