(I have not checked this claim, it's just what I found from googling "best large-language model character-perplexity 2023".)
A token in an LLM is generally more than one character, so I would guess the per-character entropy is a bit lower than that. Shannon estimated the entropy of printed English at 0.6-1.3 bits/character in 1950 (https://mattmahoney.net/dc/entropy1.html).
https://towardsdatascience.com/perplexity-of-language-models...
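
To make the token-to-character conversion concrete, here's a minimal sketch of the arithmetic; the per-token perplexity of 20 and the ~4 characters per token are made-up illustrative numbers, not measurements from any particular model:

    import math

    def bits_per_char(token_perplexity, chars_per_token):
        # Per-token entropy in bits is log2 of the perplexity.
        bits_per_token = math.log2(token_perplexity)
        # Spread those bits over the characters in an average token.
        return bits_per_token / chars_per_token

    # Hypothetical numbers: perplexity 20 per token at ~4 chars/token
    # gives roughly 1.08 bits/character, in the same ballpark as
    # Shannon's 0.6-1.3 bits/character estimate.
    print(bits_per_char(20, 4))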