(I have not checked this claim, it's just what I found from googling "best large-language model character-perplexity 2023".)
A token in an LLM is generally more than one character, so I would guess the per-character entropy is a bit lower than that. Shannon estimated the entropy of printed English at 0.6-1.3 bits/character in 1950 (https://mattmahoney.net/dc/entropy1.html).
https://towardsdatascience.com/perplexity-of-language-models...
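
To make the token-to-character conversion concrete, here's a minimal sketch of the arithmetic; the per-token perplexity of 20 and the ~4 characters per token are made-up illustrative numbers, not measurements from any particular model:

    import math

    def bits_per_char(token_perplexity, chars_per_token):
        # Per-token entropy in bits is log2 of the perplexity.
        bits_per_token = math.log2(token_perplexity)
        # Spread those bits over the characters in an average token.
        return bits_per_token / chars_per_token

    # Hypothetical numbers: perplexity 20 per token at ~4 chars/token
    # gives roughly 1.08 bits/character, in the same ballpark as
    # Shannon's 0.6-1.3 bits/character estimate.
    print(bits_per_char(20, 4))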