It's hard to say, but I think a useful measure would be to look at mode-dropping compared to the training data. Whatever 'the number of unique characters' is, it clearly ought to be at least as large as the number of distinct characters you see in the original training data, right?
For TADNE, Arfafax ran Danbooru2019 and a few million TADNE samples through CLIP to get image embeddings, clustered them, and graphed the two sets of clusters with t-SNE; you could see that the TADNE StyleGAN2-ext did a lot of mode-dropping, in that many smaller outlying clusters of characters/franchises/topics simply did not appear in the TADNE samples. The TADNE samples looked like one big galaxy, while Danbooru2019 looked like that galaxy surrounded by archipelagos. TADNE was trained extensively on Danbooru2019 and was a very large model, but GAN dynamics & the StyleGAN architecture meant it did a poor job of absorbing the rarer/more idiosyncratic Danbooru2019 image-clusters.
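(For anyone who wants to try this on their own model, here's a minimal sketch of the idea, not Arfafax's actual pipeline: embed both image sets with CLIP, then project them into a shared t-SNE layout and look for regions that the training data covers but the samples don't. It skips the explicit clustering step for brevity, assumes OpenAI's `clip` package plus scikit-learn/matplotlib, and the directory paths are hypothetical placeholders.)

```python
# Sketch: compare training-data vs. GAN-sample coverage in CLIP embedding space.
# Regions dense with real points but empty of fake points suggest dropped modes.
from pathlib import Path

import clip
import matplotlib.pyplot as plt
import numpy as np
import torch
from PIL import Image
from sklearn.manifold import TSNE

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def embed_dir(directory, limit=5000):
    """CLIP-embed up to `limit` images from a directory, L2-normalized."""
    feats = []
    for path in sorted(Path(directory).glob("*.jpg"))[:limit]:
        image = preprocess(Image.open(path).convert("RGB")).unsqueeze(0).to(device)
        with torch.no_grad():
            f = model.encode_image(image)
        feats.append((f / f.norm(dim=-1, keepdim=True)).cpu().numpy())
    return np.concatenate(feats)

real = embed_dir("danbooru2019/")   # hypothetical path: training-data images
fake = embed_dir("tadne_samples/")  # hypothetical path: GAN samples

# Fit t-SNE on the pooled embeddings so both sets share one 2D layout.
xy = TSNE(n_components=2, init="pca", perplexity=30).fit_transform(
    np.concatenate([real, fake]))
plt.scatter(*xy[: len(real)].T, s=2, alpha=0.3, label="Danbooru2019")
plt.scatter(*xy[len(real):].T, s=2, alpha=0.3, label="TADNE samples")
plt.legend(); plt.title("CLIP embeddings, joint t-SNE"); plt.show()
```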
I expect newer generative models which avoid GAN losses and use more flexible (but expensive!) architectures, like DALL-E, would do much better on mode-dropping, so you'd see many more unique characters/images out of them. (I'm very excited about them. As good as TADNE or Waifu Labs v2 may be, I think they are still far behind what could be done with just existing data/arch/compute.)