I am working on something like this, text-based with images. But I don't generate on the fly, I will have big lists (many thousands) of curated text outputs from GPT that get selected with randomization and nested if statements. For those interested, the game (work in progress) is Advinsula: https://advinsula.com/