I do wonder if we'll ever see this type of technology advance far enough that a textual description alone (rather than hours of footage of an existing game) could generate something novel and playable. It seems absurd now, but throw enough compute and efficiency gains at it over a few decades, and it might approach the realm of possibility.