If it can be trained on (many) existing games, then it might work similarly to how you don't need to describe every possible detail of a generated image in order to get something that looks like what you're asking for (and looks like a plausible image for the underspecified parts).
Also: define "fun" and "new" in a "simple text prompt". Current image generators suck at reflecting exactly what you want, because they regurgitate existing things and styles.
Sit down and write down a text prompt for a "fun new game". You can start with something relatively simple like a Mario-like platformer.
By page 300, when you're about halfway through describing what you mean, you might understand why this is wishful thinking.