This seems more a critique of the particular model (the specific diffusion model that resulted from training) than of diffusion models in general. It's also a bit misstated: this doesn't require a working car on the road to do its job (present tense); it required one to train it to do its job (past tense). And it's not clear why a game engine that uses concepts learned from how another one worked should cease to be a game engine. For diffusion models in general, as opposed to the specific trained example here, I don't see why one would assume the approach can't also work outside the particular "test tracks" it was trained on, just as a typical diffusion model does more than reproduce the exact images it was trained on: it can interpolate and apply individual concepts to create novel output.
My point is something else: a game engine is something that can be separated from a game and put to use somewhere else. This is basically the definition of "engine". The above is not an engine but a game without any engine at all, and therefore it should not be called an "engine".