A diffusion model cannot be a game engine because a game engine can be used to c...

kqr · on Aug 28, 2024

> even rules which are not visible on-screen.

If a rule was changed but it's never visible on the screen, did it really change?

> It simply generated frames and the appearance of play mechanics from a game it sampled (which humans created).

Simply?! I understand it's mechanically trivial but the fact that it's compressed such a rich conditional distribution seems far from simple to me.

znx_0 · on Aug 28, 2024

> If a rule was changed but it's never visible on the screen, did it really change?

Well, for "some" games it really does change.

darby_nine · on Aug 28, 2024

> Simply?! I understand it's mechanically trivial but the fact that it's compressed such a rich conditional distribution seems far from simple to me.

It's much simpler than actually creating a game....

stnmtn · on Aug 28, 2024

If someone told you 10 years ago that they were going to create something where you could play a whole new level of Doom, without them writing a single line of game logic/rendering code, would you say that that is simpler than creating a demo by writing the game themselves?

darby_nine · on Aug 28, 2024

There are two things at play here: the complexity of the underlying mechanism, and the complexity of detailed creation. This is obviously a complicated mechanism, but in another sense it's a trivial result compared to actually reproducing the game itself in its original intended state.

throwthrowuknow · on Aug 28, 2024

They only trained it on one game and only embedded the control inputs. You could train it on many games and embed a lot more information about each of them which could possibly allow you to specify a prompt that would describe the game and then play it.

calebh · on Aug 28, 2024

One thing I'd like to see is to take a game rendered with low poly assets (or segmented in some way) and use a diffusion model to add realistic or stylized art details. This would fix the consistency problem while still providing tangible benefits.

momojo · on Aug 28, 2024

The title should be "Diffusion Models can be used to render frames given user input"

sharpshadow · on Aug 28, 2024

So all it did is generate a video of the gameplay which is slightly different from the video it used for training?

TeMPOraL · on Aug 28, 2024

No, it implements a 3D FPS that's interactive, and renders each frame based on your input and a lot of memorized gameplay.

sharpshadow · on Aug 28, 2024

But is it playing the actual game or just making an interactive video of it?

TeMPOraL · on Aug 28, 2024

Yes.

All video games are, by definition, interactive videos.

What I imagine you're asking about is, a typical game like Doom is effectively a function:

  f(internal state, player input) -> (new frame, new internal state)

where internal state is the shape and looks of the loaded map, the positions, behaviors and stats of enemies, the player, items, etc.

A typical AI that plays Doom, which is not what's happening here, is (at runtime):

  f(last frame) -> new player input

and is attached in a loop to the previous case in the obvious way.
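To make "the obvious way" concrete, here is a rough sketch of those two functions wired together. Everything in it is a toy I made up for illustration (a 10-cell corridor, `game_step`, `agent_policy`, `run`); it is not how Doom or any real agent is written, it only shows who owns the state and who sees only frames.

```python
from typing import List, Tuple

# Hypothetical toy types: the state is the player's x position in a
# 10-cell corridor, and a frame is a 10-character string.
State = int
Frame = str
Action = int  # -1 = step left, 0 = stay, +1 = step right


def game_step(state: State, action: Action) -> Tuple[Frame, State]:
    """Classic engine: f(internal state, player input) -> (new frame, new internal state)."""
    new_state = max(0, min(9, state + action))  # clamp to the corridor
    frame = "." * new_state + "@" + "." * (9 - new_state)
    return frame, new_state


def agent_policy(last_frame: Frame) -> Action:
    """Game-playing AI: f(last frame) -> new player input."""
    # The agent reads the player position off the rendered frame only.
    return 1 if last_frame.index("@") < 9 else -1


def run(steps: int, state: State = 0) -> List[Frame]:
    """Attach the agent to the engine in a loop."""
    frame, state = game_step(state, 0)  # render the initial frame
    frames = [frame]
    for _ in range(steps):
        action = agent_policy(frame)             # agent sees pixels, never the state
        frame, state = game_step(state, action)  # engine mutates the explicit state
        frames.append(frame)
    return frames
```

Calling `run(3)` walks the `@` three cells to the right. The thing to notice is that `state` is threaded through every call explicitly.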

What we have here, however, is a game you can play but implemented in a diffusion model, and it works like this:

  f(player input, N last frames) -> new frame

Of note here is the lack of game state - the state is implicit in the contents of the N previous frames, and is otherwise not represented or mutated explicitly. The diffusion model has seen so much Doom that it, in a way, internalized most of the state and its evolution, so it can look at what's going on and guess what's about to happen. Which is what it does: it renders the next frame by predicting it, based on current user input and last N frames. And then that frame becomes the input for the next prediction, and so on, and so on.
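As a minimal sketch of that loop, assuming a stand-in `predict_next_frame` in place of the actual diffusion model (the names `predict_next_frame`, `rollout`, and the string "frames" are all invented for illustration, and a real model would sample an image rather than compute one deterministically):

```python
from collections import deque
from typing import Deque, Iterable, List, Sequence

Frame = str  # hypothetical: a frame is a one-line string, "@" marks the player
N = 4        # hypothetical context length (number of past frames kept)


def predict_next_frame(action: int, history: Sequence[Frame]) -> Frame:
    """Stand-in for the model: f(player input, N last frames) -> new frame.

    There is no game state argument. Whatever "state" exists has to be
    recovered from the frames themselves (here: where the "@" is).
    """
    newest = history[-1]
    width = len(newest)
    x = max(0, min(width - 1, newest.index("@") + action))
    return "." * x + "@" + "." * (width - 1 - x)


def rollout(initial_frames: Iterable[Frame], actions: Iterable[int], n: int = N) -> List[Frame]:
    """Autoregressive loop: each predicted frame is fed back in as context."""
    context: Deque[Frame] = deque(initial_frames, maxlen=n)  # oldest frames fall off
    produced = []
    for action in actions:
        frame = predict_next_frame(action, context)
        context.append(frame)  # the prediction becomes input for the next step
        produced.append(frame)
    return produced
```

For example, `rollout(["@...."], [1, 1, -1])` moves the player right twice and left once. Anything that has scrolled out of the `n`-frame window is simply gone, which is one way to picture why such a model can lose track of things that an explicit game state would remember.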

So yes, it's totally an interactive video and a game and a third thing - a probabilistic emulation of Doom on a generative ML model.

sharpshadow · on Aug 28, 2024

Thank you for the further explanation, that’s what I thought in the meantime and intended to find out with my question.

That opens up a new branch of possibilities.

Maxatar · on Aug 28, 2024

Making an interactive video of it. It is not playing the game, a human does that.

With that said, I wholly disagree that this is not an engine. This is absolutely a game engine and while this particular demo uses the engine to recreate DOOM, an existing game, you could certainly use this engine to produce new games in addition to extrapolating existing games in novel ways.

Workaccount2 · on Aug 28, 2024

What is the difference?