
What I don't understand is the following: if this works so well, why didn't we have good video generation much earlier? Once diffusion models were shown to work, the most obvious thing to do was to generate the next frame based on previous frames, but it still took 1-2 years for good video models to appear. For example, compare Sora generating Minecraft video with this method generating Minecraft video. Say in both cases the player is standing in a meadow with few inputs, watching some pigs. In the Sora video you'd expect the typical glitches to appear, like erratic sliding movement, overlapping legs, multiplying pigs, etc. Would these glitches not appear in the GameNGen video? Why?
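
The "generate the next frame based on previous frames" idea can be sketched as an autoregressive loop. This is a minimal toy sketch, not GameNGen's code: `predict_next` and `toy_predictor` are hypothetical stand-ins for a trained diffusion model, frames are flat lists of numbers, and the context length of 4 is an arbitrary assumption. The point it illustrates is that each generated frame is fed back in as context, so a small error in one frame is reused on every later step.

```python
from collections import deque
from typing import Callable, Deque, List, Sequence

Frame = List[float]  # toy "frame": a flat list of pixel values

# Hypothetical model interface: (recent frames, recent actions) -> next frame.
Predictor = Callable[[Sequence[Frame], Sequence[int]], Frame]


def rollout(
    predict_next: Predictor,
    initial_frames: Sequence[Frame],
    actions: Sequence[int],
    context_len: int = 4,  # assumed window size, chosen only for illustration
) -> List[Frame]:
    """Generate one frame per action, each conditioned on a sliding window
    of the most recent frames and actions."""
    # deque(maxlen=...) silently drops the oldest entry once the window is full.
    frames: Deque[Frame] = deque(initial_frames, maxlen=context_len)
    recent_actions: Deque[int] = deque(maxlen=context_len)
    generated: List[Frame] = []
    for action in actions:
        recent_actions.append(action)
        next_frame = predict_next(list(frames), list(recent_actions))
        generated.append(next_frame)
        # The model's own output becomes context for the next step,
        # so errors can accumulate over a long rollout.
        frames.append(next_frame)
    return generated


def toy_predictor(frames: Sequence[Frame], actions: Sequence[int]) -> Frame:
    """Hypothetical stand-in: shift every pixel of the last frame by the last action."""
    return [p + actions[-1] for p in frames[-1]]


if __name__ == "__main__":
    print(rollout(toy_predictor, [[0.0, 0.0]], [1, 1, -1]))
    # [[1.0, 1.0], [2.0, 2.0], [1.0, 1.0]]
```

A real model would replace `toy_predictor` with a full denoising pass per frame, which is where most of the compute goes.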


Because video is much more difficult than images (it's lots of images that have to be consistent across time, with motion following the laws of physics, etc.), and this is much more limited in scope than fully general video generation.


This misses the point; I'm comparing two methods of generating Minecraft videos.


By simplifying the problem, we can better focus on researching specific aspects of generation. In this case, they synthetically created a large, highly domain-specific training set and used it to train a diffusion model that is conditioned on player inputs (actions) instead of text.
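
To make "a diffusion model that takes input parameters instead of text" more concrete, here is a hedged toy sketch of the general idea, not GameNGen's implementation. It assumes discrete actions, a lookup table of vectors standing in for a learned action embedding (playing the role that text-token embeddings play in a text-to-image model), and past frames passed as extra channels next to the noisy frame being denoised. `ActionEmbedding` and `stack_context` are hypothetical names.

```python
import random
from typing import List, Sequence

Channel = List[float]   # toy channel: flattened spatial values
Latent = List[Channel]  # toy latent frame: a list of channels


class ActionEmbedding:
    """Hypothetical lookup table mapping each discrete action id to a vector.
    In a trained model these vectors would be learned; here they are random."""

    def __init__(self, num_actions: int, dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.table = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_actions)]

    def __call__(self, action_ids: Sequence[int]) -> List[List[float]]:
        # One vector per action, analogous to one vector per text token.
        return [self.table[a] for a in action_ids]


def stack_context(noisy_latent: Latent, past_latents: Sequence[Latent]) -> Latent:
    """Concatenate past frames onto the noisy frame along the channel axis,
    so the denoiser sees the frame it is generating alongside its history."""
    stacked: Latent = list(noisy_latent)
    for latent in past_latents:
        stacked.extend(latent)
    return stacked


if __name__ == "__main__":
    embed = ActionEmbedding(num_actions=8, dim=4)
    cond = embed([0, 3, 3])                 # takes the place of text-token embeddings
    noisy = [[0.0] * 16 for _ in range(4)]  # 4 channels, 16 "pixels" each
    history = [[[1.0] * 16 for _ in range(4)] for _ in range(3)]
    model_input = stack_context(noisy, history)
    print(len(cond), len(model_input))      # 3 16
```

The design point is that only the conditioning changes: the denoiser still predicts one image, but what it "reads" is a short action history and recent frames rather than a prompt.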

Sora was trained on a much more diverse dataset, so it has to learn more general solutions to maintain consistency, which is harder. The low resolution and simple, highly repetitive textures of Doom definitely help as well.

In general, this is just an easier problem to approach because of the tighter constraints. It's also worth mentioning that noise was added to the context frames during training to make the model robust to small perturbations, such as errors in its own earlier outputs.
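
The noise trick mentioned above can be sketched roughly like this: during training, the clean context frames are corrupted with Gaussian noise of a randomly sampled strength, and that strength (bucketed into a small integer) is given to the model as an extra input, so it learns to cope with imperfect history like the frames it will produce itself at inference time. This is a minimal sketch under assumptions: `augment_context` is a hypothetical name, and the maximum noise level and bucket count are illustrative values, not taken from the paper.

```python
import random
from typing import List, Optional, Sequence, Tuple

Frame = List[float]  # toy frame: flat list of values


def augment_context(
    context: Sequence[Frame],
    max_sigma: float = 0.7,  # illustrative upper bound on the noise std-dev
    num_buckets: int = 10,   # illustrative number of discrete noise levels
    rng: Optional[random.Random] = None,
) -> Tuple[List[Frame], int]:
    """Corrupt context frames with Gaussian noise of random strength.

    Returns the noisy frames plus a discretized noise level that the model
    would receive as an extra conditioning input during training."""
    if max_sigma <= 0 or num_buckets <= 0:
        raise ValueError("max_sigma and num_buckets must be positive")
    rng = rng or random.Random()
    sigma = rng.uniform(0.0, max_sigma)
    noisy = [[x + rng.gauss(0.0, sigma) for x in frame] for frame in context]
    # Map sigma in [0, max_sigma] to an integer bucket in [0, num_buckets - 1].
    bucket = min(int(sigma / max_sigma * num_buckets), num_buckets - 1)
    return noisy, bucket


if __name__ == "__main__":
    clean = [[0.0] * 5, [1.0] * 5]
    noisy, level = augment_context(clean, rng=random.Random(42))
    print(level, [round(v, 3) for v in noisy[0]])
```

Passing the noise level alongside the frames lets the model tell how much to trust its history instead of having to guess.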


I would have thought it was much easier to generate huge amounts of game footage for training, but as I understand it, this is not what was done here.



