Very cool! I've done research on reinforcement/imitation learning in world models. A great intro to these ideas is here: https://worldmodels.github.io/
I'm most excited for when these methods will make a meaningful difference in robotics. RL is still not quite there for long-horizon, sparse reward tasks in non-zero-sum environments, even with a perfect simulator; e.g. an assistant which books travel for you. Pay attention to when virtual agents start to really work well as a leading signal for this. Virtual agents are strictly easier than physical ones.
Compounding on that, mismatches between the simulated dynamics and real dynamics make the problem harder (the sim2real problem), though with domain randomization and online corrections (closed-loop control, search) this is less of an issue these days.
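The core idea of domain randomization is simple: resample the simulator's physics parameters every episode so the policy is forced to be robust to a whole distribution of dynamics rather than one fixed (and wrong) model. A minimal sketch, assuming a gym-style simulator; the parameter names and ranges here are illustrative, not from any specific simulator:

```python
import random

def randomize_dynamics(rng):
    """Sample physics parameters for one episode, so a policy trained
    in sim sees a spread of dynamics that hopefully covers the real robot."""
    return {
        "friction": rng.uniform(0.5, 1.5),        # hypothetical range
        "mass_scale": rng.uniform(0.8, 1.2),
        "motor_latency_s": rng.uniform(0.0, 0.03),
        "sensor_noise_std": rng.uniform(0.0, 0.05),
    }

rng = random.Random(0)
for episode in range(3):
    params = randomize_dynamics(rng)
    # env.reset() with these params applied to the simulator, then roll
    # out the policy as usual; only the dynamics sampling is shown here.
    print(params)
```

In practice the randomization ranges are the hard part: too narrow and you're back to the sim2real gap, too wide and the policy becomes overly conservative.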
Multi-scale effects are also tricky: the characteristic timescale of individual actions in robotics can be quite different from the timescale of the overall task (e.g. manipulating ingredients to cook a meal). Locomotion was solved first because it's periodic imo.
Check out PufferAI if you're scale-pilled for RL: just do RL bigger, better, get the basics right. Check out Physical Intelligence for the same in robotics, with a more imitation/offline RL feel.