I don't know if there is research into this, and I didn't see it mentioned here, but this is the most probable path to something like AI consciousness and AGI. Of course it's highly speculative, but building a world simulation from visual input is arguably how the brain evolved, and it's probably what is needed to make a robot behave like a living being. It would just do this in reverse: take video as input, build an inner world model from it, and use that model for reasoning about the world. Extremely fascinating, and also scary that this is happening so quickly.