Below is the camera view observed by the neural agent.
Hopper
Half Cheetah
Humanoid
Ant
Leveraging differentiable simulation, our method learns to control from raw pixels in just tens of minutes on most tasks!
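To illustrate the core idea of learning through a differentiable simulator, here is a minimal, self-contained sketch. It is a hypothetical 1-D point-mass toy, not the paper's simulator or policy: a linear "policy" with parameters (k, d) is optimized by propagating exact gradients of a rollout loss through the simulation steps (forward-mode sensitivities stand in for autodiff).

```python
# Hedged sketch: first-order policy optimization through a differentiable simulator.
# Toy 1-D point mass (illustrative assumption, not the paper's setup): state (x, v),
# action a = -k*x - d*v from a linear policy with parameters (k, d).
# Because each simulation step is differentiable, we propagate exact
# sensitivities of the state w.r.t. the policy parameters through the rollout.

DT = 0.05      # simulation time step
HORIZON = 40   # rollout length

def rollout_loss(k, d):
    """Roll out the closed-loop system; return the terminal loss x^2 + v^2
    and its analytic gradients w.r.t. k and d (forward-mode sensitivities)."""
    x, v = 1.0, 0.0
    dx_dk = dv_dk = dx_dd = dv_dd = 0.0  # sensitivities of (x, v) w.r.t. k, d
    for _ in range(HORIZON):
        a = -k * x - d * v
        da_dk = -x - k * dx_dk - d * dv_dk
        da_dd = -k * dx_dd - v - d * dv_dd
        # Semi-implicit Euler step, differentiated in the same order.
        v += DT * a
        dv_dk += DT * da_dk
        dv_dd += DT * da_dd
        x += DT * v
        dx_dk += DT * dv_dk
        dx_dd += DT * dv_dd
    loss = x * x + v * v
    return loss, 2 * x * dx_dk + 2 * v * dv_dk, 2 * x * dx_dd + 2 * v * dv_dd

# Plain gradient descent on the policy parameters.
k, d = 0.0, 0.0
for _ in range(200):
    loss, gk, gd = rollout_loss(k, d)
    k -= 0.05 * gk
    d -= 0.05 * gd
print(f"final rollout loss: {loss:.4f}")
```

The same principle scales up in the paper's setting: gradients of the reward flow through many parallel simulated rollouts into the policy, which is what makes training this fast.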
Iteration 0
Iteration 400 (4 minutes of Training)
Iteration 17600 (2.5 hours of Training)
Iteration 0
Iteration 800 (8 minutes of Training)
Iteration 30000 (5 hours of Training)
Iteration 0
Iteration 2000 (20 minutes of Training)
Iteration 16000 (6 hours of Training)
Iteration 0
Iteration 9600 (4 hours of Training)
Iteration 36000 (15 hours of Training)
Our method not only shortens wall-clock training time but also consistently achieves higher final rewards, especially on complex tasks such as humanoid locomotion.
@misc{you2025acceleratingvisualpolicylearningparallel,
  title={Accelerating Visual-Policy Learning through Parallel Differentiable Simulation},
  author={Haoxiang You and Yilang Liu and Ian Abraham},
  year={2025},
  eprint={2505.10646},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2505.10646},
}