Accelerating Visual-Policy Learning through Parallel Differentiable Simulation

Yale University

Abstract

In this work, we propose Decoupled Visual-based Analytical Policy Gradient (D.Va) — a computationally efficient algorithm for learning continuous control directly from raw pixel observations. Our approach decouples the rendering process from the computation graph, enabling seamless integration with existing differentiable simulation ecosystems without the need for specialized differentiable rendering software. This decoupling not only reduces computational and memory overhead but also effectively attenuates the policy gradient norm, leading to more stable and smoother optimization. We evaluate our method on standard visual control benchmarks using modern GPU-accelerated simulation. Experiments show that our approach significantly reduces wall-clock training time and consistently outperforms all baseline methods in terms of final returns. Notably, on complex tasks such as humanoid locomotion, our method achieves a 4× improvement in final return, and successfully learns a humanoid running policy within 4 hours on a single GPU.
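To give a flavor of the decoupling idea, here is a minimal 1-D toy sketch (not the paper's implementation; the dynamics, policy, and `render` function are all invented for illustration). The policy acts on a rendered observation, but the render is held constant during differentiation: the analytic gradient flows only through the differentiable dynamics, so a differentiable renderer (i.e., d obs / d state) is never required.

```python
def render(s):
    # Stand-in for a camera render: maps state to a "pixel" in [0, 1].
    # Its Jacobian is never used below -- that is the decoupling.
    return min(1.0, max(0.0, s))

def rollout_loss(w, b, s0=0.5, dt=0.1, target=1.0):
    obs = render(s0)           # treated as detached: no gradient through this
    a = w * obs + b            # linear "policy" on the pixel observation
    s1 = s0 + a * dt           # differentiable one-step dynamics
    return (s1 - target) ** 2  # loss = negative reward

def decoupled_grad(w, b, s0=0.5, dt=0.1, target=1.0):
    # Chain rule through the dynamics only, with obs held constant:
    #   dL/dw = dL/ds1 * ds1/da * da/dw
    obs = render(s0)
    a = w * obs + b
    s1 = s0 + a * dt
    dL_ds1 = 2.0 * (s1 - target)
    return dL_ds1 * dt * obs, dL_ds1 * dt  # (dL/dw, dL/db)
```

In this toy, the analytic gradients can be checked against finite differences of `rollout_loss`; the key point is that `render` itself never has to be differentiable.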
Decoupled Policy Gradient

📸 The policy operates solely on raw visual input!

Below is the camera view observed by the neural agent.

Hopper

Half Cheetah

Humanoid

Ant

🚀 Accelerated Training Progress!

Leveraging differentiable simulation, our method learns to control from raw pixels in just tens of minutes for the majority of tasks!


Hopper

Iteration 0

Iteration 400 (4 minutes of Training)

Iteration 17600 (2.5 hours of Training)


Cheetah

Iteration 0

Iteration 800 (8 minutes of Training)

Iteration 30000 (5 hours of Training)


Ant

Iteration 0

Iteration 2000 (20 minutes of Training)

Iteration 16000 (6 hours of Training)


Humanoid

Iteration 0

Iteration 9600 (4 hours of Training)

Iteration 36000 (15 hours of Training)


🏆 High Reward Achieved!

Our method not only improves wall-clock training time, but also consistently achieves higher final rewards — especially in complex tasks such as humanoid locomotion.

🏃 Humanoid Running Olympiad!


Training Curve Comparison

Comparison to RL Methods


Comparison to State-to-Visual Distillation


Comparison to SHAC with Differentiable Rendering

🤖 More robots

🐕 Quadruped


BibTeX

@misc{you2025acceleratingvisualpolicylearningparallel,
      title={Accelerating Visual-Policy Learning through Parallel Differentiable Simulation}, 
      author={Haoxiang You and Yilang Liu and Ian Abraham},
      year={2025},
      eprint={2505.10646},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2505.10646}, 
}