Abstract
In recent years, domain randomization has gained significant traction as a method for sim-to-real transfer of reinforcement learning policies; however, finding optimal randomization ranges can be difficult.
In this paper, we introduce DROPO, a novel method for estimating domain randomization ranges for a safe sim-to-real transfer.
Unlike prior work, DROPO only requires a pre-collected offline dataset of trajectories and does not converge to point estimates.
We demonstrate that DROPO can recover dynamics parameter distributions in simulation and find a distribution that compensates for an unmodelled phenomenon.
We also evaluate the method on two zero-shot sim-to-real transfer scenarios, showing a successful domain transfer and improved performance over prior methods.
Authored by Gabriele Tiboni, Karol Arndt, and Ville Kyrki.
DROPO uses off-policy data (e.g. human demonstrations) to learn a domain randomization distribution, which is later used to train a policy that can be directly transferred to the real world.
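The sketch below illustrates this pipeline on a toy 1-D point mass: a Gaussian distribution over an unknown mass is fitted by maximizing the likelihood of offline transitions under randomized simulated rollouts, in the spirit of DROPO's likelihood-based objective. This is a minimal sketch under assumed names (ToyMassSim, collect_offline_data, fit_dropo_distribution, a random-search optimizer), not the actual DROPO codebase API.

# Minimal, self-contained sketch of the DROPO idea on a toy 1-D point mass.
# All names here are illustrative assumptions, not the DROPO codebase API.
import numpy as np
from scipy.stats import norm

class ToyMassSim:
    """1-D point mass whose response to an action depends on an unknown mass."""
    def __init__(self, mass, dt=0.05):
        self.mass, self.dt = mass, dt

    def step(self, state, action):
        pos, vel = state
        acc = action / self.mass
        return np.array([pos + vel * self.dt, vel + acc * self.dt])

def collect_offline_data(true_mass=1.7, n=100, seed=0):
    """Stand-in for a pre-collected real-world dataset of (s, a, s') transitions."""
    rng = np.random.default_rng(seed)
    real = ToyMassSim(true_mass)
    states = rng.uniform(-1.0, 1.0, size=(n, 2))
    actions = rng.uniform(-1.0, 1.0, size=n)
    next_states = np.stack([real.step(s, a) for s, a in zip(states, actions)])
    return states, actions, next_states

def log_likelihood(phi, dataset, n_samples=20, eps=1e-6, seed=1):
    """Score a candidate randomization distribution phi = (mean, log_std):
    sample masses, replay each real (s, a) pair in simulation, and evaluate
    the real next state under a Gaussian fitted to the simulated outcomes."""
    mean, log_std = phi
    rng = np.random.default_rng(seed)
    masses = np.clip(rng.normal(mean, np.exp(log_std), size=n_samples), 0.1, None)
    states, actions, next_states = dataset
    total = 0.0
    for s, a, s_next in zip(states, actions, next_states):
        sim_next = np.stack([ToyMassSim(m).step(s, a) for m in masses])
        mu, sigma = sim_next.mean(axis=0), sim_next.std(axis=0) + eps
        total += norm.logpdf(s_next, mu, sigma).sum()
    return total

def fit_dropo_distribution(dataset, n_iters=150, seed=2):
    """Maximize the transition likelihood over phi with plain random search
    (the paper uses a gradient-free optimizer; random search keeps this short)."""
    rng = np.random.default_rng(seed)
    best_phi, best_ll = None, -np.inf
    for _ in range(n_iters):
        phi = (rng.uniform(0.5, 3.0), rng.uniform(np.log(1e-3), 0.0))
        ll = log_likelihood(phi, dataset)
        if ll > best_ll:
            best_phi, best_ll = phi, ll
    return best_phi

if __name__ == "__main__":
    dataset = collect_offline_data()
    mean, log_std = fit_dropo_distribution(dataset)
    print(f"Estimated mass distribution: N({mean:.2f}, std={np.exp(log_std):.3f})")
    # A policy trained in simulation with the mass resampled from this
    # distribution every episode can then be deployed zero-shot.

Here the randomization distribution covers a single mass parameter; the actual method fits a distribution over the full vector of simulator dynamics parameters before policy training.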
Citing
@article{tiboni2023dropo,
  title = {DROPO: Sim-to-real transfer with offline domain randomization},
  journal = {Robotics and Autonomous Systems},
  pages = {104432},
  year = {2023},
  issn = {0921-8890},
  doi = {10.1016/j.robot.2023.104432},
  url = {https://www.sciencedirect.com/science/article/pii/S0921889023000714},
  author = {Gabriele Tiboni and Karol Arndt and Ville Kyrki},
  keywords = {Robot learning, Transfer learning, Reinforcement learning, Domain randomization}
}
