Abstract
In recent years, domain randomization has gained significant traction as a method for sim-to-real transfer of reinforcement learning policies; however, finding optimal randomization ranges can be difficult.
In this paper, we introduce DROPO, a novel method for estimating domain randomization ranges for a safe sim-to-real transfer.
Unlike prior work, DROPO only requires a pre-collected offline dataset of trajectories and does not converge to point estimates.
We demonstrate that DROPO can recover dynamics parameter distributions in simulation and find a distribution that compensates for an unmodelled phenomenon.
We also evaluate the method on two zero-shot sim-to-real transfer scenarios, showing a successful domain transfer and improved performance over prior methods.
Authored by Gabriele Tiboni, Karol Arndt, and Ville Kyrki.
DROPO uses off-policy data (e.g. human demonstrations) to learn a domain randomization distribution, which is later used to train a policy that can be directly transferred to the real world.
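The sketch below illustrates this pipeline on a toy 1-D point mass: a Gaussian distribution over an unknown mass is fitted by maximizing the likelihood of offline transitions under randomized simulated rollouts, in the spirit of DROPO's likelihood-based objective. This is a minimal sketch under assumed names (ToyMassSim, collect_offline_data, fit_dropo_distribution, a random-search optimizer), not the actual DROPO codebase API.

# Minimal, self-contained sketch of the DROPO idea on a toy 1-D point mass.
# All names here are illustrative assumptions, not the DROPO codebase API.
import numpy as np
from scipy.stats import norm

class ToyMassSim:
    """1-D point mass whose response to an action depends on an unknown mass."""
    def __init__(self, mass, dt=0.05):
        self.mass, self.dt = mass, dt

    def step(self, state, action):
        pos, vel = state
        acc = action / self.mass
        return np.array([pos + vel * self.dt, vel + acc * self.dt])

def collect_offline_data(true_mass=1.7, n=100, seed=0):
    """Stand-in for a pre-collected real-world dataset of (s, a, s') transitions."""
    rng = np.random.default_rng(seed)
    real = ToyMassSim(true_mass)
    states = rng.uniform(-1.0, 1.0, size=(n, 2))
    actions = rng.uniform(-1.0, 1.0, size=n)
    next_states = np.stack([real.step(s, a) for s, a in zip(states, actions)])
    return states, actions, next_states

def log_likelihood(phi, dataset, n_samples=20, eps=1e-6, seed=1):
    """Score a candidate randomization distribution phi = (mean, log_std):
    sample masses, replay each real (s, a) pair in simulation, and evaluate
    the real next state under a Gaussian fitted to the simulated outcomes."""
    mean, log_std = phi
    rng = np.random.default_rng(seed)
    masses = np.clip(rng.normal(mean, np.exp(log_std), size=n_samples), 0.1, None)
    states, actions, next_states = dataset
    total = 0.0
    for s, a, s_next in zip(states, actions, next_states):
        sim_next = np.stack([ToyMassSim(m).step(s, a) for m in masses])
        mu, sigma = sim_next.mean(axis=0), sim_next.std(axis=0) + eps
        total += norm.logpdf(s_next, mu, sigma).sum()
    return total

def fit_dropo_distribution(dataset, n_iters=150, seed=2):
    """Maximize the transition likelihood over phi with plain random search
    (the paper uses a gradient-free optimizer; random search keeps this short)."""
    rng = np.random.default_rng(seed)
    best_phi, best_ll = None, -np.inf
    for _ in range(n_iters):
        phi = (rng.uniform(0.5, 3.0), rng.uniform(np.log(1e-3), 0.0))
        ll = log_likelihood(phi, dataset)
        if ll > best_ll:
            best_phi, best_ll = phi, ll
    return best_phi

if __name__ == "__main__":
    dataset = collect_offline_data()
    mean, log_std = fit_dropo_distribution(dataset)
    print(f"Estimated mass distribution: N({mean:.2f}, std={np.exp(log_std):.3f})")
    # A policy trained in simulation with the mass resampled from this
    # distribution every episode can then be deployed zero-shot.

Here the randomization distribution covers a single mass parameter; the actual method fits a distribution over the full vector of simulator dynamics parameters before policy training.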
Citing
@article{tiboni2023dropo,
  title = {DROPO: Sim-to-real transfer with offline domain randomization},
  journal = {Robotics and Autonomous Systems},
  pages = {104432},
  year = {2023},
  issn = {0921-8890},
  doi = {10.1016/j.robot.2023.104432},
  url = {https://www.sciencedirect.com/science/article/pii/S0921889023000714},
  author = {Gabriele Tiboni and Karol Arndt and Ville Kyrki},
  keywords = {Robot learning, Transfer learning, Reinforcement learning, Domain randomization}
}
