TY - UNPB
T1 - Teacher-Student Reinforcement Learning for Mapless Navigation using a Planetary Space Rover
AU - Mortensen, Anton Bjørndahl
AU - Pedersen, Emil Tribler
AU - Vives Benedicto, Laia
AU - Burg, Lionel
AU - Madsen, Mads Rossen
AU - Bøgh, Simon
PY - 2023/9
Y1 - 2023/9
N2 - We address the challenge of enhancing navigation autonomy for planetary space rovers using reinforcement learning (RL). The ambition of future space missions necessitates advanced autonomous navigation capabilities for rovers to meet mission objectives. RL's potential in robotic autonomy is evident, but its reliance on simulations poses a challenge. Transferring policies to real-world scenarios often encounters the "reality gap", disrupting the transition from virtual to physical environments. The reality gap is exacerbated in the context of mapless navigation on Mars and Moon-like terrains, where unpredictable terrains and environmental factors play a significant role. Effective navigation requires a method attuned to these complexities and real-world data noise. We introduce a novel two-stage RL approach using offline noisy data. Our approach employs a teacher-student policy learning paradigm, inspired by the "learning by cheating" method. The teacher policy is trained in simulation. Subsequently, the student policy is trained on noisy data, aiming to mimic the teacher's behaviors while being more robust to real-world uncertainties. Our policies are transferred to a custom-designed rover for real-world testing. Comparative analyses between the teacher and student policies reveal that our approach offers improved behavioral performance, heightened noise resilience, and more effective sim-to-real transfer.
AB - We address the challenge of enhancing navigation autonomy for planetary space rovers using reinforcement learning (RL). The ambition of future space missions necessitates advanced autonomous navigation capabilities for rovers to meet mission objectives. RL's potential in robotic autonomy is evident, but its reliance on simulations poses a challenge. Transferring policies to real-world scenarios often encounters the "reality gap", disrupting the transition from virtual to physical environments. The reality gap is exacerbated in the context of mapless navigation on Mars and Moon-like terrains, where unpredictable terrains and environmental factors play a significant role. Effective navigation requires a method attuned to these complexities and real-world data noise. We introduce a novel two-stage RL approach using offline noisy data. Our approach employs a teacher-student policy learning paradigm, inspired by the "learning by cheating" method. The teacher policy is trained in simulation. Subsequently, the student policy is trained on noisy data, aiming to mimic the teacher's behaviors while being more robust to real-world uncertainties. Our policies are transferred to a custom-designed rover for real-world testing. Comparative analyses between the teacher and student policies reveal that our approach offers improved behavioral performance, heightened noise resilience, and more effective sim-to-real transfer.
KW - Reinforcement Learning
KW - Space Robotics
KW - Navigation
KW - planetary exploration
KW - Mars
KW - Moon
KW - space rover navigation
U2 - 10.48550/arXiv.2309.12807
DO - 10.48550/arXiv.2309.12807
M3 - Preprint
BT - Teacher-Student Reinforcement Learning for Mapless Navigation using a Planetary Space Rover
PB - arXiv
ER -