End-to-End RL Policy for Stair Locomotion | Laukik Bhalchandra Nakhwa

This project explores end-to-end reinforcement learning for stair locomotion on hexapod (Yuna) and quadruped robots. I train a deep reinforcement learning (DRL) control policy with Proximal Policy Optimization (PPO), run across many parallel simulated environments, to achieve robust stair climbing. For parallel simulation, I used an isaac_gym implementation based on isaacsim.
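For readers unfamiliar with PPO, the sketch below shows the standard clipped surrogate objective in plain Python. It is a minimal illustration of the general technique, not the training code used in this project; the function name `ppo_clipped_objective` and the default `clip_eps=0.2` are assumptions chosen for the example.

```python
import math


def ppo_clipped_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective over a batch (to be maximized).

    clip_eps=0.2 is a commonly used default, assumed here for illustration.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the updated policy and the data-collecting policy.
        ratio = math.exp(new_lp - old_lp)
        # Clip the ratio so a single update cannot move the policy too far.
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the pessimistic (smaller) of the clipped and unclipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

In a parallel setup, the batch passed to such a function would typically be gathered from all simulated environments at once, which is what makes training on thousands of robots in one simulator practical.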

Left: Yuna, a hexapod from HEBI Robotics. Middle: locomotion policy trained on flat ground. Right: locomotion policy being trained on stairs.

Trained RL hexapod policy

Key observation: Because hexapods such as Yuna are inherently stable, they need more penalty terms than positive rewards for tasks such as stair climbing, in contrast to quadrupeds.
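To make the idea of a penalty-heavy reward concrete, here is a minimal sketch with one positive tracking term and several penalty terms. Every term, weight, and name (`DEFAULT_WEIGHTS`, `stair_reward`, the stumble count, the 0.25 tracking scale) is a hypothetical choice for illustration, not the reward actually used for Yuna.

```python
import math

# Illustrative weights only (assumed, not taken from this project).
DEFAULT_WEIGHTS = {
    "tracking": 1.0,      # positive reward for matching the commanded velocity
    "torque": 1e-4,       # penalty on squared joint torques
    "orientation": 0.5,   # penalty on body roll and pitch
    "stumble": 0.2,       # penalty per foot stumble against a stair edge
}


def stair_reward(forward_vel, target_vel, joint_torques,
                 body_roll, body_pitch, num_foot_stumbles, weights=None):
    """Single-step reward: one positive term minus a sum of penalties."""
    w = weights or DEFAULT_WEIGHTS
    # Positive term: equals 1.0 when the forward velocity matches the target.
    tracking = math.exp(-((forward_vel - target_vel) ** 2) / 0.25)
    # Penalty terms: each is non-negative and subtracted from the reward.
    penalties = (
        w["torque"] * sum(t * t for t in joint_torques)
        + w["orientation"] * (body_roll ** 2 + body_pitch ** 2)
        + w["stumble"] * num_foot_stumbles
    )
    return w["tracking"] * tracking - penalties
```

With a structure like this, a statically stable robot that simply stands still or shuffles is not rewarded for "not falling"; the penalties are what shape how it climbs.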

Policy in training, walking on rough terrain, sloped terrain, stairs (up and down), and discrete obstacles
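One simple way to expose a policy to several terrain types at once is to assign each parallel environment a terrain by sampling from fixed proportions. The sketch below assumes that approach; the proportions and the names `TERRAIN_PROPORTIONS` and `assign_terrains` are hypothetical and not taken from this project's configuration.

```python
import random

# Assumed, illustrative mix of the terrain types listed above.
TERRAIN_PROPORTIONS = {
    "rough": 0.2,
    "slope": 0.2,
    "stairs_up": 0.25,
    "stairs_down": 0.25,
    "discrete_obstacles": 0.1,
}


def assign_terrains(num_envs, proportions=TERRAIN_PROPORTIONS, seed=0):
    """Return one terrain name per parallel environment, sampled by weight."""
    rng = random.Random(seed)  # seeded so the assignment is reproducible
    names = list(proportions)
    weights = [proportions[name] for name in names]
    return rng.choices(names, weights=weights, k=num_envs)
```

Calling `assign_terrains(4096)` would give a list of 4096 terrain names, one per simulated robot, with roughly half of them on stairs under the assumed proportions.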