ICLR 2025: Student-Informed Teacher Training 🤖

Exploring imitation learning with privileged teachers to enhance control behavior learning from high-dimensional inputs like images.

UZH Robotics and Perception Group
1.1K views • Mar 26, 2025

About this video

Imitation learning with a privileged teacher has proven effective for learning complex control behaviors from high-dimensional inputs, such as images. In this framework, a teacher is trained with privileged information, while a student tries to predict the teacher's actions from limited observations. For example, in a robot navigation task, the teacher might have access to the robot state and the distances to all nearby obstacles, while the student only receives images of the scene.

However, privileged imitation learning faces a key challenge: the student might be unable to imitate the teacher's behavior because of the discrepancy between the two observation spaces. This problem arises because the teacher is trained without considering whether the student is capable of imitating the learned behavior. As a consequence of this information asymmetry, the teacher tends to over-rely on its full observability of the environment and provides target actions that the student cannot infer from its own observations. Consider a robot navigating an obstacle-filled environment: the teacher policy receives the relative distances to all surrounding obstacles, while the student, limited by its forward-facing camera, can only perceive obstacles within the camera's field of view.

To address this teacher-student asymmetry, we propose a framework for joint training of the teacher and student policies, which encourages the teacher to learn behaviors that the student can imitate despite its partial observability. Based on the performance bound in imitation learning, we add (i) the approximated action difference between teacher and student as a penalty term to the reward function of the teacher, and (ii) a supervised teacher-student alignment step. We demonstrate our method on complex vision-based quadrotor flight and manipulation tasks.
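
To make component (i) concrete, here is a minimal sketch of a teacher reward that is reduced by the teacher-student action difference. It is an illustration under stated assumptions, not the implementation from the paper: the function names, the Euclidean distance, and the `penalty_weight` value are all hypothetical choices, and the description above does not specify how the action difference is approximated or weighted.

```python
import math
from typing import Sequence


def action_difference(teacher_action: Sequence[float],
                      student_action: Sequence[float]) -> float:
    """Euclidean distance between two action vectors (assumed metric)."""
    if len(teacher_action) != len(student_action):
        raise ValueError("action vectors must have the same length")
    return math.sqrt(
        sum((t - s) ** 2 for t, s in zip(teacher_action, student_action))
    )


def penalized_teacher_reward(task_reward: float,
                             teacher_action: Sequence[float],
                             student_action: Sequence[float],
                             penalty_weight: float = 0.1) -> float:
    """Task reward minus a weighted teacher-student action difference.

    penalty_weight is a hypothetical hyperparameter chosen for illustration.
    """
    penalty = penalty_weight * action_difference(teacher_action, student_action)
    return task_reward - penalty


# Hypothetical usage: the teacher is penalized when the student's predicted
# action deviates from the teacher's action.
reward = penalized_teacher_reward(
    task_reward=1.0,
    teacher_action=[0.0, 0.0],
    student_action=[3.0, 4.0],
)
print(reward)
```

With a penalty of this shape, the teacher keeps its full task reward only when the student reproduces its actions, so behaviors the student cannot imitate become less attractive during teacher training.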


Reference:
Nico Messikommer*, Jiaxu Xing*, Elie Aljalbout, Davide Scaramuzza,
"Student-Informed Teacher Training"
International Conference on Learning Representations (ICLR), 2025
PDF: https://rpg.ifi.uzh.ch/docs/ICLR25_Messikommer.pdf
Project Website: https://rpg.ifi.uzh.ch/sitt/
Code: https://github.com/uzh-rpg/sitt

For more info about our research on:
Agile Drone Flight: http://rpg.ifi.uzh.ch/aggressive_flight.html
Deep Learning: http://rpg.ifi.uzh.ch/research_learning.html

Affiliations:
N. Messikommer, J. Xing, E. Aljalbout, and D. Scaramuzza are with the Robotics and Perception Group, Dep. of Informatics, University of Zurich, and Dep. of Neuroinformatics, University of Zurich and ETH Zurich, Switzerland
http://rpg.ifi.uzh.ch/

Video Information

Views: 1.1K
Likes: 41
Duration: 4:37
Published: Mar 26, 2025
