Preference-Conditioned Multi-Objective RL for Integrated Command Tracking and Force Compliance in Humanoid Locomotion

Tingxuan Leng1     Yushi Wang1     Tinglong Zheng2     Changsheng Luo1     Mingguo Zhao1

1 Tsinghua University      2 Beijing Jiaotong University

Abstract

Humanoid locomotion requires not only accurate command tracking for navigation but also compliant responses to external forces during human interaction. Despite significant progress, existing RL approaches mainly emphasize robustness, yielding policies that resist external forces but lack compliance, which is particularly challenging for inherently unstable humanoids. In this work, we address this by formulating humanoid locomotion as a multi-objective optimization problem that balances command tracking and external force compliance. We introduce a preference-conditioned multi-objective RL (MORL) framework that integrates rigid command following and compliant behaviors within a single omnidirectional locomotion policy. External forces are modeled via a velocity-resistance factor for consistent reward design, and training leverages an encoder-decoder structure that infers task-relevant privileged features from deployable observations. We validate our approach in both simulation and real-world experiments on a humanoid robot. Experimental results indicate that our framework not only improves adaptability and convergence over standard pipelines, but also realizes deployable preference-conditioned humanoid locomotion.
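
To make the core idea concrete, below is a minimal Python sketch of preference-conditioned training: a preference weight is sampled per episode, used to linearly scalarize the tracking and compliance rewards, and appended to the policy observation so a single policy covers the whole trade-off. The function and variable names (preference_conditioned_reward, r_track, r_comply, w, the 48-dimensional observation) are illustrative assumptions, not the paper's actual implementation or reward terms.

import numpy as np

def preference_conditioned_reward(r_track: float, r_comply: float, w: float) -> float:
    # Linear scalarization of the two objectives by preference weight w in [0, 1]:
    # w -> 1 favors rigid command tracking, w -> 0 favors force compliance.
    # (Illustrative sketch; the paper's reward shaping may differ.)
    return w * r_track + (1.0 - w) * r_comply

rng = np.random.default_rng(0)

# Hypothetical per-episode setup: sample a preference and condition the policy
# on it by concatenating it to the proprioceptive observation.
w = rng.uniform(0.0, 1.0)           # sampled preference toward command tracking
obs = np.zeros(48)                  # placeholder proprioceptive observation
policy_input = np.concatenate([obs, [w]])

# During a rollout step, both objective rewards would be computed from the
# simulator state and combined under the episode's sampled preference.
r = preference_conditioned_reward(r_track=0.8, r_comply=0.3, w=w)

At deployment, w becomes a user-set knob: the same trained policy can be steered toward stiff tracking or compliant yielding without retraining.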

BibTeX


@misc{leng2025preferenceconditionedmultiobjectiverlintegrated,
  title={Preference-Conditioned Multi-Objective RL for Integrated Command Tracking and Force Compliance in Humanoid Locomotion}, 
  author={Tingxuan Leng and Yushi Wang and Tinglong Zheng and Changsheng Luo and Mingguo Zhao},
  year={2025},
  eprint={2510.10851},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2510.10851}, 
}