Humanoid locomotion requires not only accurate command tracking for navigation but also compliant responses
to external forces during human interaction. Despite significant progress, existing RL approaches mainly
emphasize robustness, yielding policies that resist external forces but lack compliance, which is particularly
difficult to achieve for inherently unstable humanoids. In this work, we address this limitation by formulating humanoid
locomotion as a multi-objective optimization problem that balances command tracking and external force
compliance. We introduce a preference-conditioned multi-objective RL (MORL) framework that integrates
rigid command following and compliant behaviors within a single omnidirectional locomotion policy. External
forces are modeled via a velocity-resistance factor for consistent reward design, and training leverages
an encoder-decoder structure that infers task-relevant privileged features from deployable observations.
We validate our approach in both simulation and real-world experiments on a humanoid robot. Experimental
results indicate that our framework not only improves adaptability and convergence over standard pipelines,
but also realizes deployable, preference-conditioned humanoid locomotion.
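As a minimal illustration of the preference conditioning (a sketch under our own notational assumptions, not the paper's exact formulation), the tracking and compliance objectives can be combined through a sampled preference weight $w$:
\[
r(s_t, a_t \mid w) \;=\; w \, r_{\text{track}}(s_t, a_t) \;+\; (1 - w)\, r_{\text{comply}}(s_t, a_t), \qquad w \in [0, 1],
\]
where $r_{\text{track}}$ rewards velocity-command tracking, $r_{\text{comply}}$ rewards yielding to external forces (e.g., through the velocity-resistance factor), and $w$ is supplied to the policy as an additional input so that a single network spans the full trade-off between rigid command following and compliant behavior.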