Can One Navigation Brain Power All Robots? Inside CE-Nav’s Cross‑Embodiment Breakthrough

CE-Nav introduces a two‑stage imitation‑then‑reinforcement framework that decouples generic geometric planning from robot‑specific dynamics, enabling low‑cost, high‑performance navigation across quadrupeds, humanoids, and drones while requiring only brief online fine‑tuning.

Amap Tech

Introduction

Developing a universal navigation strategy that can seamlessly transfer across robots of different forms—quadruped dogs, humanoid robots, and drones—has been a major challenge. CE-Nav, proposed by Amap, introduces an innovative “imitation‑then‑reinforcement” (IL‑then‑RL) training framework that decouples generic geometric path‑planning from robot‑specific dynamic adaptation, achieving low‑cost, high‑efficiency, high‑performance cross‑embodiment navigation.

The Dilemma of Current Algorithms

End‑to‑end policies: these map sensor observations directly to joint commands, coupling high‑level planning with robot dynamics and yielding brittle policies that do not transfer across embodiments.

Hierarchical planning: the high‑level planner relies on an oversimplified locomotion policy, leading to two core issues:

Data bias: training data comes from specific robots, limiting generality.

Catastrophic averaging: deterministic models produce meaningless averaged actions in multimodal scenarios, e.g., averaging the equally valid choices of turning left or right at a T‑junction into driving straight ahead.

CE‑Nav Framework

The core idea is divide and conquer: split navigation into a high‑level velocity planner (π_high) and a low‑level motion controller (π_low). The goal is to quickly learn a safe, efficient high‑level policy for any imperfect low‑level controller.
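A minimal sketch of this decomposition, with all class and method names hypothetical: π_high reasons only about velocities (embodiment‑agnostic), while π_low is swapped per robot to track those velocities under its own dynamics.

```python
from dataclasses import dataclass

# Hypothetical sketch of the CE-Nav decomposition: the high-level planner
# is embodiment-agnostic (it emits only velocity commands), while the
# low-level controller is swapped out per robot.

@dataclass
class VelocityCommand:
    linear: float   # forward speed (m/s)
    angular: float  # yaw rate (rad/s)

class HighLevelPlanner:
    """pi_high: maps geometric observations to a velocity command."""
    def plan(self, lidar_scan, goal_direction):
        # Placeholder policy: slow down when obstacles are near,
        # steer toward the goal direction.
        min_range = min(lidar_scan)
        linear = min(1.0, max(0.1, min_range - 0.3))
        angular = max(-1.0, min(1.0, goal_direction))
        return VelocityCommand(linear, angular)

class LowLevelController:
    """pi_low: tracks a velocity command under robot-specific limits."""
    def track(self, cmd: VelocityCommand):
        # A real controller would output joint targets; here we just
        # clamp to this embodiment's velocity limits.
        return VelocityCommand(min(cmd.linear, 0.8), min(cmd.angular, 0.6))

planner, controller = HighLevelPlanner(), LowLevelController()
cmd = planner.plan(lidar_scan=[2.0, 1.5, 3.0], goal_direction=0.2)
executed = controller.track(cmd)
```

The key property this interface buys is that π_high never sees joint states, so retargeting to a new robot only touches π_low.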

Stage One: Offline Imitation Learning – Training a “Universal Navigation Expert”

Training data are generated in a pure 2D geometric environment using the classic Dynamic Window Approach (DWA) planner, producing 10 million expert trajectories. The robot is abstracted as a simple disc, ensuring the learned knowledge is pure geometric avoidance logic, independent of robot morphology.
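DWA-style expert generation on a disc robot can be sketched as follows; the candidate sets, weights, and scoring here are illustrative choices, not the paper's settings:

```python
import math

# Illustrative DWA-style expert step for a 2D disc robot: sample
# velocity candidates, roll each out briefly, and score by goal
# progress plus obstacle clearance (weights are illustrative).

def dwa_step(pose, goal, obstacles, radius=0.2, dt=0.1, horizon=10):
    x, y, theta = pose
    best_cmd, best_score = None, -math.inf
    for v in (0.2, 0.5, 1.0):                  # candidate linear speeds
        for w in (-1.0, -0.5, 0.0, 0.5, 1.0):  # candidate yaw rates
            # Forward-simulate this candidate command.
            px, py, pt = x, y, theta
            clearance = math.inf
            for _ in range(horizon):
                pt += w * dt
                px += v * math.cos(pt) * dt
                py += v * math.sin(pt) * dt
                for ox, oy in obstacles:
                    clearance = min(clearance,
                                    math.hypot(px - ox, py - oy) - radius)
            if clearance <= 0:                 # trajectory collides: discard
                continue
            progress = -math.hypot(goal[0] - px, goal[1] - py)
            score = progress + 0.5 * min(clearance, 1.0)
            if score > best_score:
                best_cmd, best_score = (v, w), score
    return best_cmd  # expert (v, w) label for this state

cmd = dwa_step(pose=(0.0, 0.0, 0.0), goal=(3.0, 0.0), obstacles=[(1.5, 1.0)])
```

Logging the chosen `(v, w)` at each state over many randomized maps yields the kind of morphology-free expert dataset the first stage consumes.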

Expert data example

Core model (VelFlow): a conditional normalizing‑flow model that learns the full distribution of reasonable actions instead of a single optimal action, avoiding the “catastrophic averaging” problem.

Training aligns VelFlow’s output velocity distribution with the 2D expert data conditioned on environment observations (e.g., lidar scans).
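The multimodality point can be made concrete with a toy flow. This is not the paper's architecture, just an illustration: an invertible transform reshapes Gaussian base noise into a bimodal action distribution, so the model can represent "turn left OR right" instead of collapsing to their mean.

```python
import random

# Toy conditional flow over a 1D steering command (illustrative only).
# A monotone, invertible transform pushes Gaussian base noise into a
# bimodal distribution when the observation is ambiguous.

def transform(z, obs):
    if obs == "t_junction":
        # Invertible piecewise-affine map: mass concentrates near +/-1,
        # i.e., the two valid turns at the junction.
        return 0.2 * z + (1.0 if z >= 0 else -1.0)
    return 0.2 * z + 0.5  # single clear action: narrow unimodal output

def sample_action(obs, rng):
    z = rng.gauss(0.0, 1.0)  # sample from the base distribution
    return transform(z, obs)

rng = random.Random(0)
actions = [sample_action("t_junction", rng) for _ in range(1000)]
left = [a for a in actions if a < 0]
right = [a for a in actions if a > 0]
# Both modes are represented; a deterministic regressor trained on the
# same data would collapse toward the invalid mean near 0.
```

VelFlow plays the role of `transform` here, but conditioned on lidar observations and trained so its output distribution matches the 2D expert data.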

Stage Two: Online Reinforcement Learning – Training a “Dynamics‑Aware Refiner”

When deploying to a new robot, the frozen universal expert is reused, and a lightweight Refiner module is trained with a small amount of online interaction.

Guided learning: the expert provides a reference velocity V_ref that is fed, together with the state, into the Refiner’s actor‑critic network.

Principled deviation: the Refiner optimizes a combined loss L = λ·L_guide + L_ppo, where L_guide keeps the Refiner’s output close to the expert’s and L_ppo is the standard PPO reinforcement objective. The weight λ follows a curriculum: high at the start for rapid imitation, then gradually decaying to grant the Refiner autonomy.
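The curriculum can be sketched in a few lines; the linear schedule and the decay horizon below are illustrative assumptions, not the paper's values:

```python
# Sketch of the Stage-Two objective L = lambda * L_guide + L_ppo,
# with lambda annealed over training: the Refiner first imitates the
# frozen expert, then is freed to optimize the PPO objective alone.
# Schedule shape and decay horizon are illustrative, not the paper's.

def guide_weight(step, start=1.0, end=0.0, decay_steps=10_000):
    """Linear curriculum for lambda: high early, decaying to `end`."""
    frac = min(step / decay_steps, 1.0)
    return start + frac * (end - start)

def combined_loss(l_guide, l_ppo, step):
    lam = guide_weight(step)
    return lam * l_guide + l_ppo

# Early in training the guidance term dominates ...
early = combined_loss(l_guide=2.0, l_ppo=0.5, step=0)
# ... late in training only the PPO term remains.
late = combined_loss(l_guide=2.0, l_ppo=0.5, step=20_000)
```

Here `l_guide` would be a distance between the Refiner's output velocity and the expert's V_ref, and `l_ppo` the clipped PPO surrogate loss.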

Curriculum learning strategy

Experimental Results

Evaluations in the Isaac Sim physics simulator covered three quadrupeds, one biped, and one quadrotor, demonstrating the framework’s ability to handle vastly different morphologies.

Ablation studies: expert guidance, the VelFlow multimodal expert, and the Refiner are all essential; removing any component drastically reduces performance or increases training time.

Cross‑embodiment generalization: the same pretrained expert achieved high navigation performance on all five robot types after only ~6 hours of second‑stage RL fine‑tuning.

Comparison with baselines: CE‑Nav outperformed DWA, Behavior Cloning, Diffusion Policy, and NavRL in both mean success rate (mSR) and mean success weighted by path length (mSPL), while requiring eight times less training time than NavRL.

Performance comparison with baselines

Real‑world deployment: CE‑Nav was deployed on Unitree Go2 and MagicDog robots in indoor mazes, office corridors, and outdoor roads, consistently surpassing carefully tuned DWA and NavRL and confirming strong sim‑to‑real transfer and dynamic adaptability.

Real‑world deployment on Unitree Go2 and MagicDog

Conclusion

CE‑Nav’s “universal expert + dynamics‑aware optimizer” architecture successfully tackles the core difficulty of cross‑robot navigation, requiring no expensive real‑world data and enabling rapid adaptation to diverse robot forms. Its modular design also allows integration with higher‑level planners such as vision‑language models, paving the way for truly generalizable robotic navigation systems.

Tags: simulation, reinforcement learning, imitation learning, cross-embodiment, robot navigation, VelFlow