Gaode’s Fully Autonomous Embodied Robot Conquers Blind‑Guidance Challenge at Yizhuang Marathon
Gaode’s four‑legged robot "Gaode Tutu" demonstrated fully autonomous navigation and manipulation in an open‑world marathon, guiding a visually impaired teenager through the blind‑guidance task and achieving state‑of‑the‑art results on multiple navigation and manipulation benchmarks with its ABot full‑stack system.
On a sunny morning in Yizhuang, more than 300 robots from dozens of teams raced in a half‑marathon that mixed city streets with GT‑track‑level obstacles. Among them, Gaode’s four‑legged robot, Gaode Tutu, entered the arena alongside a visually impaired teenager, navigating without preset routes or remote control and demonstrating that it could safely perceive both immediate and global changes in its surroundings.
The blind‑guidance scenario poses three core difficulties: long‑tail uncertainty in open environments, extremely high safety requirements, and incomplete spatial semantics, since key locations lack systematic labeling. Traditional rule‑based or single‑modality approaches fail because they cannot generalize to the myriad unpredictable situations such environments produce.
Gaode addresses these challenges with the ABot full‑stack architecture, which consists of a data layer (ABot‑World), a model layer (ABot‑N0 for navigation and ABot‑M0 for manipulation), and an agent layer (ABot‑Claw). This stack unifies perception, reasoning, and execution across tasks and robot morphologies.
ABot‑N0 provides a unified navigation base. It features a multimodal encoder that maps images, history, text commands, and coordinates into a shared semantic space; a cognitive brain with a task‑conditional dual‑head design that separates reasoning from action; and an action expert that generates smooth motion trajectories. Training leverages a massive dataset of 7,802 high‑fidelity 3D scenes, 16.9M expert trajectories, and 5M reasoning samples. On seven international embodied‑navigation benchmarks (CityWalker, SocNav, R2R‑CE, RxR‑CE, HM3D‑OVON, BridgeNav, EVT‑Bench), ABot‑N0 achieves SOTA performance; for example, on the SocNav closed‑loop benchmark it reaches an 88.3% success rate, a >40‑point gain over the previous best, while compliance improves from ~30% to >85%.
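ABot‑N0's internals have not been published in code form, so the PyTorch sketch below is only a minimal illustration of what a task‑conditional dual‑head policy over a shared multimodal space could look like; every module name, dimension, and the fusion scheme is an assumption, not the actual architecture.

```python
import torch
import torch.nn as nn

class DualHeadNavPolicy(nn.Module):
    """Illustrative task-conditional dual-head policy: per-modality encoders
    project into one semantic space, a shared trunk reasons over the fused
    tokens, and two heads separate reasoning (token logits) from action
    (continuous waypoints). All dimensions are placeholders."""
    def __init__(self, d_model=512, vocab=32000, n_tasks=7, horizon=8):
        super().__init__()
        self.img_proj = nn.Linear(768, d_model)   # visual patch features
        self.txt_proj = nn.Linear(768, d_model)   # instruction embeddings
        self.xy_proj = nn.Linear(3, d_model)      # goal (x, y, heading)
        self.task_emb = nn.Embedding(n_tasks, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, num_layers=4)
        self.reason_head = nn.Linear(d_model, vocab)        # reasoning branch
        self.action_head = nn.Linear(d_model, horizon * 2)  # action branch
        self.horizon = horizon

    def forward(self, img, txt, goal, task_id):
        tokens = torch.cat([
            self.img_proj(img),                    # (B, T_img, d)
            self.txt_proj(txt),                    # (B, T_txt, d)
            self.xy_proj(goal).unsqueeze(1),       # (B, 1, d)
            self.task_emb(task_id).unsqueeze(1),   # (B, 1, d)
        ], dim=1)
        h = self.trunk(tokens).mean(dim=1)         # pooled shared representation
        logits = self.reason_head(h)               # reasoning tokens
        waypoints = self.action_head(h).view(-1, self.horizon, 2)
        return logits, waypoints

# Smoke test with random features standing in for real encoder outputs.
policy = DualHeadNavPolicy()
logits, waypoints = policy(torch.randn(2, 64, 768), torch.randn(2, 16, 768),
                           torch.randn(2, 3), torch.tensor([1, 3]))
```

One plausible motivation for such a split is that reasoning supervision (the 5M reasoning samples) and trajectory supervision (the 16.9M expert trajectories) can then train the same trunk without interfering at the output layer.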
ABot‑M0 tackles manipulation with a unified action representation built on UniACT, the largest open‑source heterogeneous dataset (9,500 hours, 600,000 trajectories, >20 embodiments). It replaces diffusion‑based generation with action‑manifold learning (AML) on a DiT backbone that predicts continuous, executable trajectories directly. Spatial perception is enhanced by dedicated modules (VGGT, Qwen‑Image‑Edit) that reason about object relationships and angles. ABot‑M0 attains SOTA on manipulation suites such as Libero, Libero‑Plus (80.5% success, a ~30% improvement), and RoboCasa.
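The AML formulation itself has not been released, but the contrast with diffusion is easy to sketch: instead of iteratively denoising an action sample, a DiT‑style decoder can regress the whole trajectory in a single forward pass. The sketch below illustrates only that single‑pass idea under assumed dimensions and action space, not ABot‑M0's actual model.

```python
import torch
import torch.nn as nn

class DirectTrajectoryDecoder(nn.Module):
    """Single-pass trajectory prediction with a DiT-style transformer decoder.
    One learned query per timestep cross-attends to fused observation tokens
    and is decoded straight into an executable action -- no denoising loop.
    Dimensions and the 7-DoF action space are illustrative guesses."""
    def __init__(self, d_model=384, n_layers=6, horizon=16, act_dim=7):
        super().__init__()
        self.obs_proj = nn.Linear(768, d_model)   # fused vision-language features
        self.queries = nn.Parameter(torch.randn(horizon, d_model))
        layer = nn.TransformerDecoderLayer(d_model, nhead=6, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=n_layers)
        self.out = nn.Linear(d_model, act_dim)    # e.g. 6-DoF pose + gripper

    def forward(self, obs_tokens):
        memory = self.obs_proj(obs_tokens)                     # (B, T, d)
        q = self.queries.unsqueeze(0).expand(memory.size(0), -1, -1)
        h = self.decoder(q, memory)                            # cross-attention
        return self.out(h)                                     # (B, horizon, act_dim)

# One forward pass yields a full 16-step trajectory, versus the tens of
# denoising iterations a diffusion policy would need at inference time.
traj = DirectTrajectoryDecoder()(torch.randn(2, 128, 768))
```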
The agent layer, ABot‑Claw, integrates a vision‑spatial dual memory (image semantics, geometric maps, object topology, and location anchors) and implements a closed‑loop reflection and self‑correction mechanism. This enables the robot to continuously evaluate its actions, adjust plans, and retry tasks (e.g., re‑searching for a bottle) without human intervention, mirroring human iterative problem‑solving.
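Abstracted away from any real robot stack, such a reflection loop reduces to a plan‑act‑evaluate cycle that folds failures back into memory before retrying. The skeleton below is a hypothetical rendering of that control flow; `plan_fn`, `execute_fn`, and `evaluate_fn` stand in for ABot‑Claw components that are not publicly documented.

```python
from dataclasses import dataclass, field

@dataclass
class DualMemory:
    """Stand-in for ABot-Claw's vision-spatial dual memory: semantic notes
    about what was seen, plus geometric anchors for where things are."""
    semantics: list = field(default_factory=list)   # image-level descriptions
    anchors: dict = field(default_factory=dict)     # object -> last known pose

def run_with_reflection(goal, plan_fn, execute_fn, evaluate_fn, max_retries=3):
    """Closed-loop reflect-and-retry skeleton: plan from memory, act,
    self-evaluate, and on failure write the critique back into memory
    so the next attempt (e.g. re-searching for a bottle) differs."""
    memory = DualMemory()
    for attempt in range(max_retries):
        plan = plan_fn(goal, memory)              # planning conditioned on memory
        result = execute_fn(plan, memory)         # act; observations update memory
        ok, critique = evaluate_fn(goal, result)  # self-assessment of the outcome
        if ok:
            return result
        memory.semantics.append(critique)         # reflection: learn from failure
    raise RuntimeError(f"'{goal}' not achieved after {max_retries} attempts")
```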
By combining these layers, Gaode’s ABot stack turns the blind‑guidance task from a laboratory demo into a real‑world solution, while also advancing capabilities for delivery, inspection, and service robots that must operate continuously in dynamic, open environments. The success of Gaode Tutu demonstrates that a unified, data‑driven, self‑correcting embodied AI system can bridge the gap between isolated task breakthroughs and robust, general‑purpose robotic intelligence.