How Tencent’s Robot Dog Max Gains Human‑Like Decision‑Making with Pre‑trained AI and RL
Tencent Robotics X unveiled how its robot dog Max combines pre‑trained AI models with reinforcement learning across three learning stages, enabling it to acquire, store, and apply skills for autonomous decision‑making in complex tasks such as the World Chase Tag competition.
Tencent Robotics X announced a breakthrough for its robot dog Max, integrating cutting‑edge pre‑trained AI models and reinforcement learning to dramatically improve flexibility and autonomous decision‑making.
Introducing Pre‑training and Reinforcement Learning
The new approach lets Max learn in phases, storing skills and knowledge for future complex tasks without relearning, effectively enabling “one‑to‑many” generalization.
Three Learning Stages
Stage 1: Using motion‑capture data from real dogs, researchers built an imitation‑learning task in simulation, compressing the data into a deep neural network to teach basic locomotion.
Stage 2: Additional network parameters link learned agile postures with environmental perception, allowing Max to adapt its movements to external cues.
Stage 3: A higher‑level network gathers task‑relevant information (e.g., opponent or flag data) and learns strategic policies for complex objectives, enabling continuous skill accumulation without retraining.
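The three stages amount to a hierarchy of networks, each level consuming the output of the one above it. A minimal sketch of that wiring, with toy random linear layers standing in for the trained policies; all dimensions and the exact conditioning scheme are illustrative assumptions, since the actual architectures are not published:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(in_dim, out_dim):
    """Toy stand-in for a trained network: a fixed random linear layer."""
    W = rng.standard_normal((out_dim, in_dim)) * 0.1
    return lambda x: np.tanh(W @ x)

# Stage 1: low-level motor policy distilled from real-dog motion capture
# via imitation learning in simulation; proprioception -> joint targets.
motor_policy = mlp(24 + 8, 12)       # 12 joint targets (dims assumed)

# Stage 2: perception-conditioned adaptation linking agile postures to
# the environment; env features + high-level goal -> posture command.
adapt_policy = mlp(12 + 8, 8)

# Stage 3: strategy network over task information (opponent pose,
# flag position, ...) -> abstract goal for the lower levels.
strategy_policy = mlp(10, 8)

def act(proprio, env_feats, task_info):
    goal = strategy_policy(task_info)                          # decide
    command = adapt_policy(np.concatenate([env_feats, goal]))  # adapt
    joints = motor_policy(np.concatenate([proprio, command]))  # move
    return joints

# One control step with random observations:
joints = act(rng.standard_normal(24), rng.standard_normal(12),
             rng.standard_normal(10))
print(joints.shape)  # (12,)
```

Because only the top network needs retraining for a new objective, the lower levels can be reused as-is, which is what allows skill accumulation without relearning locomotion.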
World Chase Tag Demo
The researchers evaluated Max in the “World Chase Tag” obstacle‑pursuit game, where two robots assume attacker and defender roles on a 4.5 m × 4.5 m arena with obstacles and a flag.
The attacker aims to capture the defender, while the defender tries to reach the flag without being caught. If the defender touches the flag first, roles swap instantly. Throughout the game, both robots maintain an average forward speed of 0.5 m/s.
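The rules above form a small state machine. A hedged sketch of it below: the arena size and the role-swap rule come from the article, while the catch and flag-touch radii are assumed thresholds:

```python
import math
from dataclasses import dataclass

ARENA = 4.5          # m, square arena side (from the article)
CATCH_RADIUS = 0.3   # m, assumed threshold for a successful tag
FLAG_RADIUS = 0.2    # m, assumed threshold for touching the flag

@dataclass
class Game:
    attacker: tuple  # (x, y) position in metres
    defender: tuple
    flag: tuple

    @staticmethod
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    def step_outcome(self):
        """Check the win/swap conditions after each control step."""
        if self.dist(self.attacker, self.defender) < CATCH_RADIUS:
            return "attacker_wins"  # defender caught
        if self.dist(self.defender, self.flag) < FLAG_RADIUS:
            # Defender touched the flag first: roles swap instantly.
            self.attacker, self.defender = self.defender, self.attacker
            return "roles_swapped"
        return "ongoing"

g = Game(attacker=(0.0, 0.0), defender=(4.0, 4.0), flag=(4.1, 4.1))
print(g.step_outcome())  # defender is at the flag -> "roles_swapped"
```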
Max demonstrates reasoning and decision‑making: it abandons a futile chase when it predicts it cannot catch the defender before the flag is reached, and it can launch a sudden burst, leaping to seize the defender or accelerating toward the flag, much like a predator closing on prey.
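At its core, the "abandon a futile chase" behavior is a time-to-intercept comparison. A simplified sketch of that reasoning, assuming straight-line paths and the article's 0.5 m/s average speed for both robots; the learned policy presumably does something far richer:

```python
import math

SPEED = 0.5  # m/s, average forward speed (from the article)

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def chase_is_futile(attacker, defender, flag, speed=SPEED):
    """Give up the chase if the defender can touch the flag before
    the attacker can close the gap (straight-line approximation)."""
    t_defender_to_flag = dist(defender, flag) / speed
    t_attacker_to_defender = dist(attacker, defender) / speed
    return t_defender_to_flag < t_attacker_to_defender

# Defender is 0.5 m from the flag while the attacker is 4 m behind:
# better to reposition than to chase.
print(chase_is_futile((0, 0), (4, 0), (4.5, 0)))  # True
```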
All control strategies are neural‑network policies learned in simulation and transferred zero‑shot to the real robot, allowing Max to recognize and handle unseen obstacles.
—END—
Tencent Tech
Tencent's official tech account. Delivering quality technical content to serve developers.