
How Tencent’s Robot Dog Max Gains Human‑Like Decision‑Making with Pre‑trained AI and RL

Tencent Robotics X unveiled how its robot dog Max combines pre‑trained AI models with reinforcement learning across three learning stages, enabling it to acquire, store, and apply skills for autonomous decision‑making in complex tasks such as the World Chase Tag competition.

Tencent Tech

Tencent Robotics X announced a breakthrough for its robot dog Max, integrating cutting‑edge pre‑trained AI models and reinforcement learning to dramatically improve flexibility and autonomous decision‑making.

Introducing Pre‑training and Reinforcement Learning

The new approach lets Max learn in phases, storing skills and knowledge for future complex tasks without relearning, effectively enabling “one‑to‑many” generalization.

Three Learning Stages

Stage 1: Using motion‑capture data from real dogs, researchers built an imitation‑learning task in simulation, compressing the data into a deep neural network to teach basic locomotion.

Stage 2: Additional network parameters link learned agile postures with environmental perception, allowing Max to adapt its movements to external cues.

Stage 3: A higher‑level network gathers task‑relevant information (e.g., opponent or flag data) and learns strategic policies for complex objectives, enabling continuous skill accumulation without retraining.
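The article does not publish the network architecture, but the three stages suggest a layered controller: a frozen low-level locomotion policy from stage 1, a perception adapter from stage 2, and a task-level strategy head from stage 3. The sketch below illustrates that composition with hypothetical class names, dimensions, and random weights standing in for trained parameters; none of these details come from Tencent's paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(in_dim, out_dim):
    # Random weights stand in for trained parameters in this sketch.
    return rng.standard_normal((out_dim, in_dim)) * 0.1

class LocomotionPolicy:
    """Stage 1: low-level skills distilled from motion-capture data (frozen afterwards)."""
    def __init__(self, state_dim=12, latent_dim=8, action_dim=12):
        self.W = linear(state_dim + latent_dim, action_dim)
    def act(self, joint_state, skill_latent):
        return np.tanh(self.W @ np.concatenate([joint_state, skill_latent]))

class PerceptionAdapter:
    """Stage 2: added parameters map environmental features to a skill latent."""
    def __init__(self, percept_dim=16, latent_dim=8):
        self.W = linear(percept_dim, latent_dim)
    def encode(self, percept):
        return np.tanh(self.W @ percept)

class StrategyPolicy:
    """Stage 3: task-level head (opponent/flag observations) steering the lower layers."""
    def __init__(self, task_dim=6, percept_dim=16):
        self.W = linear(task_dim, percept_dim)
    def command(self, task_obs):
        return np.tanh(self.W @ task_obs)

def control_step(joint_state, percept, task_obs, locomotion, adapter, strategy):
    # The high-level strategy biases what the perception layer feeds the locomotion net,
    # so new tasks only retrain the top layer while stored skills are reused.
    goal = strategy.command(task_obs)
    latent = adapter.encode(percept + goal)
    return locomotion.act(joint_state, latent)
```

Because only the top head changes per task, skills accumulate without retraining the lower layers, which matches the "one-to-many" generalization described above.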

World Chase Tag Demo

The researchers evaluated Max in the “World Chase Tag” obstacle‑pursuit game, where two robots assume attacker and defender roles on a 4.5 m × 4.5 m arena with obstacles and a flag.

The attacker aims to capture the defender, while the defender tries to reach the flag without being caught. If the defender touches the flag first, roles swap instantly. Throughout the game, both robots maintain an average forward speed of 0.5 m/s.
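The round-ending rules above can be written down as a small state machine. The sketch below encodes them with hypothetical catch and flag-touch radii (the article gives only the arena size and average speed), which is enough to see the instant role swap:

```python
from dataclasses import dataclass
import math

ARENA = 4.5          # arena side length in metres, per the article
CATCH_RADIUS = 0.3   # hypothetical contact distance for a catch
FLAG_RADIUS = 0.2    # hypothetical touch distance for the flag

@dataclass
class Robot:
    x: float
    y: float
    role: str  # "attacker" or "defender"

def dist(ax, ay, bx, by):
    return math.hypot(ax - bx, ay - by)

def step_rules(a: Robot, b: Robot, flag=(4.0, 4.0)):
    """Apply the two terminal rules of a round: catch, or flag touch with role swap."""
    attacker, defender = (a, b) if a.role == "attacker" else (b, a)
    if dist(attacker.x, attacker.y, defender.x, defender.y) < CATCH_RADIUS:
        return "caught"
    if dist(defender.x, defender.y, *flag) < FLAG_RADIUS:
        # Defender reached the flag first: roles swap instantly.
        attacker.role, defender.role = "defender", "attacker"
        return "swap"
    return "ongoing"
```

Calling `step_rules` once per control tick keeps the game logic separate from the learned policies that actually move the robots.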

Max demonstrates reasoning and decision‑making: it abandons a futile chase when it predicts it cannot catch the defender before the flag is reached, and it can perform a sudden leap or burst of acceleration to seize the defender or reach the flag, mimicking animal predation behavior.
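The abandon-the-chase judgment reduces to comparing travel times at the robots' average speed of 0.5 m/s. The learned policy presumably discovers something richer, but a minimal hand-written version of the same reasoning, assuming straight-line paths and ignoring obstacles, looks like this:

```python
import math

SPEED = 0.5  # average forward speed from the article, in m/s

def time_to(p, q, speed=SPEED):
    """Straight-line travel time between two (x, y) points in metres."""
    return math.hypot(q[0] - p[0], q[1] - p[1]) / speed

def should_abandon_chase(attacker, defender, flag, margin=0.0):
    """Give up pursuit when the defender would reach the flag before an intercept."""
    return time_to(defender, flag) + margin < time_to(attacker, defender)
```

When this predicate is true, chasing is wasted effort, so a sensible attacker repositions instead, which is the behavior the researchers observed.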

All control strategies are neural‑network policies learned in simulation and transferred zero‑shot to the real robot, allowing Max to recognize and handle unseen obstacles.
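The article does not say how the zero-shot sim-to-real transfer is achieved. One standard technique for this class of problem is domain randomization: physics parameters are resampled every training episode so the policy never overfits a single simulator. The parameter names and ranges below are illustrative assumptions, not Tencent's actual recipe:

```python
import random

def randomized_sim_params(rng=random.Random(0)):
    """Sample simulator parameters per episode so learned policies stay robust
    when the real robot's dynamics differ from any one simulated setting."""
    return {
        "ground_friction": rng.uniform(0.4, 1.2),
        "payload_mass_kg": rng.uniform(-0.5, 0.5),   # offset on nominal body mass
        "motor_latency_s": rng.uniform(0.0, 0.02),
        "obstacle_layout_seed": rng.randrange(10_000),
    }
```

Randomizing obstacle layouts in particular would explain why the transferred policy can handle obstacle configurations it never saw during training.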

—END—

Tags: simulation, AI, robotics, reinforcement learning, game AI, pre‑training, robot dog
Written by Tencent Tech

Tencent's official tech account. Delivering quality technical content to serve developers.