DouZero: A Simple Monte‑Carlo Based AI Achieving Human‑Level Performance in Dou Dizhu
The paper presents DouZero, a reinforcement‑learning AI for the Chinese card game Dou Dizhu that combines a Monte‑Carlo method with a value network, uses binary matrix encodings for states and actions, and achieves human‑level play and state‑of‑the‑art results on modest GPU hardware.
Recent advances have shown AI surpassing humans in games like Go, yet the imperfect‑information, high‑variance card game Dou Dizhu remained a challenge. Researchers from Kuaishou AI Platform introduced DouZero, a system that reaches human‑level performance using a surprisingly simple approach that runs on a standard four‑GPU server.
Dou Dizhu poses unique difficulties: a massive state space, over 27,000 distinct card‑type combinations, imperfect information, and a mix of cooperation (between the two farmers) and competition (against the landlord). These factors make reinforcement learning especially hard.
DouZero’s core algorithm fuses a Monte‑Carlo method with a deep value network. Both states and actions are encoded as 15×4 binary matrices that capture how many copies of each card rank are present. The value network takes a state‑action pair and predicts the expected win probability. Training proceeds via self‑play: play out an episode, record every (state, action, reward) tuple, and regress the network toward the observed returns with gradient descent. Parallelism comes from 45 actors on a single GPU server feeding experiences to a central trainer.
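The encoding and the Monte‑Carlo target can be sketched in a few lines. This is a minimal illustration, not DouZero's actual code: the rank ordering, the thermometer‑style row filling, and the helper names (`encode_cards`, `mc_targets`) are assumptions; the real implementation may orient or flatten the matrix differently.

```python
import numpy as np

# 13 regular ranks plus the two jokers -> 15 rows.
RANKS = ["3", "4", "5", "6", "7", "8", "9", "10",
         "J", "Q", "K", "A", "2", "BJ", "RJ"]

def encode_cards(cards):
    """Encode a multiset of card ranks as a 15x4 binary matrix.

    Row i corresponds to RANKS[i]; if that rank appears `count` times
    (at most four), the first `count` entries of the row are set to 1.
    """
    mat = np.zeros((15, 4), dtype=np.int8)
    for rank, count in zip(*np.unique(cards, return_counts=True)):
        mat[RANKS.index(rank), :count] = 1
    return mat

def mc_targets(episode_len, final_reward):
    """Monte-Carlo regression targets: every (state, action) pair in an
    episode is pushed toward the episode's final outcome."""
    return np.full(episode_len, final_reward, dtype=np.float32)

# Example: a hand of three 3s, a Jack, and the red joker.
hand_matrix = encode_cards(["3", "3", "3", "J", "RJ"])
```

Regressing every pair toward the final return keeps the method simple: no bootstrapped targets, no search tree, just supervised updates on self‑play outcomes.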
Extensive experiments compare DouZero against several strong baselines—DeltaDou (Monte‑Carlo Tree Search + Bayesian inference), CQN (card‑type decomposition + DQN), supervised‑learning models, and numerous rule‑based bots. Using Winning Percentage (WP) and Average Difference in Points (ADP) as metrics, DouZero consistently outperforms all competitors, achieving the highest scores on the Botzone leaderboard and surpassing human champions within a few days of training.
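The two metrics are straightforward to compute. The sketch below assumes the common Dou Dizhu scoring the paper alludes to for ADP (base score 1, doubled for each bomb or rocket played); the function names are illustrative, not from the paper's code.

```python
def winning_percentage(results):
    """WP: fraction of games won; `results` is a sequence of booleans."""
    return sum(results) / len(results)

def game_points(won, bombs_played):
    """Signed points of one game: base score 1, doubled for every bomb
    or rocket played (an assumed scoring rule matching the metric's
    description), positive for a win and negative for a loss."""
    return (2 ** bombs_played) * (1 if won else -1)

def average_difference_in_points(games):
    """ADP: mean signed points over (won, bombs_played) game records."""
    return sum(game_points(w, b) for w, b in games) / len(games)
```

WP rewards cautious play that maximizes win rate, while ADP rewards riskier play that amplifies the stakes with bombs, so the two metrics can favor different strategies.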
Training efficiency is highlighted: on a server equipped with four 1080Ti GPUs, DouZero exceeds the supervised‑learning baseline in two days and overtakes DeltaDou after ten days, demonstrating rapid learning despite the large action space.
When evaluated against human gameplay data, DouZero's accuracy at predicting human moves first rises, indicating it acquires human‑like strategies, and then diverges from human patterns, suggesting the emergence of super‑human tactics. A case study shows the AI learning cooperative moves between the two farmers, confirming it captures the game's collaborative aspect.
The authors release the Dou Dizhu simulation environment, training code, and an online demo, encouraging further research on Monte‑Carlo methods in reinforcement learning, especially for tasks with sparse rewards and huge action spaces.
Reference: Jiang et al., "DeltaDou: Expert-level Doudizhu AI through Self‑play", IJCAI 2019.
Reference: Sutton & Barto, "Reinforcement Learning: An Introduction", MIT Press 2018.
Reference: Zha et al., "RLCard: A Platform for Reinforcement Learning in Card Games", IJCAI 2020.
Kuaishou Tech