
DouZero: A Simple Monte‑Carlo Based AI that Achieves State‑of‑the‑Art Performance in Dou Dizhu

DouZero, a reinforcement-learning AI for the Chinese card game Dou Dizhu, combines a Monte-Carlo method and a deep value network with a compact action encoding. Trained by self-play on a single four-GPU server, it outperforms existing AI baselines, ranks first on the Botzone leaderboard, and surpasses human play on several metrics.


Artificial intelligence has achieved remarkable success in many board and card games, yet Dou Dizhu remains a challenging domain due to its huge state space, its hidden information, and the coexistence of competition (against the landlord) and cooperation (between the two farmer players).

Researchers from the Kwai (Kuaishou) AI Platform and Texas A&M's DATA Lab introduced DouZero, the first Dou Dizhu AI built from scratch with a surprisingly simple algorithm: a Monte-Carlo method combined with a deep value network. The system encodes each hand and action as a 15×4 binary matrix (one row per card rank, with columns indicating how many copies are present), giving a compact representation of the game's 27,472 possible moves.
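The rank-count matrix idea can be sketched in a few lines of Python. This is an illustrative reconstruction, not DouZero's released code: the rank ordering and the "set the first k columns to 1 for k copies" convention are assumptions about the layout.

```python
# Sketch of a DouZero-style 15x4 card encoding (illustrative only;
# the released implementation may order ranks and columns differently).
# 13 ranks plus black joker (BJ) and red joker (RJ) give 15 rows.
RANKS = ["3", "4", "5", "6", "7", "8", "9", "10",
         "J", "Q", "K", "A", "2", "BJ", "RJ"]

def encode_cards(cards):
    """Encode a list of card ranks as a 15x4 binary matrix.

    Row i corresponds to RANKS[i]; if a rank appears k times,
    the first k entries of its row are set to 1.
    """
    matrix = [[0] * 4 for _ in RANKS]
    for i, rank in enumerate(RANKS):
        count = cards.count(rank)
        for j in range(count):
            matrix[i][j] = 1
    return matrix

# A triple of 3s with a 5 and the red joker:
hand = ["3", "3", "3", "5", "RJ"]
m = encode_cards(hand)
# row for "3" -> [1, 1, 1, 0]; row for "RJ" -> [1, 0, 0, 0]
```

Because suits are irrelevant in Dou Dizhu, only per-rank counts need to be stored, which is why a 15×4 grid suffices for every hand and every legal move.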

The value network receives the encoded state and action, predicts the expected win rate, and the agent selects the legal action with the highest predicted value. The network consists of an LSTM that processes the sequence of played cards, followed by six fully connected layers. Training proceeds by self-play: play out a game with the current network, record (state, action, reward) tuples, and update the network by gradient descent toward the observed Monte-Carlo returns.
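The training loop described above can be sketched with a toy stand-in for the network. This is a minimal illustration of the Monte-Carlo value-regression idea, assuming a linear value function and hand-written gradients in place of DouZero's LSTM+MLP; `predict`, `mc_update`, and `choose_action` are hypothetical names, not the paper's API.

```python
import random

def predict(weights, features):
    """Toy linear value function standing in for the deep network."""
    return sum(w * f for w, f in zip(weights, features))

def mc_update(weights, episode, lr=0.01):
    """Regress value estimates toward Monte-Carlo returns (MSE loss).

    episode: list of (features, return) pairs from one self-play game.
    """
    for features, g in episode:
        error = predict(weights, features) - g
        for i, f in enumerate(features):
            weights[i] -= lr * error * f  # gradient of 0.5 * error**2
    return weights

def choose_action(weights, legal_action_features, epsilon=0.05):
    """Epsilon-greedy selection over the value of each legal action."""
    if random.random() < epsilon:
        return random.randrange(len(legal_action_features))
    values = [predict(weights, f) for f in legal_action_features]
    return max(range(len(values)), key=values.__getitem__)
```

Repeating `mc_update` over fresh self-play episodes drives the value estimates toward the empirical win rate, which is the core of the method; the deep network in DouZero plays the role of `weights` here.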

To accelerate data generation, DouZero employs a multi‑actor architecture with 45 parallel actors on a single four‑GPU server, feeding experience to a central trainer. This modest hardware requirement makes the approach accessible to most research labs.
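The actor/learner split can be illustrated with threads and a shared queue. This is a toy sketch of the data flow only: DouZero runs 45 actor processes across GPUs, and `make_episode`, `actor`, and `learner` are hypothetical names invented for this example.

```python
import queue
import threading

def make_episode(actor_id):
    """Placeholder for one self-play game's worth of experience."""
    return [(f"state-{actor_id}-{t}", f"action-{t}", 0.0) for t in range(3)]

def actor(actor_id, buffer, n_games):
    """Each actor plays games and pushes episodes to the shared buffer."""
    for _ in range(n_games):
        buffer.put(make_episode(actor_id))

def learner(buffer, n_episodes):
    """The central trainer consumes episodes and runs gradient steps."""
    seen = 0
    while seen < n_episodes:
        episode = buffer.get()  # blocks until an actor produces data
        # ... gradient step on `episode` would go here ...
        seen += 1
    return seen

buffer = queue.Queue()
actors = [threading.Thread(target=actor, args=(i, buffer, 2)) for i in range(4)]
for t in actors:
    t.start()
processed = learner(buffer, 8)  # 4 actors x 2 games each
for t in actors:
    t.join()
```

Decoupling generation from training this way keeps the GPUs busy: slow game simulation happens in parallel while the single learner drains the queue.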

Experimental results show that DouZero surpasses all previously published Dou Dizhu AIs (DeltaDou, CQN, rule‑based bots, and supervised‑learning models) on both Winning Percentage (WP) and Average Difference in Points (ADP). On the Botzone platform, DouZero ranked first among 344 bots. Training efficiency experiments demonstrate that DouZero exceeds the performance of DeltaDou within ten days and beats the supervised model in just two days.
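The two metrics are straightforward to compute from per-game outcomes. The sketch below assumes each game is summarized as a `(win, points)` pair for the evaluated agent; the exact point-scoring rules (base score doubled by bombs and rockets) follow the paper's setup and are not reproduced here.

```python
# Hedged sketch of the two evaluation metrics, assuming each game
# yields (win: bool, points: float) for the evaluated agent.

def winning_percentage(results):
    """WP: fraction of games the agent wins."""
    return sum(1 for win, _ in results if win) / len(results)

def average_difference_in_points(results):
    """ADP: mean points won (positive) or lost (negative) per game."""
    return sum(points for _, points in results) / len(results)

# Three example games: two wins (one with a doubled score) and a loss.
games = [(True, 2.0), (False, -1.0), (True, 1.0)]
```

WP ignores score magnitude while ADP rewards riskier, higher-scoring play, which is why the paper reports both.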

When evaluated against human gameplay data, DouZero initially learns strategies similar to humans, but after five days it discovers novel tactics that outperform human play, indicating the emergence of super‑human behavior.

A case study of cooperative play illustrates that DouZero learns to coordinate between the two farmer agents, selecting actions that maximize joint success.

The authors have open‑sourced the Dou Dizhu simulation environment, training code, and an online demo platform, encouraging further research on Monte‑Carlo methods in sparse‑reward, large‑action‑space problems.

References: Jiang et al., “DeltaDou: Expert‑level Doudizhu AI through Self‑play” (IJCAI 2019); You et al., “Combinational Q‑Learning for Dou Di Zhu” (arXiv 2019); Sutton & Barto, “Reinforcement Learning: An Introduction” (MIT Press 2018); and others.

Written by

Java Architect Essentials

Committed to sharing quality articles and tutorials to help Java programmers progress from junior to mid-level to senior architect. We curate high-quality learning resources, interview questions, videos, and projects from across the internet to help you systematically improve your Java architecture skills. Follow and reply '1024' to get Java programming resources. Learn together, grow together.
