How AlphaGo’s Four‑Component Architecture Powers Master‑Level Go Play
This article breaks down AlphaGo’s four‑part system—policy network, fast rollout, value network, and Monte Carlo Tree Search—explaining their functions, training methods, and how they combine to achieve professional‑grade Go performance, while comparing them with the DarkForest implementation.
AlphaGo System Overview
AlphaGo consists of four main components: a policy network that predicts the next move, a fast rollout that sacrifices some move quality for speed, a value network that estimates win probabilities, and Monte Carlo Tree Search (MCTS) that integrates the three parts into a complete system.
Policy Network
The policy network takes the current board position as input and outputs a score for every possible move (361 points on a 19×19 board). DarkForest improves on this by training the network to predict the next three moves, achieving move-prediction quality comparable to reinforcement-learning (RL) networks; even so, the final system uses a supervised-learning (SL) network in search because it offers better move diversity.
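The network's raw per-point scores become a move distribution via a softmax over all 361 points. A minimal pure-Python sketch (the logits below are invented for illustration, not real network output):

```python
import math

def policy_softmax(logits):
    """Turn the network's raw scores for all 361 points of a 19x19
    board into a probability distribution over moves."""
    m = max(logits)                      # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits: a flat board where one point scores highest.
logits = [0.0] * 361
logits[60] = 3.0                         # pretend the network favors point 60
probs = policy_softmax(logits)
best_move = probs.index(max(probs))      # -> 60
```

Searching then samples from (or ranks) this distribution rather than always taking the single best move, which is where the SL network's diversity pays off.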
AlphaGo uses a relatively narrow network (192 convolution filters per layer) for speed; a wider 384-filter network would likely be stronger if GPU resources allowed.
Fast Rollout
Fast rollout runs at microsecond speed, about 1,000× faster than the policy network, keeping the CPU busy while waiting for the network’s move. It also provides board evaluation by simulating games to the end, trading off simulation quality for quantity to improve overall strength.
AlphaGo implements fast rollout using local pattern matching and logistic regression, achieving 24.2% move-prediction accuracy at 2 µs per move, compared with the policy network's 57% at roughly 3 ms.
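A toy version of the pattern-plus-logistic-regression idea, with entirely made-up pattern names and weights (AlphaGo's real features are handcrafted local patterns with learned coefficients):

```python
import math
import random

# Hypothetical pattern weights, standing in for coefficients a
# logistic regression would learn from expert games.
PATTERN_WEIGHTS = {"atari_escape": 2.1, "edge_hane": 0.8, "empty_corner": -0.5}

def move_score(features):
    """Logistic regression: sigmoid of the summed weights of the
    local patterns matched around a candidate move."""
    z = sum(PATTERN_WEIGHTS.get(f, 0.0) for f in features)
    return 1.0 / (1.0 + math.exp(-z))

def rollout_pick(candidates, rng):
    """Sample one move in proportion to its score; each call is only
    table lookups plus a sigmoid, which is why rollouts can run in
    microseconds while the policy network needs milliseconds."""
    scores = [move_score(feats) for _, feats in candidates]
    r = rng.random() * sum(scores)
    for (move, _), s in zip(candidates, scores):
        r -= s
        if r <= 0:
            return move
    return candidates[-1][0]

moves = [("D4", ["empty_corner"]), ("Q16", ["atari_escape"]), ("K10", [])]
choice = rollout_pick(moves, random.Random(0))
```

The sampled (rather than greedy) choice is what lets thousands of cheap, noisy simulations average into a useful evaluation.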
Value Network
The value network estimates the win probability of the current position. It contributes roughly 480 Elo points, compared with the 800–1,000 points attributable to the policy network. Training it required 30 million self-play games, sampling only one position per game to avoid over-fitting.
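The one-position-per-game sampling rule is simple to state in code; a sketch (the game records here are placeholder strings, not real training data):

```python
import random

def sample_value_net_data(games, rng):
    """Take exactly one (position, outcome) pair from each self-play
    game. Positions within a single game are strongly correlated, so
    training on all of them would let the network memorize games
    instead of learning to evaluate positions."""
    return [(rng.choice(positions), outcome) for positions, outcome in games]

# Hypothetical game records: a list of positions plus the final result.
games = [([f"game{i}_move{j}" for j in range(200)], i % 2) for i in range(3)]
data = sample_value_net_data(games, random.Random(7))
```

With one sample per game, 30 million games yield 30 million nearly independent training positions.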
Surprisingly, AlphaGo does not use explicit local life‑and‑death analysis; the deep convolutional network learns to approximate these evaluations automatically.
Monte Carlo Tree Search (MCTS)
MCTS combines the three components, using a prior‑guided UCT that first expands moves favored by the policy network and later explores less‑promising moves as needed. DarkForest selects the top 3–5 policy moves for search, achieving similar performance.
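A minimal sketch of prior-guided selection in the PUCT style; the constant and node bookkeeping below are illustrative choices, not AlphaGo's exact values:

```python
import math

def select_child(children, parent_visits, c_puct=5.0):
    """Pick the child maximizing Q + U, where U is an exploration
    bonus proportional to the policy network's prior. High-prior
    moves are expanded first; low-prior moves get explored only once
    the favorites have accumulated visits and their bonus shrinks."""
    def score(ch):
        q = ch["value_sum"] / ch["visits"] if ch["visits"] else 0.0
        u = c_puct * ch["prior"] * math.sqrt(parent_visits) / (1 + ch["visits"])
        return q + u
    return max(children, key=score)

children = [
    {"move": "A", "prior": 0.6, "visits": 0, "value_sum": 0.0},
    {"move": "B", "prior": 0.1, "visits": 0, "value_sum": 0.0},
]
first = select_child(children, parent_visits=1)  # the high-prior move "A"
```

The `1 + visits` denominator is what eventually lets a low-prior but high-value move overtake an over-visited favorite.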
AlphaGo expands leaf nodes only after a visit count threshold (e.g., 40), conserving GPU resources and improving leaf evaluation accuracy.
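The visit-count gate for expansion can be sketched as follows (the threshold of 40 comes from the text above; the node layout is made up):

```python
EXPAND_THRESHOLD = 40  # from the article: expand a leaf only after ~40 visits

def should_expand(node):
    """Defer the GPU-expensive policy-network call until a leaf has
    proven interesting; averaging many rollout results first also
    gives a steadier evaluation than a single noisy simulation."""
    return node["visits"] >= EXPAND_THRESHOLD and not node["children"]

leaf = {"visits": 39, "children": []}
assert not should_expand(leaf)   # still below the threshold
leaf["visits"] += 1
assert should_expand(leaf)       # 40th visit triggers expansion
```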
Summary and Insights
The success of AlphaGo stems from the systematic integration of deep learning components and traditional search, not from a single breakthrough. Reinforcement learning mainly supplies high‑quality training data rather than directly improving play. The system still relies heavily on massive data and computational resources.