Training a DQN AI to Master 2048: Step-by-Step Guide

This article walks through using reinforcement learning with a Deep Q‑Network in PyTorch to train an AI agent that plays the 2048 puzzle game, covering environment setup, algorithm implementation, network design, and a short training run that achieves a score of 256.

Huawei Cloud Developer Alliance

As an enthusiastic gamer, the author decided to train an AI using reinforcement learning to play the classic 2048 puzzle game.

They used the open‑source gym‑2048 environment and implemented a Deep Q‑Network (DQN) with PyTorch, running the experiments on Huawei Cloud ModelArts.

The implementation has three main steps:

1. Create the game environment

2. Build the DQN algorithm

3. Define the neural network model

The network is a simple three‑layer convolutional model that maps the 4×4 board to action values.
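To see why the final linear layer in the model below has only 16 inputs, trace the spatial size through the three 2×2 convolutions (a quick sanity check, not code from the article):

```python
def conv2d_out(size, kernel, stride=1):
    """Output spatial size of a valid (no-padding) convolution."""
    return (size - kernel) // stride + 1

size = 4  # the 2048 board is 4x4
for _ in range(3):
    size = conv2d_out(size, kernel=2)  # each 2x2 conv shrinks each side by 1

print(size)              # 1: a single spatial cell remains after conv3
print(16 * size * size)  # 16: conv3's 16 channels * 1 * 1 = inputs to the linear layer
```

So the convolutional stack collapses the board to a single cell with 16 features, which the linear layer maps to the four action values.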

# Method of the DQN agent class; assumes torch is imported and `device` is defined.
def learn(self, buffer):
    # Wait until the replay buffer holds at least one full batch.
    if buffer.size >= self.args.batch_size:
        # Periodically sync the target network with the behaviour network.
        if self.learn_step_counter % self.args.target_update_freq == 0:
            self.target_model.load_state_dict(self.behaviour_model.state_dict())
        self.learn_step_counter += 1
        s1, a, s2, done, r = buffer.get_sample(self.args.batch_size)
        s1 = torch.FloatTensor(s1).to(device)
        s2 = torch.FloatTensor(s2).to(device)
        r = torch.FloatTensor(r).to(device)
        a = torch.LongTensor(a).to(device)
        done = torch.FloatTensor(done).to(device)
        if self.args.use_nature_dqn:
            # Nature DQN: evaluate next-state values with the frozen target network.
            q = self.target_model(s2).detach()
        else:
            q = self.behaviour_model(s2)
        # Bellman target: r + gamma * max_a' Q(s2, a'), zeroed for terminal states.
        target_q = r + self.args.gamma * (1 - done) * q.max(1)[0]
        target_q = target_q.view(self.args.batch_size, 1)
        # Q-values of the actions actually taken in the batch.
        eval_q = self.behaviour_model(s1).gather(1, a.view(-1, 1))
        loss = self.criterion(eval_q, target_q)
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()
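The core of the learning step is the Bellman target r + γ · (1 − done) · maxₐ′ Q(s′, a′). A NumPy toy example (values invented for illustration) shows how the (1 − done) mask zeroes the bootstrap term for terminal transitions:

```python
import numpy as np

gamma = 0.99
r    = np.array([1.0, 4.0], dtype=np.float32)   # rewards for a batch of 2
done = np.array([0.0, 1.0], dtype=np.float32)   # the second transition ends the episode
q2   = np.array([[0.5, 2.0, 1.0, 0.0],          # next-state Q-values,
                 [3.0, 1.0, 0.0, 2.0]], dtype=np.float32)  # one row per transition

# r + gamma * (1 - done) * max_a' Q(s', a'); terminal rows keep only the reward
target_q = r + gamma * (1 - done) * q2.max(axis=1)
print(target_q)  # [2.98, 4.0]
```

The first row bootstraps from its best next-state value (1 + 0.99 × 2.0), while the terminal row contributes only its immediate reward.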

# Assumes: import random / import numpy as np
class ReplayBuffer:
    def __init__(self, buffer_size, obs_space):
        # obs_space is the storage shape for observations, with buffer_size
        # as its first dimension, e.g. (buffer_size, 4, 4, 16)
        self.s1 = np.zeros(obs_space, dtype=np.float32)
        self.s2 = np.zeros(obs_space, dtype=np.float32)
        self.a = np.zeros(buffer_size, dtype=np.int32)
        self.r = np.zeros(buffer_size, dtype=np.float32)
        self.done = np.zeros(buffer_size, dtype=np.float32)
        self.buffer_size = buffer_size
        self.size = 0
        self.pos = 0

    def add_transition(self, s1, action, s2, done, reward):
        self.s1[self.pos] = s1
        self.a[self.pos] = action
        # the terminal next state is never bootstrapped from, so skip storing it
        if not done:
            self.s2[self.pos] = s2
        self.done[self.pos] = done
        self.r[self.pos] = reward
        # overwrite the oldest transition once the ring buffer is full
        self.pos = (self.pos + 1) % self.buffer_size
        self.size = min(self.size + 1, self.buffer_size)

    def get_sample(self, sample_size):
        # uniform sampling without replacement over the filled portion
        i = random.sample(range(0, self.size), sample_size)
        return self.s1[i], self.a[i], self.s2[i], self.done[i], self.r[i]
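The network below expects a stack of channels rather than raw tile values. A common encoding, sketched here as one reasonable scheme rather than necessarily the exact one gym‑2048 uses, maps each tile to a one-hot vector over its power of two, giving a 4×4×16 observation:

```python
import numpy as np

def encode_board(board, channels=16):
    """One-hot encode a 4x4 2048 board: channel k marks tiles equal to 2**k.

    Empty cells (value 0) light channel 0.
    """
    board = np.asarray(board, dtype=np.int64)
    obs = np.zeros(board.shape + (channels,), dtype=np.float32)
    for i in range(board.shape[0]):
        for j in range(board.shape[1]):
            v = board[i, j]
            k = 0 if v == 0 else int(np.log2(v))  # 2 -> ch 1, 4 -> ch 2, ...
            obs[i, j, k] = 1.0
    return obs

board = [[2, 0, 0, 0],
         [0, 4, 0, 0],
         [0, 0, 8, 0],
         [0, 0, 0, 0]]
obs = encode_board(board)
print(obs.shape)           # (4, 4, 16)
print(obs[0, 0].argmax())  # 1: the tile 2 lights channel 1
```

With this layout the observation arrives in NHWC order, which is why the network's forward pass starts by permuting to NCHW.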

# Assumes: import torch.nn as nn
class Net(nn.Module):
    def __init__(self, obs, available_actions_count):
        # obs is the number of input channels of the encoded board
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(obs, 128, kernel_size=2, stride=1)
        self.conv2 = nn.Conv2d(128, 64, kernel_size=2, stride=1)
        self.conv3 = nn.Conv2d(64, 16, kernel_size=2, stride=1)
        # three 2x2 convs shrink the 4x4 board to 1x1, leaving 16 features
        self.fc1 = nn.Linear(16, available_actions_count)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # NHWC -> NCHW for PyTorch convolutions
        x = x.permute(0, 3, 1, 2)
        x = self.relu(self.conv1(x))
        x = self.relu(self.conv2(x))
        x = self.relu(self.conv3(x))
        x = self.fc1(x.view(x.shape[0], -1))
        return x

The training loop runs for a set number of episodes, resetting the environment each episode, selecting actions with the DQN, storing transitions in the replay buffer, and invoking the learning step. After about ten minutes of training the agent can consistently reach a score of 256.
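Action selection inside the loop is typically ε-greedy: take a random move with probability ε to explore, otherwise exploit the network's best action, decaying ε as training progresses. The article does not show this code, so the function names and decay schedule below are illustrative:

```python
import random

import numpy as np

def select_action(q_values, epsilon, n_actions=4):
    """Pick a random action with probability epsilon, else the greedy one."""
    if random.random() < epsilon:
        return random.randrange(n_actions)   # explore: up/down/left/right
    return int(np.argmax(q_values))          # exploit: highest Q-value

def epsilon_at(step, start=1.0, end=0.1, decay_steps=10000):
    """Linearly decay epsilon from start to end over decay_steps, then hold."""
    frac = min(step / decay_steps, 1.0)
    return start + frac * (end - start)

print(epsilon_at(0))       # 1.0: fully random at the start
print(epsilon_at(10000))   # ~0.1: mostly greedy after decay
print(select_action(np.array([0.1, 0.9, 0.3, 0.2]), epsilon=0.0))  # 1
```

A linear schedule is only one option; exponential decay is equally common, and the right horizon depends on how many steps the training run makes.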

The full source code and a ready‑to‑run notebook are available on Huawei Cloud ModelArts marketplace.

Written by

Huawei Cloud Developer Alliance

The Huawei Cloud Developer Alliance creates a tech sharing platform for developers and partners, gathering Huawei Cloud product knowledge, event updates, expert talks, and more. Together we continuously innovate to build the cloud foundation of an intelligent world.
