
Building and Training a DQN Agent with highway‑env for Autonomous Driving Simulation

This article explains how to install gym and highway‑env, configure the environment, process state, action, and reward data, build a DQN model in PyTorch, run the training loop, and analyze the results of an autonomous‑driving simulation trained with reinforcement learning.

Python Programming Learning Circle

1. Install Environment

Install the gym library and the highway‑env package (which provides six driving scenarios) via pip.

<code>pip install gym
pip install --user git+https://github.com/eleurent/highway-env</code>

2. Configure Environment

After installation, create a gym environment for the "highway‑v0" scenario and optionally adjust configuration parameters such as observation type, vehicle count, and feature ranges.

<code>import gym
import highway_env
env = gym.make('highway-v0')
# optional custom configuration
config = {
    "observation": {
        "type": "Kinematics",
        "vehicles_count": 5,
        "features": ["presence", "x", "y", "vx", "vy", "cos_h", "sin_h"],
        "features_range": {"x": [-100, 100], "y": [-100, 100], "vx": [-20, 20], "vy": [-20, 20]},
        "absolute": False,
        "order": "sorted"
    },
    "simulation_frequency": 8,
    "policy_frequency": 2
}
env.configure(config)</code>

3. Data Processing

State (Observation)

The environment can output observations in three formats: Kinematics (a V×F matrix), Grayscale Image, and Occupancy Grid. The example uses the Kinematics format, which returns a matrix of vehicle features (e.g., position, velocity, heading).
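Since the DQN built below consumes a flat vector, the V×F Kinematics matrix is typically flattened before being fed to the network. A minimal numpy sketch (the observation values here are made up for illustration):

```python
import numpy as np

# Hypothetical Kinematics observation: 5 vehicles x 7 features,
# matching the config above (presence, x, y, vx, vy, cos_h, sin_h).
obs = np.zeros((5, 7), dtype=np.float32)
obs[0] = [1.0, 0.0, 0.0, 0.7, 0.0, 1.0, 0.0]  # ego-vehicle row

# Flatten the V x F matrix into the 35-dim vector the DQN expects.
state = obs.flatten()
print(state.shape)  # (35,)
```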

Action

Actions are either continuous (throttle and steering) or discrete. The discrete set includes five meta‑actions: LANE_LEFT, IDLE, LANE_RIGHT, FASTER, and SLOWER.

<code>ACTIONS_ALL = {
    0: 'LANE_LEFT',
    1: 'IDLE',
    2: 'LANE_RIGHT',
    3: 'FASTER',
    4: 'SLOWER'
}</code>

Reward

All scenarios except the parking scene share a common reward function defined inside the package; its weights can be adjusted externally.
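The weights can be overridden through the same `configure` call used above. The key names below follow the highway scenario's defaults; verify them against your installed highway‑env version, since they have changed across releases:

```python
# Hypothetical reward-weight overrides for the highway scenario;
# key names should be checked against the installed highway-env version.
reward_config = {
    "collision_reward": -1.0,        # penalty when the ego vehicle crashes
    "right_lane_reward": 0.1,        # bonus for driving in the rightmost lanes
    "high_speed_reward": 0.4,        # bonus for driving near top speed
    "reward_speed_range": [20, 30],  # speeds (m/s) mapped onto the speed bonus
}
env.configure(reward_config)
```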

4. Build the DQN Model

The DQN network is a simple feed‑forward neural network with an input size of 35 (5 vehicles × 7 features) and an output size of 5 (the discrete actions). The code uses PyTorch.

<code>import torch
import torch.nn as nn

class DQNNet(nn.Module):
    def __init__(self):
        super(DQNNet, self).__init__()
        self.linear1 = nn.Linear(35, 35)
        self.linear2 = nn.Linear(35, 5)

    def forward(self, s):
        # Accept raw observations; flatten each V x F matrix to a 35-dim vector.
        s = torch.as_tensor(s, dtype=torch.float32)
        s = s.view(-1, 35)
        # A nonlinearity between the layers; without it the two stacked
        # linear layers would collapse into a single linear map.
        s = torch.relu(self.linear1(s))
        return self.linear2(s)

# DQN wrapper with replay memory, epsilon‑greedy policy, and learning routine omitted for brevity</code>
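The omitted wrapper can be sketched as follows. This is an illustrative minimal version, not the article's original code: `ReplayMemory` is a plain cyclic buffer, and `choose_action` implements the epsilon‑greedy rule used in the training loop (random action with probability epsilon, otherwise the greedy argmax of the Q‑values).

```python
import random
from collections import deque, namedtuple

Transition = namedtuple("Transition", ["s", "a", "r", "s_"])

class ReplayMemory:
    """Fixed-size cyclic buffer of transitions (minimal sketch)."""
    def __init__(self, capacity=2000):
        self.buffer = deque(maxlen=capacity)  # old entries drop off automatically

    def push(self, *args):
        self.buffer.append(Transition(*args))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

def choose_action(q_values, epsilon, n_actions=5):
    """Epsilon-greedy over the 5 discrete meta-actions."""
    if random.random() < epsilon:
        return random.randrange(n_actions)          # explore
    return max(range(n_actions), key=lambda i: q_values[i])  # exploit
```

The learning routine would then sample a batch from `ReplayMemory` and minimize the temporal‑difference error of `DQNNet`'s Q‑values.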

5. Training Loop

The agent interacts with the environment, selects actions using an epsilon‑greedy strategy, stores transitions in a replay buffer, and updates the network periodically. Statistics such as average reward, episode time, and collision rate are plotted every 40 training steps.

<code>import numpy as np

dqn = DQN()            # wrapper from section 4 (omitted above)
count = 0              # number of learning steps taken
reward = []            # per-step reward log

while True:
    done = False
    s = env.reset()
    while not done:
        e = np.exp(-count / 300)        # decaying epsilon
        a = dqn.choose_action(s, e)
        s_, r, done, info = env.step(a)
        env.render()
        dqn.push_memory(s, a, r, s_)
        # learn once the replay buffer has accumulated another batch
        if dqn.position != 0 and dqn.position % 99 == 0:
            loss = dqn.learn()
            count += 1
            if count % 40 == 0:
                # compute and plot statistics
                ...
        s = s_
        reward.append(r)
    # record episode time and collision flag
    ...
</code>
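The "compute and plot statistics" step can be as simple as averaging the logged values. A small sketch with made‑up episode logs (the numbers are illustrative only):

```python
import numpy as np

# Hypothetical logs accumulated during training.
episode_rewards = [0.2, 0.5, 0.4, 0.8, 0.7]   # total reward per episode
collisions = [1, 1, 0, 0, 0]                  # 1 if the episode ended in a crash
episode_times = [3.5, 4.0, 7.5, 11.0, 12.5]   # seconds survived per episode

avg_reward = float(np.mean(episode_rewards))
collision_rate = float(np.mean(collisions))
avg_time = float(np.mean(episode_times))
```

These averages are what the training curves in the next section plot: collision rate falling and episode time rising as the agent improves.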

6. Results and Discussion

Training curves show that the average collision rate decreases as training progresses, while episode duration tends to increase (episodes end early when a collision occurs). The abstracted highway‑env environment simplifies algorithm development compared with full‑scale simulators like CARLA, but offers fewer knobs for low‑level control research.

7. Conclusion

highway‑env provides a lightweight, game‑style platform for reinforcement‑learning research in autonomous driving, allowing rapid prototyping of DQN agents without dealing with sensor models or real‑world data acquisition.

Tags: simulation, Python, reinforcement learning, DQN, autonomous driving, gym, highway-env
Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
