
Building a DQN‑based Autonomous Driving Agent with highway‑env in Python

This tutorial explains how to install the gym and highway‑env packages, configure the simulation environment, process state and action representations, implement a DQN network in PyTorch, and train the model while visualizing performance metrics for autonomous driving tasks.

Python Programming Learning Circle

1. Environment Installation

The gym library provides a toolkit for developing and comparing reinforcement‑learning algorithms; installing it and the highway‑env package (a collection of autonomous‑driving scenarios) is straightforward with pip.

<code>pip install gym</code>
<code>pip install --user git+https://github.com/eleurent/highway-env</code>

highway‑env includes six scenarios: highway‑v0, merge‑v0, roundabout‑v0, parking‑v0, intersection‑v0, and racetrack‑v0. Detailed documentation is available at the project's website.

2. Environment Configuration

After installation, the environment can be instantiated and configured. The example below creates a highway‑v0 environment and renders three steps using the default action set.

<code>import gym
import highway_env
%matplotlib inline  # notebook magic: render plots inline

env = gym.make('highway-v0')
env.reset()
for _ in range(3):
    # keep the ego vehicle idle; action_type maps action names to indices
    action = env.action_type.actions_indexes["IDLE"]
    obs, reward, done, info = env.step(action)
    env.render()
</code>

The rendered window shows the ego vehicle (green) and surrounding traffic. Many parameters of the Env class can be tuned; see the original documentation for details.

3. Model Training

3.1 Data Processing

highway‑env does not provide explicit sensors; observations are generated directly from the simulator. Three observation types are supported:

Kinematics: a V×F matrix where V is the number of observed vehicles (including the ego vehicle) and F is the number of features (e.g., presence, x, y, vx, vy, cos_h, sin_h). Values are normalized by default.

Grayscale Image: a W×H grayscale image representing the scene.

Occupancy Grid: a W×H×F tensor describing the occupancy of each cell around the ego vehicle.
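To make the Kinematics representation concrete, the sketch below builds a hypothetical 5×7 observation matrix (the values are illustrative, not taken from the simulator) and flattens it into the 35-dimensional vector that a fully connected network would consume:

```python
import numpy as np

# Hypothetical Kinematics observation: 5 vehicles x 7 features
# (presence, x, y, vx, vy, cos_h, sin_h), normalized to [-1, 1].
obs = np.array([
    [1.0, 0.00,  0.0, 0.30, 0.0, 1.0, 0.0],  # row 0: the ego vehicle
    [1.0, 0.15,  0.1, 0.25, 0.0, 1.0, 0.0],  # nearest neighbour
    [1.0, 0.40, -0.1, 0.20, 0.0, 1.0, 0.0],
    [0.0, 0.00,  0.0, 0.00, 0.0, 0.0, 0.0],  # padding row: absent vehicle
    [0.0, 0.00,  0.0, 0.00, 0.0, 0.0, 0.0],
])

# A fully connected network consumes the matrix as a flat vector.
flat = obs.flatten()
print(flat.shape)  # (35,)
```

Note the presence flag in the first column: when fewer than `vehicles_count` vehicles are visible, the remaining rows are zero-padded.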

For the tutorial we use the Kinematics representation. The configuration dictionary below selects five vehicles and seven features, defines feature ranges, and sets the observation to be relative to the ego vehicle.

<code>config = {
    "observation": {
        "type": "Kinematics",
        "vehicles_count": 5,
        "features": ["presence", "x", "y", "vx", "vy", "cos_h", "sin_h"],
        "features_range": {
            "x": [-100, 100],
            "y": [-100, 100],
            "vx": [-20, 20],
            "vy": [-20, 20]
        },
        "absolute": False,
        "order": "sorted"
    },
    "simulation_frequency": 8,  # Hz
    "policy_frequency": 2       # Hz
}
</code>

3.2 Action Space

The environment provides both continuous and discrete actions. The discrete set consists of five meta‑actions:

<code>ACTIONS_ALL = {
    0: 'LANE_LEFT',
    1: 'IDLE',
    2: 'LANE_RIGHT',
    3: 'FASTER',
    4: 'SLOWER'
}
</code>

3.3 Reward Function

All scenarios except parking share the same reward function (the exact formula is given in the original documentation). The individual reward terms can only be changed in the environment's source code, so re-weighting the returned reward externally, e.g. by wrapping the environment, is the usual way to adjust it.
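Such external re-weighting can be sketched as a thin wrapper around the environment. This is an illustrative class, not part of highway-env; it assumes the gym-style `step` return signature and highway-env's convention of reporting collisions via `info["crashed"]`:

```python
class RewardScaleWrapper:
    """Illustrative wrapper that rescales an env's reward externally,
    without touching the environment's source code."""

    def __init__(self, env, scale=1.0, crash_penalty=0.0):
        self.env = env
        self.scale = scale                # linear reward scaling
        self.crash_penalty = crash_penalty  # extra penalty on collision

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        reward = self.scale * reward
        if info.get("crashed", False):    # highway-env flags crashes in info
            reward -= self.crash_penalty
        return obs, reward, done, info
```

The agent then interacts with the wrapper exactly as it would with the raw environment.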

3.4 DQN Network Construction

Because the Kinematics observation yields a small 5×7 matrix (35 values), a simple fully‑connected network suffices. The network maps a 35‑dimensional input to five discrete actions.

<code>import torch
import torch.nn as nn

class DQNNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear1 = nn.Linear(35, 35)
        self.linear2 = nn.Linear(35, 5)   # one Q-value per meta-action

    def forward(self, s):
        # accept raw observations (lists/ndarrays) as well as tensors
        s = torch.as_tensor(s, dtype=torch.float32)
        s = s.view(-1, 35)                # flatten each 5x7 observation
        s = torch.relu(self.linear1(s))   # nonlinearity between the layers
        return self.linear2(s)
</code>

A wrapper class DQN implements experience replay, epsilon‑greedy action selection, and periodic target‑network updates. Hyper‑parameters such as learning rate, batch size, and discount factor are defined at the top of the script.
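Two of those pieces, the replay buffer and epsilon-greedy selection, can be sketched framework-free as follows. The names (`ReplayMemory`, `choose_action`, the `position` counter) mirror the article's `DQN` wrapper but are illustrative, not its actual implementation:

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-size buffer of (s, a, r, s_) transitions; old entries
    are evicted automatically once capacity is reached."""

    def __init__(self, capacity=2000):
        self.buffer = deque(maxlen=capacity)
        self.position = 0  # total pushes, used to trigger learning

    def push(self, s, a, r, s_):
        self.buffer.append((s, a, r, s_))
        self.position += 1

    def sample(self, batch_size):
        # uniform random minibatch for a gradient step
        return random.sample(self.buffer, batch_size)

def choose_action(q_values, epsilon):
    """Epsilon-greedy selection over a sequence of Q-values."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))              # explore
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit
```

The target network (not shown) is simply a periodically synchronized copy of `DQNNet` used to compute the bootstrap targets.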

3.5 Training Loop

The main loop repeatedly resets the environment, selects actions with a decaying exploration probability, steps the simulator, and stores transitions; learning is triggered every 99 stored transitions, and every 40 learning iterations metrics such as average reward, episode duration, and collision rate are recorded and plotted.

<code>import numpy as np

count = 0  # number of learning iterations so far
while True:
    done = False
    s = env.reset()
    while not done:
        e = np.exp(-count / 300)   # exploration probability, decays with training
        a = dqn.choose_action(s, e)
        s_, r, done, info = env.step(a)
        env.render()
        dqn.push_memory(s, a, r, s_)
        # learn once every 99 stored transitions
        if dqn.position != 0 and dqn.position % 99 == 0:
            loss = dqn.learn()
            count += 1
            if count % 40 == 0:
                # compute and plot average metrics
                ...
        s = s_
</code>
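The decay schedule `e = exp(-count/300)` starts at fully random exploration and falls off smoothly; a quick check of a few values (the checkpoints below are illustrative) shows the shape:

```python
import math

# Exploration probability e = exp(-count / 300) at a few checkpoints:
for count in (0, 300, 900):
    print(count, round(math.exp(-count / 300), 3))
# 0 iterations   -> e = 1.0   (pure exploration)
# 300 iterations -> e ~ 0.368
# 900 iterations -> e ~ 0.05  (mostly exploitation)
```

The time constant 300 controls how quickly the agent shifts from exploring to exploiting its learned Q-values.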

Plots show that the collision rate decreases as training progresses, while episode length tends to increase (episodes end early when a crash occurs).

4. Conclusion

Compared with the high‑fidelity CARLA simulator, highway‑env offers a more abstract, game‑like environment that enables rapid prototyping of reinforcement‑learning algorithms without dealing with sensor models or real‑time constraints. It is well‑suited for end‑to‑end algorithm testing, though its simplicity limits the range of controllable factors for detailed autonomous‑control research.

Tags: simulation, Python, reinforcement learning, DQN, autonomous driving, highway-env
Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
