Build a DQN Autonomous Driving Agent with gym and highway‑env

This tutorial walks through installing gym and highway‑env, configuring six driving scenarios, processing observations (kinematics, images, occupancy grids), defining actions and rewards, constructing a DQN network, training it with a reinforcement‑learning loop, and analyzing collision, time, and reward metrics.

Python Programming Learning Circle

Jul 10, 2025

1. Install Environment

gym is a toolkit for developing and comparing reinforcement learning algorithms. Install gym and the highway‑env package:

pip install gym

pip install --user git+https://github.com/eleurent/highway-env

The package provides six scenarios:

highway‑v0

merge‑v0

roundabout‑v0

parking‑v0

intersection‑v0

racetrack‑v0

Documentation: https://highway-env.readthedocs.io/en/latest/

2. Configure Environment

Example using the highway scenario:

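import gym
import highway_env  # importing the package registers its scenarios with gym
# Jupyter/IPython magic; remove this line when running as a plain script
%matplotlib inline

env = gym.make('highway-v0')
env.reset()
for _ in range(3):
    # take the IDLE meta-action for three steps and render each frame
    action = env.action_type.actions_indexes['IDLE']
    obs, reward, done, info = env.step(action)
    env.render()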
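
The loop above unpacks four values from env.step(), which matches older gym releases. gym 0.26 and later return five values, splitting done into terminated and truncated. The helper below is a hypothetical convenience (unpack_step is not part of gym or highway-env) that accepts either shape and always hands back the four-value form used in this tutorial:

```python
def unpack_step(result):
    """Normalize an env.step() result to (obs, reward, done, info).

    Hypothetical helper: accepts the older 4-tuple and the newer
    5-tuple (obs, reward, terminated, truncated, info).
    """
    if len(result) == 5:
        obs, reward, terminated, truncated, info = result
        # an episode is over if it terminated or was cut short
        return obs, reward, bool(terminated or truncated), info
    obs, reward, done, info = result
    return obs, reward, bool(done), info
```

It would be used as obs, reward, done, info = unpack_step(env.step(action)).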

[Figure: screenshot of the resulting highway-v0 simulation]

3. Data Processing

(1) State

highway‑env provides three observation types: Kinematics, Grayscale Image, Occupancy grid.

Kinematics

Returns a V×F array, where V is the number of observed vehicles (the ego vehicle is the first row) and F is the number of features per vehicle.

Values are normalized by default, using the ranges given in features_range. Configuration example:

config = {
    "observation": {
        "type": "Kinematics",
        "vehicles_count": 5,
        "features": ["presence","x","y","vx","vy","cos_h","sin_h"],
        "features_range": {
            "x": [-100, 100],
            "y": [-100, 100],
            "vx": [-20, 20],
            "vy": [-20, 20]
        },
        "absolute": False,
        "order": "sorted"
    },
    "simulation_frequency": 8,
    "policy_frequency": 2
}
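
As a rough illustration of what normalizing with features_range means, the sketch below linearly maps a raw value from its configured range to [-1, 1]. The function name and the clipping of out-of-range values are assumptions made for this example, not highway-env's own code:

```python
# Ranges copied from the configuration above
FEATURES_RANGE = {
    "x": [-100, 100],
    "y": [-100, 100],
    "vx": [-20, 20],
    "vy": [-20, 20],
}


def normalize_feature(name, value, ranges=FEATURES_RANGE):
    """Map value from [lo, hi] to [-1, 1]; hypothetical helper.

    Features without a configured range (e.g. "presence") pass through.
    """
    if name not in ranges:
        return value
    lo, hi = ranges[name]
    scaled = -1.0 + 2.0 * (value - lo) / (hi - lo)
    # assumption: values outside the range are clipped
    return max(-1.0, min(1.0, scaled))


print(normalize_feature("x", 50))    # 50 m on a [-100, 100] range
print(normalize_feature("vx", -20))  # lower bound of the range
```

The two frequency settings are also related: with simulation_frequency 8 and policy_frequency 2, the agent picks a new action once every 8 / 2 = 4 simulation steps.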

(2) Action

highway‑env defines five discrete meta‑actions:

ACTIONS_ALL = {
    0: 'LANE_LEFT',
    1: 'IDLE',
    2: 'LANE_RIGHT',
    3: 'FASTER',
    4: 'SLOWER'
}
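
The code in section 2 looks an action up by name (actions_indexes['IDLE']). A name-to-index lookup of that kind can be derived by inverting the dictionary above. This is a minimal sketch; ACTIONS_INDEXES here is a local name chosen for the example:

```python
ACTIONS_ALL = {
    0: 'LANE_LEFT',
    1: 'IDLE',
    2: 'LANE_RIGHT',
    3: 'FASTER',
    4: 'SLOWER',
}

# Invert the mapping so actions can be looked up by name
ACTIONS_INDEXES = {name: index for index, name in ACTIONS_ALL.items()}

idle = ACTIONS_INDEXES['IDLE']
print(idle, ACTIONS_ALL[idle])
```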

(3) Reward

All scenarios except parking use a reward of the same general form: a term that rewards driving speed minus a penalty for collisions.
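
A reward of this kind can be sketched as a speed term minus a collision penalty. The function name, the speed bounds v_min and v_max, the weights a and b, and the clipping of the speed term are all illustrative assumptions, not values taken from highway-env:

```python
def speed_collision_reward(speed, crashed,
                           v_min=20.0, v_max=30.0, a=0.4, b=1.0):
    """Return a * (speed - v_min) / (v_max - v_min) - b * collision.

    Hypothetical sketch; every default value here is illustrative.
    """
    speed_fraction = (speed - v_min) / (v_max - v_min)
    # assumption: the speed term is clipped to [0, 1]
    speed_fraction = max(0.0, min(1.0, speed_fraction))
    collision = 1.0 if crashed else 0.0
    return a * speed_fraction - b * collision


print(speed_collision_reward(25.0, crashed=False))
print(speed_collision_reward(30.0, crashed=True))
```

Driving faster without crashing raises the reward, while a collision outweighs the speed bonus whenever b is larger than a.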

4. Build Model

The DQN network takes the Kinematics observation as input. With 5 vehicles and 7 features each, the input size is 5 × 7 = 35; the output is one Q-value for each of the 5 discrete actions.

import torch
import torch.nn as nn
import torch.nn.functional as F
import random
from collections import namedtuple

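class DQNNet(nn.Module):
    """Two-layer fully connected Q-network: 35 inputs, 5 action values."""
    def __init__(self):
        super().__init__()
        self.linear1 = nn.Linear(35, 35)
        self.linear2 = nn.Linear(35, 5)

    def forward(self, s):
        # s is a batch of 5x7 Kinematics observations
        s = torch.as_tensor(s, dtype=torch.float32)
        s = s.view(s.size(0), 1, 35)  # flatten each observation to 35 values
        # ReLU added between the layers: without a nonlinearity the two
        # linear layers collapse into a single linear map
        s = F.relu(self.linear1(s))
        s = self.linear2(s)
        return s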

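class DQN:
    def __init__(self):
        # online network and a target network with the same architecture
        self.net, self.target_net = DQNNet(), DQNNet()
        self.learn_step_counter = 0  # number of learning steps taken so far
        self.memory = []             # replay memory
        self.position = 0            # write position in the replay memory
        self.capacity = 100          # maximum number of stored transitions
        self.optimizer = torch.optim.Adam(self.net.parameters(), lr=0.01)
        self.loss_func = nn.MSELoss()
    # methods choose_action, push_memory, get_sample, learn omitted for brevity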

Transition = namedtuple('Transition', ('state', 'next_state', 'action', 'reward'))
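
The omitted methods are not reproduced in this article. As a rough idea of what the memory and action-selection parts typically look like in a DQN, here is a standard-library sketch. ReplayMemory and the standalone choose_action function are hypothetical stand-ins, not the author's implementation; only the field names memory, position, capacity, and the Transition tuple come from the code above:

```python
import random
from collections import namedtuple

Transition = namedtuple('Transition',
                        ('state', 'next_state', 'action', 'reward'))


class ReplayMemory:
    """Fixed-capacity ring buffer of transitions (hypothetical sketch)."""

    def __init__(self, capacity=100):
        self.capacity = capacity
        self.memory = []
        self.position = 0

    def push(self, *args):
        # grow until full, then overwrite the oldest entry
        if len(self.memory) < self.capacity:
            self.memory.append(None)
        self.memory[self.position] = Transition(*args)
        self.position = (self.position + 1) % self.capacity

    def sample(self, batch_size):
        # uniform random minibatch without replacement
        return random.sample(self.memory, batch_size)

    def __len__(self):
        return len(self.memory)


def choose_action(q_values, epsilon, rng=random):
    """Epsilon-greedy: random action with probability epsilon, else argmax."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])
```

Here q_values would be the five outputs of the network for one state, and epsilon is usually decayed over training so the agent explores less as its estimates improve.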

5. Training Loop

Initialize the environment with the same config, then run the DQN training loop: store each transition, perform a learning step, and log the average reward, episode time, and collision rate every 40 steps.

[Figures: average reward, episode time, and collision rate during training]
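
The periodic logging can be sketched with a small accumulator that emits a summary once a fixed number of records has been collected. MetricsLogger is a hypothetical helper written for this example, and what counts as one record (a step or an episode) is an assumption left to the caller:

```python
class MetricsLogger:
    """Collect reward, time, and crash flags; summarize every `window` records."""

    def __init__(self, window=40):
        self.window = window
        self._reset()

    def _reset(self):
        self.rewards, self.durations, self.crashes = [], [], []

    def record(self, reward, duration, crashed):
        """Add one record; return a summary dict when the window is full."""
        self.rewards.append(reward)
        self.durations.append(duration)
        self.crashes.append(1 if crashed else 0)
        if len(self.rewards) < self.window:
            return None
        summary = {
            "avg_reward": sum(self.rewards) / self.window,
            "avg_time": sum(self.durations) / self.window,
            "collision_rate": sum(self.crashes) / self.window,
        }
        self._reset()  # start a fresh window
        return summary
```

In a training loop, record() would be called once per record and its return value printed or plotted whenever it is not None.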

6. Conclusion

Compared with the CARLA simulator, highway‑env offers a more abstract, game‑like environment that simplifies data acquisition and sensor modeling, making it convenient for end‑to‑end algorithm development, though it provides limited control over low‑level vehicle dynamics.
