Build a DQN Autonomous Driving Agent with gym and highway‑env

This tutorial walks through installing gym and highway-env, running one of the package's six driving scenarios, choosing an observation type (kinematics, grayscale image, or occupancy grid), reviewing the action set and reward, building a DQN network, training it, and analyzing collision rate, episode time, and average reward.

Python Programming Learning Circle

1. Install Environment

gym is a toolkit for developing and comparing reinforcement learning algorithms. Install gym and the highway‑env package:

pip install gym
pip install --user git+https://github.com/eleurent/highway-env
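
The gym API has changed between releases, so it can help to confirm what pip actually installed before running the examples. A minimal sketch using only the standard library; `installed_version` is a hypothetical helper, not part of gym or highway-env:

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

# Report the versions of the two packages this tutorial relies on
for name in ("gym", "highway-env"):
    print(name, installed_version(name) or "not installed")
```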

The package provides six scenarios:

highway-v0

merge-v0

roundabout-v0

parking-v0

intersection-v0

racetrack-v0

Documentation: https://highway-env.readthedocs.io/en/latest/

2. Configure Environment

Example using the highway scenario:

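import gym
import highway_env  # imported for its side effect of registering the scenarios
# In a Jupyter notebook, also run: %matplotlib inline

env = gym.make('highway-v0')
env.reset()
for _ in range(3):
    # Take the IDLE meta-action (stay in lane, keep the current target speed)
    action = env.action_type.actions_indexes['IDLE']
    # Older gym releases return 4 values here; newer ones return 5
    obs, reward, done, info = env.step(action)
    env.render()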
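
The loop above unpacks four values from `env.step`, which matches older gym releases. Newer releases return five values (`terminated` and `truncated` instead of a single `done`). A small sketch of a helper that accepts either form; `unpack_step` is a hypothetical name, and stand-in tuples replace a real environment so the example runs on its own:

```python
def unpack_step(result):
    """Normalize an env.step() return value to (obs, reward, done, info).

    Accepts the older 4-tuple (obs, reward, done, info) and the newer
    5-tuple (obs, reward, terminated, truncated, info).
    """
    if len(result) == 5:
        obs, reward, terminated, truncated, info = result
        return obs, reward, terminated or truncated, info
    obs, reward, done, info = result
    return obs, reward, done, info

# Stand-in tuples instead of a real environment
print(unpack_step(("obs", 0.8, False, {})))        # ('obs', 0.8, False, {})
print(unpack_step(("obs", 0.8, False, True, {})))  # ('obs', 0.8, True, {})
```

With a real environment the call would read `obs, reward, done, info = unpack_step(env.step(action))`.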

Resulting simulation screenshot:

highway simulation

3. Data Processing

(1) State

highway‑env provides three observation types: Kinematics, Grayscale Image, Occupancy grid.

Kinematics

Outputs a V×F matrix where V is the number of observed vehicles (including the ego vehicle) and F is the number of features. Example matrix:

kinematics matrix

Values are normalized by default. Configuration example:

config = {
    "observation": {
        "type": "Kinematics",
        "vehicles_count": 5,
        "features": ["presence","x","y","vx","vy","cos_h","sin_h"],
        "features_range": {
            "x": [-100, 100],
            "y": [-100, 100],
            "vx": [-20, 20],
            "vy": [-20, 20]
        },
        "absolute": False,
        "order": "sorted"
    },
    "simulation_frequency": 8,
    "policy_frequency": 2
}
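
To make the numbers in this configuration concrete, the sketch below derives the observation shape and shows what normalization does to a raw value. It assumes each feature is mapped linearly from its `features_range` onto [-1, 1]; `normalize` is a hypothetical helper, and the sample values are made up for illustration.

```python
# Subset of the configuration above, repeated so the sketch runs on its own
config = {
    "observation": {
        "vehicles_count": 5,
        "features": ["presence", "x", "y", "vx", "vy", "cos_h", "sin_h"],
        "features_range": {"x": [-100, 100], "y": [-100, 100],
                           "vx": [-20, 20], "vy": [-20, 20]},
    },
    "simulation_frequency": 8,
    "policy_frequency": 2,
}

def normalize(value, value_range):
    """Linearly map `value` from [low, high] to [-1, 1] (assumed scheme)."""
    low, high = value_range
    return -1.0 + 2.0 * (value - low) / (high - low)

obs_cfg = config["observation"]
rows = obs_cfg["vehicles_count"]     # V: observed vehicles, ego included
cols = len(obs_cfg["features"])      # F: features per vehicle
print("observation shape:", (rows, cols), "flattened size:", rows * cols)

# Hypothetical vehicle: 50 units ahead, relative speed of -5
print(normalize(50, obs_cfg["features_range"]["x"]))    # 0.5
print(normalize(-5, obs_cfg["features_range"]["vx"]))   # -0.25

# Simulation steps executed for each agent decision
print(config["simulation_frequency"] // config["policy_frequency"])  # 4
```

The flattened size of 35 is the input width the network in section 4 expects.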

(2) Action

highway‑env defines five discrete meta‑actions:

ACTIONS_ALL = {
    0: 'LANE_LEFT',
    1: 'IDLE',
    2: 'LANE_RIGHT',
    3: 'FASTER',
    4: 'SLOWER'
}
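
A DQN agent picks one of these five indices at every decision step. The article does not show its selection rule; a common choice is epsilon-greedy, sketched here under that assumption. The Q-values are made up, and this `choose_action` is an illustrative stand-in rather than the author's method of the same name.

```python
import random

ACTIONS_ALL = {0: 'LANE_LEFT', 1: 'IDLE', 2: 'LANE_RIGHT', 3: 'FASTER', 4: 'SLOWER'}

def choose_action(q_values, epsilon, rng=random):
    """Epsilon-greedy choice over a list of per-action Q-values."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))   # explore: uniform random action
    # exploit: index of the largest Q-value (the first one on ties)
    return max(range(len(q_values)), key=lambda a: q_values[a])

q_values = [0.1, 0.4, -0.2, 0.9, 0.0]           # hypothetical network output
action = choose_action(q_values, epsilon=0.0)   # epsilon=0 is purely greedy
print(action, ACTIONS_ALL[action])              # 3 FASTER
```

In practice epsilon usually starts high and decays over training so the agent explores early and exploits later.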

(3) Reward

All scenarios except parking share the same reward function (illustrated below).

reward function diagram
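
As described in the highway-env documentation, the reward in most scenarios combines a speed term and a collision penalty: R(s, a) = a * (v - v_min) / (v_max - v_min) - b * collision, where v is the ego vehicle's speed and a, b are weighting coefficients. The sketch below implements that formula; the coefficient values, speed bounds, and the clamping step are illustrative assumptions, not the package defaults.

```python
def reward(speed, collided, v_min=20.0, v_max=30.0, a=0.5, b=1.0):
    """Speed term minus collision penalty; all default values are illustrative."""
    speed_term = (speed - v_min) / (v_max - v_min)
    speed_term = min(max(speed_term, 0.0), 1.0)   # clamp to [0, 1] (assumption)
    return a * speed_term - b * float(collided)

print(reward(25.0, collided=False))   # 0.25
print(reward(25.0, collided=True))    # -0.75
```

The agent is therefore pushed toward driving fast while any collision outweighs the speed bonus.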

4. Build Model

The DQN network uses the Kinematics representation. The input size is 5 × 7 = 35 (5 vehicles × 7 features) and the output size is 5, one Q-value per discrete action.

import torch
import torch.nn as nn
import torch.nn.functional as F
import random
from collections import namedtuple

class DQNNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear1 = nn.Linear(35, 35)  # flattened 5 x 7 observation
        self.linear2 = nn.Linear(35, 5)   # one Q-value per action
    def forward(self, s):
        # s: batch of observations with 35 values each, e.g. shape (batch, 5, 7)
        s = torch.FloatTensor(s)
        s = s.view(s.size(0), 1, 35)
        # no activation is applied between the layers, so the mapping is linear
        s = self.linear1(s)
        s = self.linear2(s)
        return s  # shape (batch, 1, 5)

class DQN(object):
    def __init__(self):
        # online network and target network
        self.net, self.target_net = DQNNet(), DQNNet()
        self.learn_step_counter = 0
        self.memory = []      # stored transitions
        self.position = 0     # write index into memory
        self.capacity = 100   # maximum number of stored transitions
        self.optimizer = torch.optim.Adam(self.net.parameters(), lr=0.01)
        self.loss_func = nn.MSELoss()
    # methods choose_action, push_memory, get_sample, learn omitted for brevity

Transition = namedtuple('Transition', ('state', 'next_state', 'action', 'reward'))
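
The article omits `push_memory` and `get_sample`. Given the `memory`, `position`, and `capacity` fields, one plausible implementation is a fixed-size ring buffer with uniform sampling. The sketch below shows that idea as a standalone class; `ReplayMemory` and its method names are hypothetical, and this is not necessarily how the original methods work.

```python
import random
from collections import namedtuple

Transition = namedtuple('Transition', ('state', 'next_state', 'action', 'reward'))

class ReplayMemory:
    """Fixed-size ring buffer of transitions (hypothetical helper class)."""
    def __init__(self, capacity=100):
        self.capacity = capacity
        self.memory = []
        self.position = 0

    def push(self, *args):
        """Store a transition, overwriting the oldest one once the buffer is full."""
        if len(self.memory) < self.capacity:
            self.memory.append(None)
        self.memory[self.position] = Transition(*args)
        self.position = (self.position + 1) % self.capacity

    def sample(self, batch_size):
        """Draw `batch_size` distinct transitions uniformly at random."""
        return random.sample(self.memory, batch_size)

    def __len__(self):
        return len(self.memory)

memory = ReplayMemory(capacity=3)
for step in range(5):
    memory.push(f"s{step}", f"s{step + 1}", 1, 0.0)   # dummy transitions
print(len(memory))                                    # 3
print(sorted(t.state for t in memory.memory))         # ['s2', 's3', 's4']
```

Sampling at random from the buffer, rather than learning from consecutive steps, reduces the correlation between training samples.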

5. Training Loop

Initialize the environment with the same config, then run the DQN training loop, storing transitions, performing learning steps, and logging average reward, episode time and collision rate every 40 steps. Plots of these metrics are shown below.
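
The learning step is not shown in the article. In a standard DQN, the online network's Q(s, a) is regressed toward the target r + gamma * max Q_target(s', a'), with the target network's weights copied from the online network at intervals. The sketch below computes that target and an MSE loss on plain lists. The discount factor and all numbers are assumptions, and because the `Transition` tuple carries no terminal flag, terminal states are not masked here.

```python
GAMMA = 0.9  # discount factor (assumed; the article does not state one)

def td_targets(rewards, next_q_values, gamma=GAMMA):
    """Compute r + gamma * max_a' Q_target(s', a') for each transition."""
    return [r + gamma * max(q_next) for r, q_next in zip(rewards, next_q_values)]

def mse(predictions, targets):
    """Mean squared error between predicted Q-values and TD targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

# Hypothetical batch of two transitions with five Q-values per next state
rewards = [0.5, 1.0]
next_q = [[0.0, 1.0, 0.5, 0.2, 0.1],
          [2.0, 0.0, 0.0, 0.0, 0.0]]

targets = td_targets(rewards, next_q, gamma=0.5)
print(targets)                    # [1.0, 2.0]
print(mse([0.0, 1.0], targets))   # 1.0
```

In the PyTorch version, the same quantities would be tensors, with the target computed from `target_net` and the loss passed to the Adam optimizer.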

collision rate plot
epoch time plot
average reward plot
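
The three curves can be produced by accumulating per-step statistics and averaging them every 40 steps. The sketch below is one way to do that bookkeeping, not the article's code: `MetricsLogger` is a hypothetical class and the data fed into it is synthetic. In a real run, the collision flag would typically come from the `info` dictionary returned by `env.step`, assuming the environment reports one.

```python
class MetricsLogger:
    """Accumulates per-step stats and stores their averages every `interval` steps."""
    def __init__(self, interval=40):
        self.interval = interval
        self.history = []   # one (avg_reward, avg_time, collision_rate) per window
        self._reset()

    def _reset(self):
        self.rewards, self.times, self.collisions = [], [], 0

    def record(self, reward, elapsed, crashed):
        self.rewards.append(reward)
        self.times.append(elapsed)
        self.collisions += int(crashed)
        if len(self.rewards) == self.interval:
            n = self.interval
            self.history.append((sum(self.rewards) / n,
                                 sum(self.times) / n,
                                 self.collisions / n))
            self._reset()

logger = MetricsLogger(interval=40)
for step in range(80):
    # synthetic data: constant reward and time, one crash every 10 steps
    logger.record(reward=0.5, elapsed=0.25, crashed=(step % 10 == 0))
print(logger.history)   # [(0.5, 0.25, 0.1), (0.5, 0.25, 0.1)]
```

Each tuple in `history` corresponds to one point on each of the three plots.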

6. Conclusion

Compared with the CARLA simulator, highway‑env offers a more abstract, game‑like environment that simplifies data acquisition and sensor modeling, making it convenient for end‑to‑end algorithm development, though it provides limited control over low‑level vehicle dynamics.
