Build a DQN Autonomous Driving Agent with gym and highway‑env
This tutorial walks through installing gym and highway‑env, configuring six driving scenarios, processing observations (kinematics, images, occupancy grids), defining actions and rewards, constructing a DQN network, training it with a reinforcement‑learning loop, and analyzing collision, time, and reward metrics.
1. Install Environment
gym is a toolkit for developing and comparing reinforcement learning algorithms. Install gym and the highway‑env package:
pip install gym
pip install --user git+https://github.com/eleurent/highway-env
The package provides six scenarios:
highway-v0
merge-v0
roundabout-v0
parking-v0
intersection-v0
racetrack-v0
Documentation: https://highway-env.readthedocs.io/en/latest/
2. Configure Environment
Example using the highway scenario:
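import gym
import highway_env  # importing registers the highway-env scenarios with gym
%matplotlib inline
# (the line above is a Jupyter notebook magic; omit it in a plain script)

env = gym.make('highway-v0')
env.reset()
for _ in range(3):
    # Keep the current lane and speed for three steps
    action = env.action_type.actions_indexes['IDLE']
    obs, reward, done, info = env.step(action)
    env.render()
Resulting simulation screenshot: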
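The loop above unpacks four values from env.step(), which matches the classic gym API. Releases of gym from 0.26 onward, and gymnasium, return five values (obs, reward, terminated, truncated, info) and make reset() return (obs, info). A minimal sketch of a compatibility helper follows; unpack_step is a hypothetical name, not part of gym or highway-env.

```python
def unpack_step(result):
    """Normalize an env.step() result to (obs, reward, done, info).

    Hypothetical helper: accepts either the classic 4-tuple or the
    newer 5-tuple with separate terminated/truncated flags.
    """
    if len(result) == 5:
        obs, reward, terminated, truncated, info = result
        # An episode is over if it terminated or was cut short
        return obs, reward, bool(terminated or truncated), info
    obs, reward, done, info = result
    return obs, reward, bool(done), info
```

With this helper the loop body could read obs, reward, done, info = unpack_step(env.step(action)) under either API.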
3. Data Processing
(1) State
highway‑env provides three observation types: Kinematics, Grayscale Image, and Occupancy Grid. Only Kinematics is covered here, since it is the one the DQN model uses.
Kinematics
Outputs a V×F matrix where V is the number of observed vehicles (including the ego vehicle) and F is the number of features. Example matrix:
Values are normalized by default. Configuration example:
config = {
    "observation": {
        "type": "Kinematics",
        "vehicles_count": 5,
        "features": ["presence", "x", "y", "vx", "vy", "cos_h", "sin_h"],
        "features_range": {
            "x": [-100, 100],
            "y": [-100, 100],
            "vx": [-20, 20],
            "vy": [-20, 20]
        },
        "absolute": False,
        "order": "sorted"
    },
    "simulation_frequency": 8,
    "policy_frequency": 2
}
(2) Action
highway‑env defines five discrete meta‑actions:
ACTIONS_ALL = {
    0: 'LANE_LEFT',
    1: 'IDLE',
    2: 'LANE_RIGHT',
    3: 'FASTER',
    4: 'SLOWER'
}
(3) Reward
All scenarios except parking share the same reward function (illustrated below).
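As far as the highway-env documentation describes it, the shared reward trades off driving speed against collisions. The form below is a sketch of that description and should be checked against the current documentation; v is the ego vehicle's speed, v_min and v_max bound the rewarded speed range, and a and b are weighting coefficients.

```latex
R(s, a) = a \, \frac{v - v_{\min}}{v_{\max} - v_{\min}} \; - \; b \cdot \text{collision}
```

Under this form the agent earns more for driving near v_max and is penalized once when a collision occurs.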
4. Build Model
The DQN network uses the Kinematics representation. Input size is 5 × 7 = 35, output size is 5 discrete actions.
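To make the 5 × 7 = 35 input concrete, here is a minimal sketch of how one vehicle row could be normalized and how the observation is flattened. It assumes the ranged features are mapped linearly to [-1, 1] and that absent vehicles appear as all-zero rows; both are assumptions about the library's behavior, and lmap, normalize_row, and flatten are illustrative helpers written for this example.

```python
FEATURES = ["presence", "x", "y", "vx", "vy", "cos_h", "sin_h"]
FEATURES_RANGE = {"x": [-100, 100], "y": [-100, 100],
                  "vx": [-20, 20], "vy": [-20, 20]}


def lmap(value, src, dst):
    """Linearly map value from interval src=[a, b] to interval dst=[c, d]."""
    return dst[0] + (value - src[0]) * (dst[1] - dst[0]) / (src[1] - src[0])


def normalize_row(raw):
    """Scale the ranged features of one vehicle into [-1, 1]; keep the rest."""
    return [
        lmap(raw[name], FEATURES_RANGE[name], [-1.0, 1.0])
        if name in FEATURES_RANGE else raw[name]
        for name in FEATURES
    ]


def flatten(observation):
    """Turn a V x F list of rows into a flat list of V * F values."""
    return [value for row in observation for value in row]


# One ego row with made-up raw values, plus four empty rows (assumed zero padding)
ego = {"presence": 1.0, "x": 50.0, "y": 0.0, "vx": 10.0,
       "vy": 0.0, "cos_h": 1.0, "sin_h": 0.0}
observation = [normalize_row(ego)] + [[0.0] * len(FEATURES) for _ in range(4)]
state = flatten(observation)
print(len(state))  # 5 rows x 7 features
```

The network below performs the same flattening with a tensor view instead of a list comprehension.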
import torch
import torch.nn as nn
import torch.nn.functional as F
import random
from collections import namedtuple


class DQNNet(nn.Module):
    def __init__(self):
        super(DQNNet, self).__init__()
        self.linear1 = nn.Linear(35, 35)
        self.linear2 = nn.Linear(35, 5)

    def forward(self, s):
        # s is a batch of observations, each 5 x 7 (or already flattened to 35)
        s = torch.FloatTensor(s)
        s = s.view(s.size(0), 1, 35)
        # Nonlinearity between the layers; without it the two linear
        # layers collapse into a single linear map
        s = F.relu(self.linear1(s))
        s = self.linear2(s)
        return s  # shape (batch, 1, 5): one Q-value per action


class DQN(object):
    def __init__(self):
        # Online network and a separate target network
        self.net, self.target_net = DQNNet(), DQNNet()
        self.learn_step_counter = 0
        self.memory = []     # replay buffer of Transition tuples
        self.position = 0    # write index into the buffer
        self.capacity = 100  # maximum number of stored transitions
        self.optimizer = torch.optim.Adam(self.net.parameters(), lr=0.01)
        self.loss_func = nn.MSELoss()

    # methods choose_action, push_memory, get_sample, learn omitted for brevity


Transition = namedtuple('Transition', ('state', 'next_state', 'action', 'reward'))
5. Training Loop
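The DQN class keeps memory, position, and capacity fields, but the methods that use them are omitted. One common way to use such fields is a ring-buffer replay memory combined with epsilon-greedy action selection. The sketch below is an assumption about what the omitted methods might do, not the original implementation; ReplayMemory and epsilon_greedy are hypothetical names.

```python
import random
from collections import namedtuple

Transition = namedtuple('Transition', ('state', 'next_state', 'action', 'reward'))


class ReplayMemory:
    """Fixed-size buffer that overwrites the oldest transition when full."""

    def __init__(self, capacity=100):
        self.capacity = capacity
        self.memory = []
        self.position = 0

    def push(self, state, next_state, action, reward):
        if len(self.memory) < self.capacity:
            self.memory.append(None)  # grow until capacity is reached
        self.memory[self.position] = Transition(state, next_state, action, reward)
        self.position = (self.position + 1) % self.capacity  # wrap around

    def sample(self, batch_size):
        """Draw a random batch without replacement."""
        return random.sample(self.memory, batch_size)

    def __len__(self):
        return len(self.memory)


def epsilon_greedy(q_values, epsilon):
    """Pick a random action with probability epsilon, else the best one."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])
```

Sampling random batches from the buffer breaks the correlation between consecutive driving steps, and a decaying epsilon shifts the agent from exploration toward exploiting its learned Q-values.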
Initialize the environment with the same config, then run the DQN training loop, storing transitions, performing learning steps, and logging average reward, episode time and collision rate every 40 steps. Plots of these metrics are shown below.
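The periodic logging described above can be sketched as a small accumulator that collects finished episodes and reports averages at a fixed interval. This is a minimal sketch under assumptions: WindowStats and should_report are hypothetical helpers, and the crashed flag is assumed to come from the environment's info dictionary at the end of each episode.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

LOG_EVERY = 40  # reporting interval in learning steps, as stated in the text


@dataclass
class WindowStats:
    """Collects per-episode results between two reports."""
    rewards: List[float] = field(default_factory=list)
    durations: List[float] = field(default_factory=list)
    crashes: List[bool] = field(default_factory=list)

    def add_episode(self, total_reward: float, duration: float, crashed: bool) -> None:
        self.rewards.append(total_reward)
        self.durations.append(duration)
        self.crashes.append(crashed)

    def report(self) -> Optional[Dict[str, float]]:
        """Return averages over the window and reset it; None if it is empty."""
        n = len(self.rewards)
        if n == 0:
            return None
        summary = {
            "avg_reward": sum(self.rewards) / n,
            "avg_time": sum(self.durations) / n,
            "collision_rate": sum(self.crashes) / n,  # fraction of crashed episodes
        }
        self.rewards.clear()
        self.durations.clear()
        self.crashes.clear()
        return summary


def should_report(learn_step: int, every: int = LOG_EVERY) -> bool:
    """True on every `every`-th learning step, never on step 0."""
    return learn_step > 0 and learn_step % every == 0
```

In a training loop, add_episode would be called whenever an episode ends, and report would be called when should_report(dqn.learn_step_counter) is true, with the returned values appended to the lists used for plotting.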
6. Conclusion
Compared with the CARLA simulator, highway‑env offers a more abstract, game‑like environment that simplifies data acquisition and sensor modeling, making it convenient for end‑to‑end algorithm development, though it provides limited control over low‑level vehicle dynamics.
