
Reinforcement Learning with highway‑env: Installation, Configuration, and DQN Training in Python

This article demonstrates how to install and configure the highway‑env reinforcement‑learning environment, set up a DQN agent in Python, and train it on various traffic scenarios, providing code examples and performance visualizations.

Python Programming Learning Circle

1. Install the environment – The gym library and the highway-env package are installed via pip:

<code>pip install gym</code>
<code>pip install --user git+https://github.com/eleurent/highway-env</code>

The package provides six traffic scenarios: highway-v0, merge-v0, roundabout-v0, parking-v0, intersection-v0, and racetrack-v0. Details for each are available in the highway-env documentation.
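The six scenario ids above can be kept in a list and passed to gym.make one by one; a minimal sketch (the gym.make usage is shown as a comment since it requires the packages to be installed):

```python
# The six scenario ids shipped with highway-env, as listed above.
SCENARIOS = [
    "highway-v0", "merge-v0", "roundabout-v0",
    "parking-v0", "intersection-v0", "racetrack-v0",
]

# Usage sketch (requires gym and highway_env installed):
#   import gym, highway_env
#   env = gym.make("merge-v0")
#   obs = env.reset()

for env_id in SCENARIOS:
    print(env_id)
```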

2. Configure the environment – After installation, a simple script creates the environment and renders a few steps:

<code>import gym
import highway_env
%matplotlib inline  # notebook-only magic; omit this line in a plain script

env = gym.make('highway-v0')
env.reset()
for _ in range(3):
    action = env.action_type.actions_indexes['IDLE']
    obs, reward, done, info = env.step(action)  # legacy gym 4-tuple API
    env.render()
</code>

The rendered scene shows the ego vehicle (green) and surrounding traffic.

3. Data processing – highway‑env supplies three observation types:

Kinematics – a V×F matrix where V is the number of observed vehicles (including the ego) and F is the number of features (e.g., presence, x, y, vx, vy, cos_h, sin_h). The configuration example sets vehicles_count to 5 and defines the feature range.

Grayscale Image – a W×H gray‑scale picture of the scene.

Occupancy Grid – a W×H×F tensor representing the occupancy of each cell.

Example configuration for the Kinematics observation:

<code>config = {
    "observation": {
        "type": "Kinematics",
        "vehicles_count": 5,
        "features": ["presence", "x", "y", "vx", "vy", "cos_h", "sin_h"],
        "features_range": {"x": [-100, 100], "y": [-100, 100], "vx": [-20, 20], "vy": [-20, 20]},
        "absolute": False,
        "order": "sorted"
    },
    "simulation_frequency": 8,
    "policy_frequency": 2
}
</code>
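The features_range bounds determine how raw feature values are scaled before they reach the agent. The helper below is an illustrative sketch of this linear mapping into [-1, 1] with clipping, not the library's exact code:

```python
import numpy as np

def normalize(value, low, high):
    """Linearly rescale value from [low, high] to [-1, 1], clipping outliers.

    Illustrative sketch of the normalization implied by "features_range";
    the function name and exact clipping behavior are assumptions.
    """
    return float(np.clip(2.0 * (value - low) / (high - low) - 1.0, -1.0, 1.0))

# A vehicle at x = 50 m with the configured range [-100, 100]:
print(normalize(50.0, -100.0, 100.0))   # 0.5
print(normalize(250.0, -100.0, 100.0))  # out of range, clipped to 1.0
```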

4. Action space – The environment offers discrete meta‑actions:

<code>ACTIONS_ALL = {
    0: 'LANE_LEFT',
    1: 'IDLE',
    2: 'LANE_RIGHT',
    3: 'FASTER',
    4: 'SLOWER'
}
</code>
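During training the agent picks one of these five indices with an epsilon-greedy rule; a self-contained sketch (the function name choose_action mirrors the DQN class used later, but this standalone version is an illustration):

```python
import random

ACTIONS_ALL = {
    0: 'LANE_LEFT', 1: 'IDLE', 2: 'LANE_RIGHT', 3: 'FASTER', 4: 'SLOWER'
}

def choose_action(q_values, epsilon):
    """Epsilon-greedy selection over the discrete meta-actions:
    explore with probability epsilon, otherwise take the argmax Q-value."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])

q = [0.1, 0.7, 0.2, 0.05, 0.0]
a = choose_action(q, epsilon=0.0)  # epsilon 0 -> pure greedy
print(ACTIONS_ALL[a])              # IDLE
```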

5. Reward function – All scenarios except parking-v0 share a common reward function defined inside the library; changing it requires editing the library source code.
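In spirit, the shared reward trades off driving fast against colliding. The sketch below illustrates that shape; the coefficients, speed range, and function name are assumptions for this illustration, not the library's exact values:

```python
def highway_reward(speed, collided, speed_range=(20.0, 30.0),
                   speed_weight=0.4, collision_penalty=1.0):
    """Illustrative reward in the spirit of highway-env's default:
    reward normalized forward speed, penalize collisions heavily.
    All coefficients here are assumptions for this sketch."""
    v_min, v_max = speed_range
    speed_term = (speed - v_min) / (v_max - v_min)
    speed_term = min(max(speed_term, 0.0), 1.0)  # clamp to [0, 1]
    return speed_weight * speed_term - collision_penalty * float(collided)

print(highway_reward(30.0, collided=False))  # 0.4 (full speed, no crash)
print(highway_reward(25.0, collided=True))   # -0.8 (crash dominates)
```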

6. Build the DQN model – A simple fully‑connected network maps the flattened 5×7 Kinematics vector (size 35) to five discrete actions:

<code>import torch
import torch.nn as nn
import torch.nn.functional as F

class DQNNet(nn.Module):
    def __init__(self):
        super(DQNNet, self).__init__()
        self.linear1 = nn.Linear(35, 35)   # flattened 5x7 Kinematics input
        self.linear2 = nn.Linear(35, 5)    # one Q-value per meta-action

    def forward(self, s):
        s = torch.as_tensor(s, dtype=torch.float32)
        s = s.view(-1, 35)                 # (batch, 35)
        s = F.relu(self.linear1(s))        # non-linearity between layers
        return self.linear2(s)
</code>

The surrounding DQN class implements experience replay, epsilon‑greedy action selection, and learning updates.
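The experience-replay part of that class can be sketched as a small buffer with push and sample operations. This is a minimal standalone version; the article's actual class also tracks a write position used to trigger learning:

```python
import random
from collections import deque

class ReplayMemory:
    """Minimal experience-replay buffer: store (s, a, r, s') transitions
    in a bounded deque and sample uniformly at random for learning."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def push(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

mem = ReplayMemory(capacity=100)
for t in range(5):
    mem.push([t], t % 5, float(t), [t + 1])
batch = mem.sample(3)
print(len(mem), len(batch))  # 5 3
```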

7. Training loop – The script repeatedly resets the environment, selects actions with a decaying epsilon, stores transitions, and calls learn() every 99 steps. Statistics such as average reward, episode time, and collision rate are recorded and plotted every 40 training iterations.

<code>import numpy as np

dqn = DQN()
count = 0                                  # number of learning updates so far
while True:
    done = False
    s = env.reset()
    while not done:
        e = np.exp(-count / 300)           # decaying exploration rate
        a = dqn.choose_action(s, e)
        s_, r, done, info = env.step(a)
        env.render()
        dqn.push_memory(s, a, r, s_)
        if dqn.position != 0 and dqn.position % 99 == 0:
            loss = dqn.learn()
            count += 1
        s = s_
        # record reward, episode time, collisions, etc.
</code>

8. Results – After training, the average collision rate decreases, episode duration grows, and the average reward improves, indicating that the agent learns to drive more safely in the simulated highway.
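Per-episode statistics like reward and collision flags are noisy, so before plotting them it is common to smooth with a sliding window (the article plots every 40 iterations); a simple sketch of such smoothing:

```python
import numpy as np

def moving_average(values, window=40):
    """Smooth a noisy per-episode statistic (reward, collision flag, ...)
    with a sliding-window mean before plotting the training curve."""
    values = np.asarray(values, dtype=float)
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode='valid')

# Toy data: 0/1 "episode succeeded" flags from consecutive episodes.
flags = [0, 1, 1, 0, 1, 1, 1, 1]
print(moving_average(flags, window=4))  # [0.5  0.75 0.75 0.75 1.  ]
```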

9. Conclusion – Compared with full‑scale simulators like CARLA, highway‑env offers a lightweight, game‑style abstraction that is convenient for algorithm prototyping, though it provides fewer knobs for low‑level control and sensor modeling.

Tags: simulation, Python, reinforcement learning, DQN, gym, highway-env
Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
