How Deep Reinforcement Learning Shapes 15‑Minute City Community Planning
This article explains how a deep reinforcement learning model, built on a graph‑based representation of urban elements and trained with PPO, can automate land‑use and road planning to achieve Service, Ecology, and Traffic objectives for 15‑minute city neighborhoods.
15‑Minute City Concept
Urban communities are hubs of innovation and opportunity, but traditional vehicle‑centric planning creates congestion, emissions, and unequal access to services. The COVID‑19 pandemic highlighted the fragility of long‑distance commuting, prompting a shift toward human‑centric designs where most daily needs are reachable within a 15‑minute walk or bike ride, fostering accessibility, equity, low carbon emissions, and resilience.
Urban Space Planning
City space planning has long relied on human designers iterating through analysis and discussion. To reduce this burden, tools have evolved from spreadsheets to GIS and, more recently, AI‑driven analysis. However, existing methods still depend on expert intuition for land‑use and road layout, tasks that are fundamentally combinatorial and suitable for algorithmic optimization.
Modeling
The study models a city as a proximity graph. Geographic elements are categorized as functional blocks (L), roads/boundaries (S), and intersections (J). These become graph nodes with features such as type, coordinates, width, height, length, and area. Edges represent adjacency between elements.
Land‑use elements are represented by polygons (e.g., vacant, residential, school, hospital, clinic, commercial, office, entertainment, park, open space). Roads are line strings, and intersections are points. The initial city state consists of all original parcels, roads, and junctions, with precise geometry stored in a geometry table.
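The data model above can be sketched as a small proximity graph. This is a stdlib-only illustration, not the paper's implementation: the `CityGraph` class, node identifiers like `("L", 0)`, and the example feature values are all hypothetical, chosen to mirror the L/S/J categories and node features described above.

```python
from dataclasses import dataclass, field

@dataclass
class CityGraph:
    """Proximity graph: nodes are land blocks (L), road/boundary segments (S),
    and junctions (J); edges encode geometric adjacency between elements."""
    nodes: dict = field(default_factory=dict)   # node id -> feature dict
    edges: set = field(default_factory=set)     # frozenset({id_a, id_b})

    def add_node(self, nid, **features):
        self.nodes[nid] = features

    def add_edge(self, a, b):
        self.edges.add(frozenset((a, b)))

    def neighbors(self, nid):
        return [next(iter(e - {nid})) for e in self.edges if nid in e]

# A toy initial state: one vacant parcel bounded by one segment that
# terminates at one junction (feature values are illustrative only).
g = CityGraph()
g.add_node(("L", 0), kind="vacant", area=1200.0, x=10.0, y=5.0,
           width=40.0, height=30.0)
g.add_node(("S", 0), kind="boundary", length=40.0)
g.add_node(("J", 0), x=0.0, y=0.0)
g.add_edge(("L", 0), ("S", 0))   # the block is adjacent to this segment
g.add_edge(("S", 0), ("J", 0))   # the segment ends at this junction
```

In the study's formulation the precise polygons, line strings, and points live in a separate geometry table, so the graph itself only needs adjacency plus summary features.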
For the Markov Decision Process (MDP), the state comprises three parts: the city proximity graph, the object currently being placed, and planning statistics. A Graph Neural Network (GNN) encodes the graph: through message passing and neighbor aggregation, each node gathers information about its neighbors' type, position, area, length, width, and height, and the resulting node features are propagated through stacked graph convolution layers.
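One message-passing step of this kind of encoder can be sketched with plain NumPy. This is a generic mean-aggregation graph convolution under assumed shapes, not the paper's exact architecture; the function name `gnn_layer` and all dimensions are illustrative.

```python
import numpy as np

def gnn_layer(features, adjacency, weight):
    """One message-passing step: each node averages its neighbors' features
    (plus its own via a self-loop), then applies a learned linear map + ReLU."""
    n = features.shape[0]
    a_hat = adjacency + np.eye(n)               # add self-loops
    deg = a_hat.sum(axis=1, keepdims=True)      # per-node degree
    messages = (a_hat @ features) / deg         # mean neighbor aggregation
    return np.maximum(messages @ weight, 0.0)   # linear map + ReLU

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))        # 4 nodes, 8 raw features each
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 1],
                [0, 1, 0, 0],
                [0, 1, 0, 0]], dtype=float)
w = rng.normal(size=(8, 16))
h = gnn_layer(x, adj, w)           # node embeddings after one hop
```

Stacking several such layers lets each node's embedding absorb information from progressively larger neighborhoods of the city graph.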
Reward Design
The reward is sparse: it is zero at every intermediate step, and land-use and road efficiency are evaluated only at the final step. It consists of three components:
Service : measures the community life‑circle index for the 15‑minute city.
Ecology : measures green space and park coverage.
Traffic : combines road density and connectivity.
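The sparse reward described above can be sketched as follows. The statistic names (`life_circle_coverage`, `green_coverage`, etc.), the equal default weights, and the assumption that each metric is normalized to [0, 1] are all illustrative, not taken from the paper.

```python
def reward(done, stats, weights=(1.0, 1.0, 1.0)):
    """Sparse reward: 0 for every intermediate step; at the final step, a
    weighted sum of Service (life-circle coverage), Ecology (green/park
    coverage), and Traffic (road density + connectivity).
    All metrics are assumed pre-normalized to [0, 1]."""
    if not done:
        return 0.0
    w_service, w_ecology, w_traffic = weights
    service = stats["life_circle_coverage"]
    ecology = stats["green_coverage"]
    traffic = 0.5 * (stats["road_density"] + stats["connectivity"])
    return w_service * service + w_ecology * ecology + w_traffic * traffic
```

Because the signal arrives only at episode end, the value network (described below) has to propagate it back through every placement decision.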
State transitions describe layout changes: adding a new land‑use node alters the graph topology, while converting a boundary to a road changes node attributes.
Actions represent where to place a land‑use element or where to construct a new road segment. Land‑use planning is decomposed into three questions: which element type (handled by human expertise), where to place it (selected from L‑S edges), and how to place it (filtered by feasibility rules). Road planning selects a non‑road S node and converts it into a road.
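The "filtered by feasibility rules" step amounts to masked sampling over candidates: infeasible placements are excluded before the softmax. A minimal sketch, assuming scores come from a policy head and the boolean mask encodes the feasibility rules (both hypothetical here):

```python
import numpy as np

def masked_sample(scores, feasible_mask, rng):
    """Mask infeasible candidates, softmax over the rest, sample one action."""
    logits = np.where(feasible_mask, scores, -np.inf)  # rule out infeasible
    logits = logits - logits.max()                     # numerical stability
    probs = np.exp(logits)                             # exp(-inf) -> 0
    probs /= probs.sum()
    return rng.choice(len(scores), p=probs)
```

The same routine serves both sub-policies: for land use the candidates are L-S edges, for roads they are non-road S nodes.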
Network Architecture and Training
The planning problem is solved in two stages: first land‑use placement, then road construction based on the resulting layout.
Three neural components are used:
Value network : takes the graph feature representation and planning statistics, passes them through fully‑connected layers, and outputs a scalar value.
Policy_land_use : inputs the city graph and current object, scores all candidate edges, applies softmax to obtain a probability distribution, and samples an edge as the action.
Policy_road : inputs node encodings, scores each node, applies softmax, and samples a node to be converted into a road.
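The three heads above share a common pattern: a small MLP over learned features, ending in either a scalar (value) or a softmax over candidates (policies). A minimal NumPy sketch under assumed shapes; the helper names and dimensions are illustrative, not the paper's:

```python
import numpy as np

def mlp(x, w1, w2):
    """Two-layer perceptron with ReLU (biases omitted for brevity)."""
    return np.maximum(x @ w1, 0.0) @ w2

def value_head(graph_feat, stats, w1, w2):
    """Scalar state value from pooled graph features + planning statistics."""
    h = np.concatenate([graph_feat, stats])
    return float(mlp(h, w1, w2)[0])

def policy_head(candidate_feats, w1, w2):
    """Score each candidate (L-S edge for land use, S node for roads),
    then softmax into a probability distribution to sample from."""
    scores = mlp(candidate_feats, w1, w2).squeeze(-1)
    scores = scores - scores.max()      # numerical stability
    p = np.exp(scores)
    return p / p.sum()
```

Because both policies score a variable-length candidate set rather than a fixed action space, the same weights apply to any community size.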
Training uses Proximal Policy Optimization (PPO). The loss combines policy loss (with PPO clipping), entropy loss (to encourage exploration), and value loss (mean‑squared error). The three losses are weighted and summed to update the model parameters.
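The combined objective can be sketched as below. This is the standard PPO loss shape; the clip range and loss weights shown are common defaults, not values reported in the paper.

```python
import numpy as np

def ppo_loss(ratio, advantage, value_pred, value_target, entropy,
             clip_eps=0.2, c_value=0.5, c_entropy=0.01):
    """PPO objective: clipped policy loss + weighted value MSE
    - weighted entropy bonus (entropy is subtracted to reward exploration).
    `ratio` is pi_new(a|s) / pi_old(a|s) per sampled action."""
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    policy_loss = -np.mean(np.minimum(ratio * advantage, clipped * advantage))
    value_loss = np.mean((value_pred - value_target) ** 2)
    return policy_loss + c_value * value_loss - c_entropy * np.mean(entropy)
```

The clipping keeps each policy update close to the behavior policy that collected the data, which stabilizes training on the sparse, episode-end reward.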
Network Intelligence Research Center (NIRC)
NIRC is based on the National Key Laboratory of Network and Switching Technology at Beijing University of Posts and Telecommunications. It has built a technology matrix across four AI domains—intelligent cloud networking, natural language processing, computer vision, and machine learning systems—dedicated to solving real‑world problems, creating top‑tier systems, publishing high‑impact papers, and contributing significantly to the rapid advancement of China's network technology.