How to Analyze Changan Autonomous Driving Data: From Scene Recognition to Risk Assessment
This article walks through a complete data‑science pipeline for autonomous driving, covering CSV/JSON preprocessing with pandas, rule‑based scene classification, a weighted complexity‑risk model, and visual analytics using Matplotlib and Seaborn to identify high‑risk driving scenarios.
1. Project Background and Competition Brief
The rapid growth of autonomous driving demands systematic analysis of massive sensor and high‑definition map data to evaluate complex traffic scenes. This case study reproduces the 2022 Global Chinese Student Data Innovation Competition task hosted by Changan Automobile, focusing on processing vehicle sensor logs and HD‑Map information to classify driving scenes, model scene complexity, and assess accident risk.
2. Data Exploration and Preprocessing
Raw data consist of multiple CSV files, each recording a continuous driving segment with dozens of fields such as ego vehicle state, target objects, lane markings, and HD‑Map data. Many fields are JSON strings embedded in CSV cells, requiring parsing into structured columns. The following Python snippet shows how pandas reads the CSV and extracts the road type from the HD‑Map JSON:
import json

import pandas as pd

file_path = '1659428125.53_1659428167.45.csv'
df = pd.read_csv(file_path)

def parse_hdmap_data(row):
    """Return the road type of the first HD-Map link, or None if it cannot be parsed."""
    try:
        hdmap_str = row['link_list/hdmap']
        hdmap_data = json.loads(hdmap_str)
        road_type = hdmap_data['links_0']['type']
        return road_type
    except (TypeError, KeyError, json.JSONDecodeError):
        # Empty cell (NaN), missing key, or malformed JSON
        return None

# df['road_type'] = df.apply(parse_hdmap_data, axis=1)

This preprocessing step, though time-consuming, converts unstructured data into a feature matrix ready for analysis.
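When more than one field is needed from the embedded JSON, it can be cheaper to parse each cell once and flatten the result into columns. The following is a minimal sketch under stated assumptions: the sample rows are invented for illustration, the helper name `safe_json_loads` is hypothetical, and placing `speed_limit_value` inside the HD-Map JSON is an assumption about the log layout rather than something the dataset documentation confirms.

```python
import json

import pandas as pd


def safe_json_loads(cell):
    """Parse a JSON string; return an empty dict for NaN or malformed cells."""
    try:
        parsed = json.loads(cell)
    except (TypeError, json.JSONDecodeError):
        return {}
    return parsed if isinstance(parsed, dict) else {}


# Hypothetical sample rows; the real logs contain many more fields.
sample = pd.DataFrame({
    'timestamp': [0.0, 0.1, 0.2],
    'link_list/hdmap': [
        '{"links_0": {"type": "highway", "speed_limit_value": 120}}',
        'not valid json',
        None,
    ],
})

# Parse every cell once, then flatten nested dicts into dotted column names.
parsed = sample['link_list/hdmap'].apply(safe_json_loads)
flat = pd.json_normalize(parsed.tolist())
flat.index = sample.index

# Join the flattened HD-Map columns back onto the remaining fields.
features = sample[['timestamp']].join(flat)
print(features)
```

Rows whose JSON cannot be parsed end up with missing values in the flattened columns, so they can be filtered or imputed in a later step instead of aborting the whole load.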
3. Driving Scene Recognition and Classification
After preprocessing, scenes are first divided into two primary categories, urban road and highway, and then refined into sub-scenes such as intersections, lane changes, and ramps. Classification relies on rule-based keyword detection: the type field from the HD-Map and the speed_limit_value field distinguish highways from urban roads. The simplified classification function below uses the ego velocity as the speed signal:
def classify_driving_scene(row):
    """Label a row with a primary scene plus an optional sub-scene."""
    speed = row['velocity']
    road_type = row['road_type']  # assumed parsed from HD-Map
    if road_type in ['highway', 'expressway'] or speed > 80:
        primary_scene = 'Highway'
    else:
        primary_scene = 'Urban road'
    # Sub-scene flags are checked in order, so an intersection takes precedence
    if row.get('is_at_intersection'):
        return f"{primary_scene} - Intersection"
    if row.get('is_changing_lanes'):
        return f"{primary_scene} - Lane change"
    return primary_scene

Figure 2 (scene-classification hierarchy) and Figure 3 (keyword-based classification flow) illustrate the taxonomy and technical route.
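On long logs, a row-by-row `apply` can be slow. The same rules can be expressed column-wise, as in the sketch below. It assumes the columns `velocity`, `road_type`, `is_at_intersection`, and `is_changing_lanes` already exist as in the function above. The sample frame is invented for illustration, and the scene labels are written in English for readability.

```python
import numpy as np
import pandas as pd

# Hypothetical frames standing in for preprocessed log rows.
frames = pd.DataFrame({
    'velocity': [95.0, 40.0, 30.0, 110.0],
    'road_type': ['highway', 'urban', 'urban', None],
    'is_at_intersection': [False, True, False, False],
    'is_changing_lanes': [True, False, False, False],
})

# Primary scene: HD-Map road type or a speed above the threshold.
is_highway = (
    frames['road_type'].isin(['highway', 'expressway'])
    | (frames['velocity'] > 80)
)
primary = np.where(is_highway, 'Highway', 'Urban road')

# np.select takes the first matching condition, so intersections
# take precedence over lane changes, as in the row-wise function.
suffix = np.select(
    [frames['is_at_intersection'].to_numpy(), frames['is_changing_lanes'].to_numpy()],
    [' - Intersection', ' - Lane change'],
    default='',
)

frames['scene'] = [str(p) + str(s) for p, s in zip(primary, suffix)]
print(frames['scene'].tolist())
```

A missing `road_type` does not match the highway list, so that row is classified by speed alone.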
4. Scene Complexity and Risk Assessment
4.1 Complexity Model
Scene complexity is modeled as a weighted sum of three normalized components: vehicle-state complexity f_c(I), road-geometry complexity f_k(K), and environmental complexity f_o(D). The overall score C is calculated as:
C = w_i * f_c(I) + w_k * f_k(K) + w_d * f_o(D)
where w_i, w_k, and w_d are the weights of the three dimensions and each f is the corresponding normalized metric.
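The weighted sum can be sketched in a few lines of pandas. This is an illustration only: the column names `vehicle_state`, `road_geometry`, and `environment`, the min-max normalization, and the weight values are all assumptions, since the article does not specify how each component is measured or weighted.

```python
import pandas as pd


def min_max(series):
    """Scale a series to [0, 1]; a constant series maps to all zeros."""
    span = series.max() - series.min()
    if span == 0:
        return pd.Series(0.0, index=series.index)
    return (series - series.min()) / span


def complexity_score(df, weights):
    """C = w_i * f_c(I) + w_k * f_k(K) + w_d * f_o(D) on normalized columns."""
    f_c = min_max(df['vehicle_state'])   # hypothetical raw vehicle-state metric
    f_k = min_max(df['road_geometry'])   # hypothetical raw road-geometry metric
    f_o = min_max(df['environment'])     # hypothetical raw environment metric
    return weights['w_i'] * f_c + weights['w_k'] * f_k + weights['w_d'] * f_o


# Placeholder values, not competition data.
raw = pd.DataFrame({
    'vehicle_state': [0.0, 5.0, 10.0],
    'road_geometry': [1.0, 1.0, 3.0],
    'environment': [2.0, 4.0, 6.0],
})
weights = {'w_i': 0.5, 'w_k': 0.25, 'w_d': 0.25}  # assumed weights summing to 1

raw['complexity'] = complexity_score(raw, weights)
print(raw)
```

Keeping the weights summing to 1 keeps C within [0, 1], which makes scores comparable across driving segments.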
4.2 Computation Workflow
The workflow extracts relevant fields, normalizes them, and applies time‑series analysis to produce a per‑timestamp complexity score (see Figure 5).
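The article does not say which time-series technique is used, so the sketch below assumes a simple rolling mean over the per-timestamp score plus a first difference to flag sudden changes. The column names, the window size, and the sample values are placeholders.

```python
import pandas as pd

# Placeholder per-timestamp scores, not competition data.
log = pd.DataFrame({
    'timestamp': [0.0, 0.1, 0.2, 0.3, 0.4],
    'complexity': [0.2, 0.4, 0.9, 0.3, 0.1],
})

# Ensure chronological order before applying window operations.
log = log.sort_values('timestamp').reset_index(drop=True)

# Rolling mean over the current and two previous samples smooths sensor noise;
# min_periods=1 keeps a value for the first rows of the segment.
log['complexity_smooth'] = log['complexity'].rolling(window=3, min_periods=1).mean()

# First difference highlights abrupt jumps between consecutive timestamps.
log['complexity_delta'] = log['complexity'].diff().fillna(0.0)
print(log)
```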
5. Visualization and Findings
Visualization is performed with Matplotlib and Seaborn. Figure 6 compares complexity and risk scores across scenes, highlighting that “pedestrian zones” and “intersections” exhibit the highest risk. Figure 7 aligns vehicle speed with risk index over time, showing sharp risk spikes when entering high‑risk sub‑scenes such as intersections or lane‑changes.
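A per-scene comparison in the spirit of Figure 6 could be produced roughly as follows. The scores are made-up placeholders used only to exercise the plotting code; they are not results from the competition data, and the column names `scene`, `complexity`, and `risk` are assumptions.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder per-frame scores, invented for illustration.
scores = pd.DataFrame({
    'scene': ['Highway', 'Highway', 'Intersection',
              'Intersection', 'Lane change', 'Lane change'],
    'complexity': [0.2, 0.3, 0.8, 0.7, 0.5, 0.6],
    'risk': [0.1, 0.2, 0.9, 0.7, 0.4, 0.5],
})

# Average both scores per scene and order scenes from highest to lowest risk.
summary = (
    scores.groupby('scene')[['complexity', 'risk']]
    .mean()
    .sort_values('risk', ascending=False)
)

# Grouped bar chart: one pair of bars (complexity, risk) per scene.
ax = summary.plot(kind='bar', figsize=(8, 4))
ax.set_xlabel('Scene')
ax.set_ylabel('Mean score')
ax.set_title('Complexity and risk by scene (placeholder data)')
plt.tight_layout()
# plt.savefig('scene_scores.png')  # uncomment to write the figure to disk
```

Sorting by risk before plotting puts the scenes that deserve attention first, which matches how the figure is read in the text.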
Key observations derived from the visual analysis:
Critical Scene Identification: Highway ramps, urban intersections, and dense pedestrian areas have the highest combined complexity and risk scores.
Influencing Factors: Besides ego speed, the number of surrounding objects, their dynamic behavior, and road geometry significantly affect risk.
Model Utility: The constructed scene-recognition and risk-assessment model can feed autonomous-driving decision-planning modules, potentially improving safety.
6. Conclusion and Outlook
The end‑to‑end case study demonstrates how raw autonomous‑driving logs can be transformed through data cleaning, feature engineering, rule‑based classification, quantitative complexity modeling, and visual analytics into actionable safety insights. The methodology is applicable to other multi‑dimensional time‑series datasets in the automotive domain.
Data STUDIO
