How to Analyze Changan Autonomous Driving Data: From Scene Recognition to Risk Assessment

This article walks through a complete data‑science pipeline for autonomous driving, covering CSV/JSON preprocessing with pandas, rule‑based scene classification, a weighted complexity‑risk model, and visual analytics using Matplotlib and Seaborn to identify high‑risk driving scenarios.

Data STUDIO

Jan 15, 2026

1. Project Background and Competition Brief

The rapid growth of autonomous driving demands systematic analysis of massive sensor and high‑definition map data to evaluate complex traffic scenes. This case study reproduces the 2022 Global Chinese Student Data Innovation Competition task hosted by Changan Automobile, focusing on processing vehicle sensor logs and HD‑Map information to classify driving scenes, model scene complexity, and assess accident risk.

2. Data Exploration and Preprocessing

Raw data consist of multiple CSV files, each recording a continuous driving segment with dozens of fields such as ego vehicle state, target objects, lane markings, and HD‑Map data. Many fields are JSON strings embedded in CSV cells, requiring parsing into structured columns. The following Python snippet shows how pandas reads the CSV and extracts the road type from the HD‑Map JSON:

import json

import pandas as pd

file_path = '1659428125.53_1659428167.45.csv'
df = pd.read_csv(file_path)

def parse_hdmap_data(row):
    """Return the road type of the first HD-Map link, or None if it cannot be parsed."""
    try:
        hdmap_str = row['link_list/hdmap']
        hdmap_data = json.loads(hdmap_str)
        return hdmap_data['links_0']['type']
    except (TypeError, KeyError, json.JSONDecodeError):
        # Empty cell (NaN), malformed JSON, or a missing key
        return None

# Uncomment to add the parsed road type as a new column:
# df['road_type'] = df.apply(parse_hdmap_data, axis=1)

This preprocessing step, though time‑consuming, converts unstructured data into a feature matrix ready for analysis.
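When more than one key is needed from a JSON cell, the whole column can be flattened in one pass. The snippet below is a minimal sketch of that idea using pandas.json_normalize. The helper name expand_json_column, the speed_limit key, and the two-row sample are hypothetical; only the link_list/hdmap column and the links_0/type path come from the snippet above.

```python
import json

import pandas as pd


def expand_json_column(df, column, prefix):
    """Parse a column of JSON strings and append the flattened keys as new columns.

    Cells that are missing or not valid JSON objects yield NaN in the new columns.
    """
    def safe_load(cell):
        try:
            parsed = json.loads(cell)
        except (TypeError, json.JSONDecodeError):
            return {}
        return parsed if isinstance(parsed, dict) else {}

    records = [safe_load(cell) for cell in df[column]]
    # json_normalize flattens nested dicts: {'links_0': {'type': ...}} -> 'links_0.type'
    flat = pd.json_normalize(records)
    flat.index = df.index
    return df.join(flat.add_prefix(prefix))


# Hypothetical sample; real log cells hold many more keys.
sample = pd.DataFrame({
    'timestamp': [0.0, 0.1],
    'link_list/hdmap': [
        '{"links_0": {"type": "highway", "speed_limit": 100}}',
        None,
    ],
})
expanded = expand_json_column(sample, 'link_list/hdmap', prefix='hdmap.')
print(expanded[['timestamp', 'hdmap.links_0.type']])
```

Prefixing the new columns keeps keys from different JSON fields from colliding once several columns are expanded.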

3. Driving Scene Recognition and Classification

After preprocessing, scenes are first divided into two primary categories, urban road and highway, and then refined into sub‑scenes such as intersections, lane changes, and ramps. Classification relies on rule‑based keyword detection: the HD‑Map type field and the speed_limit_value field distinguish highways from urban roads. The simplified classification function below uses the ego velocity as a stand‑in for the speed limit:

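def classify_driving_scene(row):
    """Label a row with its primary scene and, where flagged, a sub-scene."""
    speed = row['velocity']
    road_type = row['road_type']  # assumed parsed from HD‑Map
    # Primary scene: HD-Map road type, with a speed threshold as fallback
    if road_type in ['highway', 'expressway'] or speed > 80:
        primary_scene = 'Highway'
    else:
        primary_scene = 'Urban road'
    # Sub-scene flags are optional; a missing column counts as False
    if row.get('is_at_intersection'):
        return f"{primary_scene} - Intersection"
    if row.get('is_changing_lanes'):
        return f"{primary_scene} - Lane change"
    return primary_scene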

Figure 2 (scene‑classification hierarchy) and Figure 3 (keyword‑based classification flow) illustrate the taxonomy and technical route.
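For a full DataFrame, the same rules can be evaluated column-wise instead of row by row. The sketch below is one possible vectorized variant, not the competition code: the column names and the threshold of 80 follow the function above, English labels are used for readability, and the sample rows are made up.

```python
import numpy as np
import pandas as pd


def classify_scenes(df, speed_threshold=80):
    """Return a Series of scene labels built from road type, speed, and flag columns."""
    is_highway = (
        df['road_type'].isin(['highway', 'expressway'])
        | (df['velocity'] > speed_threshold)
    )
    primary = np.where(is_highway.to_numpy(), 'Highway', 'Urban road').tolist()

    # Optional flag columns: a missing column or a non-True value counts as False.
    def flag(name):
        if name not in df.columns:
            return np.zeros(len(df), dtype=bool)
        return df[name].eq(True).to_numpy()

    # The first matching condition wins, so intersection takes precedence.
    suffix = np.select(
        [flag('is_at_intersection'), flag('is_changing_lanes')],
        [' - Intersection', ' - Lane change'],
        default='',
    ).tolist()
    return pd.Series([p + s for p, s in zip(primary, suffix)], index=df.index)


# Made-up rows covering each branch of the rules.
rows = pd.DataFrame({
    'velocity': [95.0, 30.0, 45.0, 60.0],
    'road_type': ['highway', 'urban', 'urban', 'expressway'],
    'is_at_intersection': [False, True, False, False],
    'is_changing_lanes': [False, False, True, True],
})
rows['scene'] = classify_scenes(rows)
print(rows['scene'].value_counts())
```

Counting the labels with value_counts is a quick check that no scene type is unexpectedly absent or dominant before moving on to scoring.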

4. Scene Complexity and Risk Assessment

4.1 Complexity Model

Scene complexity is modeled as a weighted sum of three normalized components: vehicle‑state complexity f_c(I), road‑geometry complexity f_k(K), and environmental complexity f_o(D). The overall score C is calculated as:

C = w_i * f_c(I) + w_k * f_k(K) + w_d * f_o(D)

where each w is the weight of one dimension and each f is the corresponding normalized metric.
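To make the weighted sum concrete, here is a minimal sketch that normalizes three raw indicators and combines them. The source does not state how each component is measured, which normalization is used, or what the weights are, so the column names (ego_accel, lane_curvature, num_objects), the min-max scaling, and the weights 0.4/0.3/0.3 are all placeholder assumptions.

```python
import pandas as pd


def min_max(series):
    """Scale a numeric Series to [0, 1]; a constant Series maps to all zeros."""
    span = series.max() - series.min()
    if span == 0:
        return pd.Series(0.0, index=series.index)
    return (series - series.min()) / span


def complexity_score(df, weights):
    """Weighted sum of min-max-normalized columns.

    `weights` maps a column name to its weight. Weights are rescaled to sum
    to 1 so that the score stays within [0, 1].
    """
    total = sum(weights.values())
    score = pd.Series(0.0, index=df.index)
    for column, weight in weights.items():
        score += (weight / total) * min_max(df[column])
    return score


# Placeholder indicators for three frames (not competition data).
frames = pd.DataFrame({
    'ego_accel': [0.0, 1.0, 2.0],         # stands in for f_c(I)
    'lane_curvature': [0.00, 0.01, 0.02],  # stands in for f_k(K)
    'num_objects': [0, 10, 5],             # stands in for f_o(D)
})
weights = {'ego_accel': 0.4, 'lane_curvature': 0.3, 'num_objects': 0.3}
frames['complexity'] = complexity_score(frames, weights)
print(frames)
```

Normalizing before weighting matters because the raw indicators have different units; without it, the component with the largest numeric range would dominate the sum regardless of its weight.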

4.2 Computation Workflow

The workflow extracts relevant fields, normalizes them, and applies time‑series analysis to produce a per‑timestamp complexity score (see Figure 5).
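The source does not say which time‑series technique is applied. One common choice, shown here purely as an assumption, is a trailing rolling mean that damps frame‑to‑frame noise in the per‑timestamp score. The column names timestamp and complexity, the window of 3 frames, and the sample values are hypothetical.

```python
import pandas as pd


def smooth_scores(df, score_col='complexity', time_col='timestamp', window=3):
    """Sort rows by time and add a trailing rolling-mean column next to the raw score."""
    out = df.sort_values(time_col).reset_index(drop=True)
    # min_periods=1 averages over whatever rows are available at the start
    # instead of emitting NaN for the first window - 1 rows.
    out[f'{score_col}_smooth'] = out[score_col].rolling(window, min_periods=1).mean()
    return out


# Out-of-order sample rows to show that sorting happens before smoothing.
log = pd.DataFrame({
    'timestamp': [0.2, 0.0, 0.1, 0.3],
    'complexity': [0.9, 0.1, 0.2, 0.3],
})
smoothed = smooth_scores(log)
print(smoothed)
```

A larger window gives a steadier curve but delays the response to a sudden change, so the window size is a trade‑off to tune against the logging rate.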

5. Visualization and Findings

Visualization is performed with Matplotlib and Seaborn. Figure 6 compares complexity and risk scores across scenes, highlighting that “pedestrian zones” and “intersections” exhibit the highest risk. Figure 7 aligns vehicle speed with risk index over time, showing sharp risk spikes when entering high‑risk sub‑scenes such as intersections or lane‑changes.
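A per‑scene comparison like the one described for Figure 6 could be produced roughly as follows. This is a sketch, not the article's plotting code: it uses only Matplotlib through the pandas plotting interface (Seaborn's barplot could be swapped in), and the scene, complexity, and risk columns hold synthetic placeholder values rather than results from the dataset.

```python
import matplotlib

matplotlib.use('Agg')  # render off-screen so the script also runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def plot_scene_scores(df, scene_col='scene', score_cols=('complexity', 'risk')):
    """Draw a grouped bar chart of the mean score per scene; return (fig, means)."""
    means = df.groupby(scene_col)[list(score_cols)].mean()
    fig, ax = plt.subplots(figsize=(6, 3))
    means.plot.bar(ax=ax, rot=0)
    ax.set_xlabel('Scene')
    ax.set_ylabel('Mean score')
    fig.tight_layout()
    return fig, means


# Synthetic placeholder scores, only to exercise the plotting function.
sample = pd.DataFrame({
    'scene': ['Highway', 'Highway',
              'Urban road - Intersection', 'Urban road - Intersection'],
    'complexity': [0.2, 0.4, 0.7, 0.9],
    'risk': [0.1, 0.3, 0.6, 0.8],
})
fig, means = plot_scene_scores(sample)
print(means)
# fig.savefig('scene_scores.png')  # uncomment to write the chart to disk
```

Returning the aggregated table alongside the figure makes it easy to check the plotted values numerically.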

Key observations derived from the visual analysis:

Critical Scene Identification: Highway ramps, urban intersections, and dense pedestrian areas have the highest combined complexity and risk scores.

Influencing Factors: Besides ego speed, the number of surrounding objects, their dynamic behavior, and road geometry significantly affect risk.

Model Utility: The constructed scene‑recognition and risk‑assessment model can feed autonomous‑driving decision‑planning modules, potentially improving safety.

6. Conclusion and Outlook

The end‑to‑end case study demonstrates how raw autonomous‑driving logs can be transformed through data cleaning, feature engineering, rule‑based classification, quantitative complexity modeling, and visual analytics into actionable safety insights. The methodology is applicable to other multi‑dimensional time‑series datasets in the automotive domain.
