AMAP-TECH Algorithm Competition: Dynamic Road Condition Analysis Using In-Vehicle Video
In the AMAP‑TECH competition, participants were challenged to infer real‑time road conditions from in‑vehicle video. The authors combined lane‑wise vehicle detection with LightGBM in the preliminary round, then moved to an end‑to‑end DenseNet‑GRU model, expanded and augmented the data, and ensembled five networks to reach an F1 score of 0.7237, before closing with deployment plans and research directions.
Competition Overview
The AMAP-TECH algorithm competition, co‑hosted by AMap (Gaode Map) and Alibaba Cloud Tianchi, has concluded. The challenge was titled “Dynamic Road Condition Analysis Based on In‑Vehicle Video Images” and originated from real business scenarios of AMap. Road‑condition information is valuable for users, traffic management, and urban planning.
Related Links
Exciting video! AMap’s first algorithm competition ends
AMAP-TECH Algorithm Competition Kick‑off
Runner‑up team shares: converting image information into multimodal understanding
Third‑place team shares: a simple, effective, and extensible solution
A total of 880 teams from 15 countries and regions competed over several rounds. The champion was the “Flying Pig” team led by Zhu Yida, a third‑year PhD student at Beijing University of Posts and Telecommunications.
Problem Analysis
The task is to judge road‑condition status (e.g., congested, slow‑moving, free‑flowing) from video frames captured by in‑vehicle cameras. The evaluation metric is a weighted F1 score. The preliminary round is a three‑class classification (congested, slow, free‑flowing); the final round adds a fourth class, “closed”.
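As a reference point for the metric, a minimal weighted‑F1 implementation (support‑weighted average of per‑class F1, equivalent to scikit‑learn's `average='weighted'`) might look like:

```python
import numpy as np

def weighted_f1(y_true, y_pred, classes):
    """Support-weighted F1: per-class F1 averaged with weights
    proportional to each class's true-label count."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    total = len(y_true)
    score = 0.0
    for c in classes:
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += (tp + fn) / total * f1  # weight by class support
    return score
```

Because the weights follow class support, frequent classes (such as “closed” in the final round) dominate the score, which is why de‑duplication and class balance mattered so much.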
Key difficulties in the preliminary round include:
Occlusion of the road ahead by preceding vehicles.
Influence of opposite‑lane and roadside vehicles.
Camera angle variations due to crowdsourced installations, leading to inconsistent image perspectives.
In the final round, GPS timestamps were removed and the “closed” class increased visual complexity, demanding higher model robustness.
Algorithm Framework
We first give a macro overview of our solutions for the preliminary and final rounds.
Preliminary Round: Feature Engineering + LightGBM
Lane‑wise Vehicle Detection
Two approaches were explored. The first feeds lane‑wise vehicle detection results into a LightGBM classifier as features; the detector is a Faster R‑CNN.
We annotated vehicles into five categories per lane:
Current lane front vehicle, same‑lane left vehicle, same‑lane right vehicle, opposite‑lane vehicle, roadside parked vehicle. This fine‑grained labeling enables extraction of lane‑specific vehicle counts, areas, distances, and dynamic features such as inter‑frame bounding‑box similarity.
From detection results we derived 60 features, including GPS time (when available), per‑lane vehicle counts, areas, distances, and temporal dynamics. For a bounding box with corners (x1, y1) and (x2, y2), the center point is ((x1 + x2) / 2, (y1 + y2) / 2).
The distance between two center points (xa, ya) and (xb, yb) is the Euclidean distance sqrt((xa − xb)² + (ya − yb)²).
Two weighted scores were designed for the dynamic features:
Box count versus time interval: a rising count (positive rate) lowers the score; a falling count (negative rate) raises it.
Box size versus time interval: growing boxes lower the score; shrinking boxes raise it.
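The exact formulas and weights the team used are not published, so the following is only an illustrative sketch: box centers, center distances, and a hypothetical `rate_score` that decreases when counts or box sizes grow over a time interval and increases when they shrink:

```python
import math

def box_center(box):
    """Center of an (x1, y1, x2, y2) bounding box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def center_distance(box_a, box_b):
    """Euclidean distance between two box centers."""
    (xa, ya), (xb, yb) = box_center(box_a), box_center(box_b)
    return math.hypot(xa - xb, ya - yb)

def rate_score(prev_value, curr_value, dt):
    """Hypothetical dynamic score: growth over the interval dt
    (positive rate) lowers the score; shrinkage raises it."""
    rate = (curr_value - prev_value) / dt
    return -rate
```

Applied per lane and per frame pair, such rates approximate whether traffic ahead is densifying (congestion) or thinning out.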
We incorporated Focal Loss into the LightGBM model to address class imbalance, achieving 6th place on the preliminary A‑leaderboard.
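LightGBM accepts a custom objective as a callable returning per‑sample gradients and Hessians. The team's exact focal‑loss integration is not shown, so this is a hedged binary sketch (the competition task is multiclass, which would need a one‑vs‑rest or softmax extension); derivatives are taken numerically to keep the code short:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def focal_loss(z, y, gamma=2.0):
    """Binary focal loss on raw scores z: down-weights easy examples
    by (1 - p_t)**gamma so minority-class errors dominate training."""
    p = sigmoid(z)
    pt = np.where(y == 1, p, 1 - p)
    return -((1 - pt) ** gamma) * np.log(np.clip(pt, 1e-12, 1.0))

def focal_objective(z, y, gamma=2.0, eps=1e-4):
    """Gradient and Hessian via central differences; a callable of this
    shape can serve as a LightGBM custom objective."""
    grad = (focal_loss(z + eps, y, gamma) - focal_loss(z - eps, y, gamma)) / (2 * eps)
    hess = (focal_loss(z + eps, y, gamma) - 2 * focal_loss(z, y, gamma)
            + focal_loss(z - eps, y, gamma)) / eps ** 2
    return grad, hess
```

With `gamma=0` the loss reduces to plain log loss, so the numerical derivatives can be checked against the well-known closed forms `p - y` and `p(1 - p)`.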
When switching to the B‑leaderboard, the GPS‑time features diverged from the A‑leaderboard distribution, causing many teams' scores to drop sharply. We visualized the distribution difference with violin plots.
A LightGBM model using only vehicle‑related features scored 0.6108 on the B‑leaderboard, confirming the large gap between A and B.
DenseNet‑Based End‑to‑End Classification
Recognizing that road conditions depend not only on vehicles but also on scene context, we shifted to an end‑to‑end image‑sequence model. Each frame of a three‑frame sequence is processed by DenseNet‑121 to extract global features, and the per‑frame features are fed sequentially into a GRU to produce the final classification. Without data augmentation, this approach achieved first place on the B‑leaderboard with an F1 of 0.6614.
Final Round
Key highlights of our final‑round solution:
End‑to‑end classification network.
Sliding window + data expansion.
Extensive data augmentation.
Multi‑model ensemble.
Data Analysis and Pre‑Processing
We built offline training and validation sets, ensuring a realistic offline evaluation. Duplicate video sequences (e.g., id 1540 vs 1761) were detected using hash encoding and removed. This de‑duplication was crucial because many duplicates belonged to the “closed” class, which heavily influences the F1 score.
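One simple way to realize hash-based de-duplication is an average hash per frame; the sketch below (with synthetic data, not the team's exact encoding) keeps one sequence per distinct hash tuple:

```python
import numpy as np

def average_hash(frame, size=8):
    """Tiny perceptual hash: block-average a grayscale frame down to
    size x size, then threshold each cell against the mean."""
    h, w = frame.shape
    small = frame[:h - h % size, :w - w % size].reshape(
        size, h // size, size, w // size).mean(axis=(1, 3))
    return (small > small.mean()).flatten()

def dedup_sequences(sequences):
    """Keep the first sequence for each distinct per-frame hash tuple."""
    seen, kept = set(), []
    for seq_id, frames in sequences.items():
        key = tuple(average_hash(f).tobytes() for f in frames)
        if key not in seen:
            seen.add(key)
            kept.append(seq_id)
    return kept
```

Near-duplicate detection like this is cheap enough to run over the whole training set before splitting folds, avoiding leakage between training and validation.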
After removal, we split the data 4:1 for cross‑validation.
Data Expansion and Augmentation
The training set was imbalanced (e.g., only ~100 sequences for “slow” vs. >2000 for “closed”). For the minority classes we performed sequence‑level augmentation. Each sequence contains 3‑5 frames; we randomly sampled 3 frames to preserve temporal diversity, especially after GPS timestamps were removed.
Augmentation techniques included:
Translation, scaling, rotation to mitigate camera angle bias.
Contrast, brightness, color adjustments to simulate varying lighting and weather.
Motion blur, median filtering, Gaussian blur to emulate different image qualities.
Cut‑off operations to reduce reliance on privacy‑masked regions.
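The frame sampling and photometric/geometric augmentations above could be sketched as a simple NumPy pipeline; the parameters (jitter ranges, shift size) are illustrative, not the team's values:

```python
import numpy as np

def augment_frame(frame, rng):
    """Random brightness/contrast jitter plus a small horizontal shift."""
    out = frame.astype(np.float32)
    out = out * rng.uniform(0.8, 1.2) + rng.uniform(-20, 20)  # contrast, brightness
    shift = int(rng.integers(-8, 9))                          # translation in pixels
    out = np.roll(out, shift, axis=1)
    return np.clip(out, 0, 255).astype(np.uint8)

def augment_sequence(frames, rng):
    """Sample 3 frames in temporal order, then augment each one."""
    idx = np.sort(rng.choice(len(frames), size=3, replace=False))
    return [augment_frame(frames[i], rng) for i in idx]
```

Sampling 3 of the 3–5 frames keeps the sequence length uniform for the GRU while injecting temporal variety into each epoch.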
Algorithm Design: Spatio‑Temporal Feature Extraction
With the addition of the “closed” class, feature‑based methods would have required robust obstacle detection, but obstacle annotations were noisy. We therefore focused on an end‑to‑end sequence classification model. After extracting frame‑level features, we employed a feature‑fusion module built on a bidirectional GRU to capture temporal relationships.
In the fusion module we replaced the backbone with ResNeSt‑101 and SE‑ResNeXt, and added a dual‑attention network (position‑attention and channel‑attention) to calibrate inter‑channel relations and compute positional importance, respectively.
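As an illustration of the channel branch of such a dual-attention design, here is a squeeze-and-excitation-style channel attention in NumPy (the position-attention branch and real layer sizes are omitted; `w1` and `w2` are placeholder bottleneck weights):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """Channel attention on a (C, H, W) feature map: global-average-pool
    each channel, pass through a two-layer bottleneck, and rescale channels."""
    squeeze = feat.mean(axis=(1, 2))                     # (C,) per-channel stat
    excite = sigmoid(np.maximum(squeeze @ w1, 0) @ w2)   # (C,) channel weights
    return feat * excite[:, None, None]
```

Channel attention recalibrates which feature maps matter (e.g., obstacle-like textures for “closed”), complementing the positional branch that weights spatial locations.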
We trained the sequence classifier with the aforementioned augmentations using 5‑fold cross‑validation and ensembled five models via probability averaging. This solution achieved second place on the B‑leaderboard with an F1 of 0.7237.
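Probability averaging across the five fold models can be sketched as:

```python
import numpy as np

def ensemble_predict(prob_list):
    """Average per-model class-probability matrices (n_samples, n_classes),
    then take the argmax class for each sample."""
    avg = np.mean(np.stack(prob_list), axis=0)
    return avg.argmax(axis=1)
```

Averaging probabilities (rather than majority-voting hard labels) preserves each model's confidence, which tends to help when folds disagree on borderline sequences.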
Model Deployment and Outlook
In real‑world deployment, additional signals such as road hierarchy, GPS timestamps, and POI data can be embedded and fused with the visual model, enriching the spatio‑temporal feature space.
Future work includes improving obstacle and vehicle detection accuracy, adding pre‑ and post‑processing rule constraints, and applying model distillation and acceleration for edge deployment. Grid‑based city segmentation combined with crowdsourced data can further refine local traffic predictions.
Amap Tech
Official Amap technology account showcasing all of Amap's technical innovations.