AMAP-TECH Algorithm Competition: Dynamic Road Condition Analysis Using In-Vehicle Video
In the AMAP‑TECH competition, participants were challenged to infer real‑time road conditions from in‑vehicle video. The authors combined lane‑wise vehicle detection with LightGBM in the preliminary round, then moved to an end‑to‑end DenseNet‑GRU model, expanded and augmented the data, and ensembled five networks to reach an F1 score of 0.7237, before closing with deployment plans and research directions.
Competition Overview
The AMAP-TECH algorithm competition, co‑hosted by AMap (Gaode Map) and Alibaba Cloud Tianchi, has concluded. The challenge was titled “Dynamic Road Condition Analysis Based on In‑Vehicle Video Images” and originated from real business scenarios of AMap. Road‑condition information is valuable for users, traffic management, and urban planning.
Related Links
Exciting video! AMap’s first algorithm competition ends
AMAP-TECH Algorithm Competition Kick‑off
Runner‑up team shares: converting image information into multimodal understanding
Third‑place team shares: a simple, effective, and extensible solution
A total of 880 teams from 15 countries and regions competed over several rounds. The champion was the “Flying Pig” team led by Zhu Yida, a third‑year PhD student at Beijing University of Posts and Telecommunications.
Problem Analysis
The task is to judge road‑condition status (e.g., congested, slow‑moving, free‑flowing) from video frames captured by in‑vehicle cameras. The evaluation metric is a weighted F1 score. The preliminary round is a three‑class classification (congested, slow, free‑flowing); the final round adds a fourth class, “closed”.
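As a reference point for the metric, a minimal weighted‑F1 implementation (support‑weighted average of per‑class F1, equivalent to scikit‑learn's `average='weighted'`) might look like:

```python
import numpy as np

def weighted_f1(y_true, y_pred, classes):
    """Support-weighted F1: per-class F1 averaged with weights
    proportional to each class's true-label count."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    total = len(y_true)
    score = 0.0
    for c in classes:
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += (tp + fn) / total * f1  # weight by class support
    return score
```

Because the weights follow class support, frequent classes (such as “closed” in the final round) dominate the score, which is why de‑duplication and class balance mattered so much.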
Key difficulties in the preliminary round include:
Occlusion of the road ahead by preceding vehicles.
Influence of opposite‑lane and roadside vehicles.
Camera angle variations due to crowdsourced installations, leading to inconsistent image perspectives.
In the final round, GPS timestamps were removed and the “closed” class increased visual complexity, demanding higher model robustness.
Algorithm Framework
We first give a macro overview of our solutions for the preliminary and final rounds.
Preliminary Round: Feature Engineering + LightGBM
Lane‑wise Vehicle Detection
Two approaches were explored. The first feeds lane‑wise vehicle detection results into a LightGBM classifier as features; the detector is a Faster R‑CNN.
We annotated vehicles into five categories per lane:
Current lane front vehicle, same‑lane left vehicle, same‑lane right vehicle, opposite‑lane vehicle, roadside parked vehicle. This fine‑grained labeling enables extraction of lane‑specific vehicle counts, areas, distances, and dynamic features such as inter‑frame bounding‑box similarity.
From detection results we derived 60 features, including GPS time (when available), per‑lane vehicle counts, areas, distances, and temporal dynamics. For a bounding box with corners (x1, y1) and (x2, y2), the center point is ((x1 + x2) / 2, (y1 + y2) / 2).
The distance between two center points (xa, ya) and (xb, yb) is the Euclidean distance sqrt((xa − xb)² + (ya − yb)²).
Two weighted scores were designed for the dynamic features:
Box count versus time interval: a rising count (positive rate) lowers the score; a falling count (negative rate) raises it.
Box size versus time interval: growing boxes lower the score; shrinking boxes raise it.
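The exact formulas and weights the team used are not published, so the following is only an illustrative sketch: box centers, center distances, and a hypothetical `rate_score` that decreases when counts or box sizes grow over a time interval and increases when they shrink:

```python
import math

def box_center(box):
    """Center of an (x1, y1, x2, y2) bounding box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def center_distance(box_a, box_b):
    """Euclidean distance between two box centers."""
    (xa, ya), (xb, yb) = box_center(box_a), box_center(box_b)
    return math.hypot(xa - xb, ya - yb)

def rate_score(prev_value, curr_value, dt):
    """Hypothetical dynamic score: growth over the interval dt
    (positive rate) lowers the score; shrinkage raises it."""
    rate = (curr_value - prev_value) / dt
    return -rate
```

Applied per lane and per frame pair, such rates approximate whether traffic ahead is densifying (congestion) or thinning out.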
We incorporated Focal Loss into the LightGBM model to address class imbalance, achieving 6th place on the preliminary A‑leaderboard.
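LightGBM accepts a custom objective as a callable returning per‑sample gradients and Hessians. The team's exact focal‑loss integration is not shown, so this is a hedged binary sketch (the competition task is multiclass, which would need a one‑vs‑rest or softmax extension); derivatives are taken numerically to keep the code short:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def focal_loss(z, y, gamma=2.0):
    """Binary focal loss on raw scores z: down-weights easy examples
    by (1 - p_t)**gamma so minority-class errors dominate training."""
    p = sigmoid(z)
    pt = np.where(y == 1, p, 1 - p)
    return -((1 - pt) ** gamma) * np.log(np.clip(pt, 1e-12, 1.0))

def focal_objective(z, y, gamma=2.0, eps=1e-4):
    """Gradient and Hessian via central differences; a callable of this
    shape can serve as a LightGBM custom objective."""
    grad = (focal_loss(z + eps, y, gamma) - focal_loss(z - eps, y, gamma)) / (2 * eps)
    hess = (focal_loss(z + eps, y, gamma) - 2 * focal_loss(z, y, gamma)
            + focal_loss(z - eps, y, gamma)) / eps ** 2
    return grad, hess
```

With `gamma=0` the loss reduces to plain log loss, so the numerical derivatives can be checked against the well-known closed forms `p - y` and `p(1 - p)`.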
When switching to the B‑leaderboard, the GPS‑time features diverged from the A‑leaderboard distribution, causing many teams' scores to drop sharply. We visualized the distribution difference with violin plots.
A LightGBM model using only vehicle‑related features scored 0.6108 on the B‑leaderboard, confirming the large gap between A and B.
DenseNet‑Based End‑to‑End Classification
Recognizing that road conditions depend not only on vehicles but also on scene context, we shifted to an end‑to‑end image‑sequence model. Each frame of a three‑frame sequence is processed by DenseNet‑121 to extract global features, and the per‑frame features are fed sequentially into a GRU to produce the final classification. Without data augmentation, this approach achieved first place on the B‑leaderboard with an F1 of 0.6614.
Final Round
Key highlights of our final‑round solution:
End‑to‑end classification network.
Sliding window + data expansion.
Extensive data augmentation.
Multi‑model ensemble.
Data Analysis and Pre‑Processing
We built offline training and validation sets, ensuring a realistic offline evaluation. Duplicate video sequences (e.g., id 1540 vs 1761) were detected using hash encoding and removed. This de‑duplication was crucial because many duplicates belonged to the “closed” class, which heavily influences the F1 score.
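One simple way to realize hash-based de-duplication is an average hash per frame; the sketch below (with synthetic data, not the team's exact encoding) keeps one sequence per distinct hash tuple:

```python
import numpy as np

def average_hash(frame, size=8):
    """Tiny perceptual hash: block-average a grayscale frame down to
    size x size, then threshold each cell against the mean."""
    h, w = frame.shape
    small = frame[:h - h % size, :w - w % size].reshape(
        size, h // size, size, w // size).mean(axis=(1, 3))
    return (small > small.mean()).flatten()

def dedup_sequences(sequences):
    """Keep the first sequence for each distinct per-frame hash tuple."""
    seen, kept = set(), []
    for seq_id, frames in sequences.items():
        key = tuple(average_hash(f).tobytes() for f in frames)
        if key not in seen:
            seen.add(key)
            kept.append(seq_id)
    return kept
```

Near-duplicate detection like this is cheap enough to run over the whole training set before splitting folds, avoiding leakage between training and validation.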
After removal, we split the data 4:1 for cross‑validation.
Data Expansion and Augmentation
The training set was imbalanced (e.g., only ~100 sequences for “slow” vs. >2000 for “closed”). For the minority classes we performed sequence‑level augmentation. Each sequence contains 3‑5 frames; we randomly sampled 3 frames to preserve temporal diversity, especially after GPS timestamps were removed.
Augmentation techniques included:
Translation, scaling, rotation to mitigate camera angle bias.
Contrast, brightness, color adjustments to simulate varying lighting and weather.
Motion blur, median filtering, Gaussian blur to emulate different image qualities.
Cut‑off operations to reduce reliance on privacy‑masked regions.
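The frame sampling and photometric/geometric augmentations above could be sketched as a simple NumPy pipeline; the parameters (jitter ranges, shift size) are illustrative, not the team's values:

```python
import numpy as np

def augment_frame(frame, rng):
    """Random brightness/contrast jitter plus a small horizontal shift."""
    out = frame.astype(np.float32)
    out = out * rng.uniform(0.8, 1.2) + rng.uniform(-20, 20)  # contrast, brightness
    shift = int(rng.integers(-8, 9))                          # translation in pixels
    out = np.roll(out, shift, axis=1)
    return np.clip(out, 0, 255).astype(np.uint8)

def augment_sequence(frames, rng):
    """Sample 3 frames in temporal order, then augment each one."""
    idx = np.sort(rng.choice(len(frames), size=3, replace=False))
    return [augment_frame(frames[i], rng) for i in idx]
```

Sampling 3 of the 3–5 frames keeps the sequence length uniform for the GRU while injecting temporal variety into each epoch.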
Algorithm Design: Spatio‑Temporal Feature Extraction
With the addition of the “closed” class, feature‑based methods would have required robust obstacle detection, but obstacle annotations were noisy. We therefore focused on an end‑to‑end sequence classification model. After extracting frame‑level features, we employed a feature‑fusion module built on a bidirectional GRU to capture temporal relationships.
In the fusion module we replaced the backbone with ResNeSt‑101 and SE‑ResNeXt, and added a dual‑attention network (position‑attention and channel‑attention) to calibrate inter‑channel relations and compute positional importance, respectively.
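As an illustration of the channel branch of such a dual-attention design, here is a squeeze-and-excitation-style channel attention in NumPy (the position-attention branch and real layer sizes are omitted; `w1` and `w2` are placeholder bottleneck weights):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """Channel attention on a (C, H, W) feature map: global-average-pool
    each channel, pass through a two-layer bottleneck, and rescale channels."""
    squeeze = feat.mean(axis=(1, 2))                     # (C,) per-channel stat
    excite = sigmoid(np.maximum(squeeze @ w1, 0) @ w2)   # (C,) channel weights
    return feat * excite[:, None, None]
```

Channel attention recalibrates which feature maps matter (e.g., obstacle-like textures for “closed”), complementing the positional branch that weights spatial locations.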
We trained the sequence classifier with the aforementioned augmentations using 5‑fold cross‑validation and ensembled five models via probability averaging. This solution achieved second place on the B‑leaderboard with an F1 of 0.7237.
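Probability averaging across the five fold models can be sketched as:

```python
import numpy as np

def ensemble_predict(prob_list):
    """Average per-model class-probability matrices (n_samples, n_classes),
    then take the argmax class for each sample."""
    avg = np.mean(np.stack(prob_list), axis=0)
    return avg.argmax(axis=1)
```

Averaging probabilities (rather than majority-voting hard labels) preserves each model's confidence, which tends to help when folds disagree on borderline sequences.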
Model Deployment and Outlook
In real‑world deployment, additional signals such as road hierarchy, GPS timestamps, and POI data can be embedded and fused with the visual model, enriching the spatio‑temporal feature space.
Future work includes improving obstacle and vehicle detection accuracy, adding pre‑ and post‑processing rule constraints, and applying model distillation and acceleration for edge deployment. Grid‑based city segmentation combined with crowdsourced data can further refine local traffic predictions.
Amap Tech
Official Amap technology account showcasing all of Amap's technical innovations.