
Dynamic Road Condition Analysis Using EfficientNet and ConvLSTM: Solution to the AMAP‑TECH Algorithm Competition

The winning solution for the AMAP‑TECH competition combines EfficientNet spatial features with ConvLSTM temporal modeling, adds a mask‑prediction branch for closed‑road obstacles, and mitigates severe class imbalance through a two‑stage decoupled training pipeline, achieving first place on leaderboard A and top‑5 on leaderboard B without ensembling.

Amap Tech

Jan 22, 2021

The AMAP‑TECH algorithm competition, co‑hosted by Amap (Gaode) and Alibaba Cloud Tianchi, focused on dynamic road‑condition analysis from in‑vehicle video images. The task originates from real‑world business scenarios, where accurate road‑status information benefits users, traffic management, and urban planning.

The competition defined four road‑condition categories—smooth, slow, congested, and closed—and used a weighted F1 score (weights 0.1, 0.2, 0.3, 0.4) as the evaluation metric. For the closed class, obstacle bounding‑box annotations were provided.
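As a rough illustration of the metric, the sketch below computes per-class F1 and combines the four scores with the stated weights. Pairing the weights with the classes in the order they are listed (smooth 0.1, slow 0.2, congested 0.3, closed 0.4) is an assumption, as are the integer label encoding and the function names.

```python
from typing import Sequence

# Class order and weights as listed in the article. Pairing them positionally
# (smooth=0.1, slow=0.2, congested=0.3, closed=0.4) is an assumption.
CLASSES = ("smooth", "slow", "congested", "closed")
WEIGHTS = (0.1, 0.2, 0.3, 0.4)


def f1_for_class(y_true: Sequence[int], y_pred: Sequence[int], cls: int) -> float:
    """Per-class F1 from true-positive, false-positive and false-negative counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    # A class that never appears in labels or predictions scores 0 here.
    return 2 * tp / denom if denom else 0.0


def weighted_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Weighted sum of the per-class F1 scores."""
    return sum(w * f1_for_class(y_true, y_pred, c) for c, w in enumerate(WEIGHTS))
```

Because the closed class carries the largest weight under this pairing, mistakes on closed roads cost the most, which is consistent with the extra attention the solution gives that class.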

Core solution: a hybrid model that combines EfficientNet for spatial feature extraction with ConvLSTM for temporal sequence modeling. The per‑image features and the sequence features are then fused for classification.
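A minimal PyTorch sketch of this kind of hybrid is shown below. It is not the competition code: a small convolutional stack stands in for EfficientNet, the ConvLSTM cell is the textbook formulation, and fusing by concatenating pooled vectors is an assumption, since the article does not say how the two feature types are combined. All class and parameter names are hypothetical.

```python
import torch
import torch.nn as nn


class ConvLSTMCell(nn.Module):
    """Textbook ConvLSTM cell: the LSTM gates are computed by a convolution."""

    def __init__(self, in_ch: int, hid_ch: int, kernel: int = 3):
        super().__init__()
        self.hid_ch = hid_ch
        # One convolution produces all four gates at once.
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, kernel, padding=kernel // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c


class RoadConditionNet(nn.Module):
    """Per-frame features + ConvLSTM over the sequence, fused for classification."""

    def __init__(self, num_classes: int = 4, feat_ch: int = 32, hid_ch: int = 32):
        super().__init__()
        # Stand-in for the EfficientNet feature extractor (assumption).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.cell = ConvLSTMCell(feat_ch, hid_ch)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Linear(feat_ch + hid_ch, num_classes)

    def forward(self, frames):  # frames: (batch, time, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.backbone(frames.flatten(0, 1))       # (b*t, C, h, w)
        feats = feats.view(b, t, *feats.shape[1:])        # (b, t, C, h, w)
        h = feats.new_zeros((b, self.cell.hid_ch) + tuple(feats.shape[-2:]))
        c = torch.zeros_like(h)
        for step in range(t):
            h, c = self.cell(feats[:, step], (h, c))
        image_vec = self.pool(feats.mean(dim=1)).flatten(1)  # image features, averaged over time
        seq_vec = self.pool(h).flatten(1)                    # last ConvLSTM hidden state
        # Fusion by concatenation is an assumption of this sketch.
        return self.classifier(torch.cat([image_vec, seq_vec], dim=1))
```

Calling `RoadConditionNet()(torch.randn(2, 3, 3, 32, 32))` returns one row of four class logits per sequence. The mask‑prediction branch is left out to keep the sketch short.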

To better handle the closed class, the provided bounding‑box labels were converted into mask labels, and an additional mask‑prediction branch was introduced, guiding the model’s attention toward obstacles.
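The box‑to‑mask conversion can be sketched as below. The box layout `(x_min, y_min, x_max, y_max)` in pixels with exclusive upper bounds, the binary mask values, and the function name are all assumptions. The article does not describe the annotation format or the resolution at which the mask is predicted.

```python
from typing import List, Sequence, Tuple

# Assumed layout: (x_min, y_min, x_max, y_max) in pixels, upper bounds exclusive.
Box = Tuple[int, int, int, int]


def boxes_to_mask(boxes: Sequence[Box], height: int, width: int) -> List[List[int]]:
    """Rasterize boxes into a binary mask: 1 inside any box, 0 elsewhere."""
    mask = [[0] * width for _ in range(height)]
    for x_min, y_min, x_max, y_max in boxes:
        # Clip to the image so out-of-range annotations do not raise.
        x0, x1 = max(0, x_min), min(width, x_max)
        y0, y1 = max(0, y_min), min(height, y_max)
        for y in range(y0, y1):
            for x in range(x0, x1):
                mask[y][x] = 1
    return mask
```

In this sketch a frame with no annotated obstacle yields an all‑zero mask, so a mask target exists for every frame regardless of its class.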

Because the dataset suffers from severe class imbalance, the approach from the paper “Decoupling Representation and Classifier for Long‑tailed Recognition” was adopted. A two‑stage training pipeline was used: stage 1 trains the whole network with random sampling; stage 2 freezes EfficientNet and ConvLSTM, applies re‑weighting, and trains only the classifier, effectively mitigating the imbalance.
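The article does not say which re‑weighting scheme was used in stage 2. As one common possibility, the sketch below derives inverse‑frequency class weights and applies them to a per‑sample cross‑entropy loss. The weighting formula, the normalization, and the function names are assumptions made for illustration.

```python
import math
from collections import Counter
from typing import List, Sequence


def inverse_frequency_weights(labels: Sequence[int], num_classes: int) -> List[float]:
    """Weight each class by 1/count, rescaled so the weights average to 1."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    # Classes absent from the training labels get weight 0.
    raw = [1.0 / counts[c] if counts[c] else 0.0 for c in range(num_classes)]
    scale = num_classes / sum(raw)
    return [r * scale for r in raw]


def weighted_cross_entropy(logits: Sequence[float], label: int,
                           class_weights: Sequence[float]) -> float:
    """Cross-entropy for one sample, scaled by the weight of its true class."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return class_weights[label] * (log_sum_exp - logits[label])
```

With 8 samples of one class and 2 of another, the weights come out as 0.4 and 1.6, so an error on the rare class contributes four times as much to the loss. In stage 2 only the classifier would receive gradients from such a loss, while the feature extractor and ConvLSTM stay fixed.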

Several additional tricks were used. Different sequence lengths were used for leaderboards A and B to avoid leakage, because closed‑class samples in the training set mostly contain 3‑frame sequences. The ConvLSTM implementation was modified: the tanh activation was first replaced with ReLU + GroupNorm, and the activation was finally disabled altogether to preserve features for mask prediction. Online data augmentation was applied consistently across all frames of a sequence.
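Applying augmentation consistently across frames means sampling the random parameters once per sequence and reusing them for every frame. The sketch below shows the idea with a random crop and a horizontal flip on frames stored as nested lists. The choice of these two augmentations, the frame representation, and the function name are assumptions, since the article does not list the augmentations used.

```python
import random
from typing import List

# One frame as rows of pixel values: a stand-in for real image tensors.
Frame = List[List[int]]


def augment_sequence(frames: List[Frame], crop: int, rng: random.Random) -> List[Frame]:
    """Sample flip and crop offsets once, then apply them to every frame."""
    height, width = len(frames[0]), len(frames[0][0])
    # Parameters are drawn once per sequence, not once per frame.
    flip = rng.random() < 0.5
    top = rng.randint(0, height - crop)
    left = rng.randint(0, width - crop)
    out = []
    for frame in frames:
        rows = [row[left:left + crop] for row in frame[top:top + crop]]
        if flip:
            rows = [row[::-1] for row in rows]
        out.append(rows)
    return out
```

When a mask label accompanies each frame, the same sampled parameters would also have to be applied to the mask so that it stays aligned with the image.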

The single model achieved first place on leaderboard A and ranked within the top 5 on leaderboard B without any model ensembling. The solution is simple, effective, and easily transferable to other tasks that involve bbox‑assisted classification or long‑tailed categories.

Overall, the method shows strong extensibility—additional information such as road level or time can be incorporated for end‑to‑end learning—contributing toward the broader goal of improving travel experiences.


