How Amap Uses AI to Automate Millions of User Feedback Reports
This article describes how Gaode Map (Amap) uses machine‑learning techniques, including word2vec embeddings, LSTM networks, fine‑tuning, model ensembles, and confidence thresholds, to automatically classify and verify large volumes of user‑feedback intelligence. The approach streamlines the multi‑step workflow from data collection to map‑data updates and substantially improves efficiency.
1. Background
Gaode Map (Amap), a leading travel big‑data company in China, receives large volumes of user feedback (text, images, and video) that are crucial for improving its map services. The challenge is to process hundreds of thousands of reports per day efficiently.
Intelligence refers to any information (text, image, video) that helps solve specific navigation or map‑production problems. User feedback includes intelligence, suggestions, and complaints submitted via mobile or PC clients.
2. Problem and Solution
Users report feedback through the Amap app or the PC portal by selecting options (source, major type, sub‑type, road name) and entering a free‑text description. After submission, each report must be classified, located, and verified before the map data can be updated.
Intelligence recognition: tag the problem type by analyzing the selected options and the free‑text description, and by reviewing any attached images.
Intelligence positioning: determine the exact coordinates by checking the tap point, the vehicle’s location at the time of reporting, and the user’s navigation trajectory logs.
Intelligence verification: confirm the tag and location using imagery, heat‑maps, and road‑network data.
The manual rule‑based pipeline suffers from low accuracy, high skill requirements, and slow throughput.
3. Machine‑Learning Solution
3.1 Business Decomposition and Hierarchical Splitting
The workflow is broken into six layers: business levels 1, 2, and 3, followed by intelligence recognition, intelligence positioning, and intelligence verification. Only the last three layers need partial human intervention; the upper layers can be fully automated.
3.2 Model Alignment
Feedback descriptions are the most valuable signals. They are categorized as valid (meaningful) or invalid (empty or nonsensical). Valid feedback undergoes multi‑level classification (data / product / forward), with further sub‑classification for data (road vs. topic). Invalid feedback follows a parallel path using separate models and ultimately relies on rules or manual handling.
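The routing described above can be sketched as a small dispatcher. This is a minimal illustration, not Amap's implementation: `is_valid_description`, `route_feedback`, and the keyword-based stand-in classifiers are hypothetical names, and the validity rule (at least two alphanumeric characters) is an assumption made only so the example runs. In the real system each stage would be a trained model.

```python
from typing import Callable, List


def is_valid_description(text: str) -> bool:
    """Hypothetical validity check: at least two alphanumeric characters.
    The article does not give the real criterion."""
    return sum(ch.isalnum() for ch in text) >= 2


def route_feedback(
    text: str,
    top_level: Callable[[str], str],
    data_sub_level: Callable[[str], str],
) -> List[str]:
    """Return the path of labels assigned to one feedback description."""
    if not is_valid_description(text):
        # Invalid descriptions follow a separate path that ends in
        # rules or manual handling.
        return ["invalid", "rules_or_manual"]
    path = ["valid", top_level(text)]
    if path[-1] == "data":
        # Only the data branch is split further (road vs. topic).
        path.append(data_sub_level(text))
    return path


# Stand-in classifiers (keyword rules), used only to make the sketch runnable.
def toy_top_level(text: str) -> str:
    lowered = text.lower()
    if "road" in lowered or "poi" in lowered:
        return "data"
    if "app" in lowered:
        return "product"
    return "forward"


def toy_data_sub_level(text: str) -> str:
    return "road" if "road" in text.lower() else "topic"


print(route_feedback("Road closed ahead", toy_top_level, toy_data_sub_level))
print(route_feedback("???", toy_top_level, toy_data_sub_level))
```

Passing the classifiers in as callables keeps the routing logic separate from the models, so a stage can be swapped without touching the dispatcher.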
3.3 Model Choice
Text is first vectorized. Traditional one‑hot and tf‑idf representations are sparse and treat distinct words as unrelated, so word2vec embeddings are used to capture semantic similarity. For classification, deep learning models outperformed approaches based on handcrafted features. Recurrent Neural Networks (RNNs) handle sequence data, and Long Short‑Term Memory (LSTM) networks mitigate the vanishing‑gradient problem that plain RNNs face on long sequences.
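A toy comparison makes the sparsity point concrete. The sketch below is illustrative only: the three-word vocabulary and the two-dimensional dense vectors are hand-made stand-ins for trained word2vec embeddings, not values from the article.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


vocab = ["blocked", "closed", "restaurant"]

# One-hot: every word gets its own axis, so distinct words are orthogonal.
one_hot = {
    word: [1.0 if i == j else 0.0 for j in range(len(vocab))]
    for i, word in enumerate(vocab)
}

# Hand-made dense vectors standing in for trained embeddings (assumption).
dense = {
    "blocked": [0.9, 0.1],
    "closed": [0.8, 0.2],
    "restaurant": [0.1, 0.9],
}

print(cosine(one_hot["blocked"], one_hot["closed"]))    # 0.0
print(cosine(dense["blocked"], dense["closed"]))        # close to 1
print(cosine(dense["blocked"], dense["restaurant"]))    # much lower
```

With one-hot vectors, "blocked" and "closed" look exactly as unrelated as "blocked" and "restaurant"; dense embeddings let a classifier generalize across near-synonyms.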
3.4 Model Architecture
Each feedback’s word‑vector sequence is fed into an LSTM. The final LSTM hidden state is concatenated with selected categorical features, passed through a fully connected layer, and classified with a softmax output.
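The forward pass of such an architecture can be sketched in NumPy. This is a minimal sketch under stated assumptions, not the production model: the layer sizes, the random weights, the ReLU activation in the fully connected layer, and the names `lstm_last_hidden` and `classify` are all hypothetical. Only the overall shape (LSTM, concatenation with categorical features, fully connected layer, softmax) comes from the description above.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_last_hidden(x_seq, W, U, b):
    """Run a single-layer LSTM over x_seq (T, d_in); return the final hidden state.
    W: (4h, d_in), U: (4h, h), b: (4h,), gates stacked as input, forget, cell, output."""
    h_dim = U.shape[1]
    h = np.zeros(h_dim)
    c = np.zeros(h_dim)
    for x_t in x_seq:
        z = W @ x_t + U @ h + b
        i, f, g, o = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # update cell state
        h = sigmoid(o) * np.tanh(c)                   # update hidden state
    return h


def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


def classify(x_seq, cat_features, params):
    h_last = lstm_last_hidden(x_seq, params["W"], params["U"], params["b"])
    joined = np.concatenate([h_last, cat_features])  # text + selected options
    # ReLU is an assumption; the article only says "fully connected layer".
    hidden = np.maximum(0.0, params["W_fc"] @ joined + params["b_fc"])
    return softmax(params["W_out"] @ hidden + params["b_out"])


rng = np.random.default_rng(0)
d_in, h_dim, n_cat, fc_dim, n_classes = 8, 16, 5, 12, 4  # illustrative sizes
params = {
    "W": rng.normal(0, 0.1, (4 * h_dim, d_in)),
    "U": rng.normal(0, 0.1, (4 * h_dim, h_dim)),
    "b": np.zeros(4 * h_dim),
    "W_fc": rng.normal(0, 0.1, (fc_dim, h_dim + n_cat)),
    "b_fc": np.zeros(fc_dim),
    "W_out": rng.normal(0, 0.1, (n_classes, fc_dim)),
    "b_out": np.zeros(n_classes),
}
x_seq = rng.normal(size=(6, d_in))   # six word vectors for one description
cat_features = np.eye(n_cat)[2]      # one-hot encoding of a selected option
probs = classify(x_seq, cat_features, params)
print(probs.shape, probs.sum())
```

The weights here are untrained, so the probabilities are meaningless; the point is how the final hidden state and the categorical features meet before the classifier.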
4. Practical Experience
4.1 Fine‑tuning
Because labeled samples are scarce, a pre‑trained model is fine‑tuned on the intelligence‑recognition dataset, yielding an accuracy gain of about 3 % across training sets of various sizes.
4.2 Hyper‑parameter Tuning
Initialize with SVD.
Apply dropout before LSTM (especially for bidirectional LSTM) to prevent over‑fitting.
The Adam optimizer performed best, with RMSprop giving similar results.
A batch size of around 128 worked well, though 64 sometimes gave better results.
Always shuffle the training data.
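The batching and shuffling advice above can be sketched with the standard library alone. `shuffled_batches` is a hypothetical helper, not part of any framework; deep learning libraries ship their own data loaders that do the same job.

```python
import random
from typing import Iterator, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def shuffled_batches(
    samples: Sequence[T],
    batch_size: int = 128,
    seed: Optional[int] = None,
) -> Iterator[List[T]]:
    """Yield mini-batches that cover every sample once, in random order."""
    order = list(range(len(samples)))
    random.Random(seed).shuffle(order)  # shuffle indices, leave the data untouched
    for start in range(0, len(order), batch_size):
        yield [samples[i] for i in order[start:start + batch_size]]


samples = list(range(300))
for epoch in range(2):
    # A different seed per epoch gives a fresh ordering each time.
    sizes = [len(batch) for batch in shuffled_batches(samples, 128, seed=epoch)]
    print(epoch, sizes)
```

Switching between the two batch sizes mentioned above is then a one-argument change (`batch_size=64`), which makes it easy to compare them under otherwise identical settings.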
4.3 Ensemble
Voting among the top five models, trained with different hyper‑parameters, improves overall accuracy by ~1.5 %.
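A simple form of such voting is hard majority voting over predicted labels, sketched below. The article does not say whether the votes were hard labels or averaged probabilities, so hard voting is an assumption here, and `majority_vote` and the sample labels are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """predictions[m][i] is model m's label for sample i.
    Ties go to the label that was seen first among the tied ones."""
    n_samples = len(predictions[0])
    voted = []
    for i in range(n_samples):
        votes = Counter(model_preds[i] for model_preds in predictions)
        voted.append(votes.most_common(1)[0][0])
    return voted


# Five models, three samples (illustrative labels).
preds = [
    ["road", "poi", "road"],
    ["road", "poi", "app"],
    ["poi", "poi", "app"],
    ["road", "road", "app"],
    ["road", "poi", "road"],
]
print(majority_vote(preds))  # ['road', 'poi', 'app']
```

With five voters and more than two classes, ties such as 2-2-1 are possible; ordering the models from strongest to weakest makes the tie-break favor the stronger model.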
4.4 Confidence Thresholding
High‑confidence predictions are automated; low‑confidence ones are sent for manual review. A simple per‑class threshold strategy outperformed more complex confidence‑model approaches, and a top‑N recommendation list further reduces operator effort.
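The per-class threshold strategy with a top-N fallback can be sketched as follows. The function name `route_prediction`, the class labels, and the threshold values are hypothetical; in practice each threshold would be tuned on held-out data to reach the accuracy required for that class.

```python
from typing import Dict, List, Optional, TypedDict


class Routing(TypedDict):
    action: str
    label: Optional[str]
    candidates: List[str]


def route_prediction(
    probs: Dict[str, float],
    thresholds: Dict[str, float],
    default_threshold: float = 0.9,
    top_n: int = 3,
) -> Routing:
    """Automate a prediction if its probability clears the threshold of the
    predicted class; otherwise send it to manual review with top-N suggestions."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    best_label, best_prob = ranked[0]
    if best_prob >= thresholds.get(best_label, default_threshold):
        return {"action": "auto", "label": best_label, "candidates": []}
    return {
        "action": "manual_review",
        "label": None,
        "candidates": [label for label, _ in ranked[:top_n]],
    }


# Illustrative thresholds: a stricter bar for a class that is riskier to automate.
thresholds = {"road_closed": 0.85, "speed_limit": 0.95}

print(route_prediction(
    {"road_closed": 0.90, "speed_limit": 0.06, "other": 0.04}, thresholds))
print(route_prediction(
    {"road_closed": 0.07, "speed_limit": 0.90, "other": 0.03}, thresholds, top_n=2))
```

The same 0.90 probability is automated for one class and sent to review for the other, which is the point of keeping thresholds per class rather than global.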
5. Results and Impact
5.1 Intelligence Classification
Product‑class accuracy > 96 %; data‑class recall ≈ 99 %.
Automation reduced manual workload by 80 % and cut per‑task cost to one‑fifth of the original.
5.2 Intelligence Recognition
Valid‑description accuracy > 96 % after applying confidence‑based routing, boosting operator efficiency by > 30 %.
6. Conclusion and Outlook
The project established a repeatable methodology for tackling complex business problems with NLP and deep learning, delivering substantial efficiency gains while maintaining high user satisfaction. Ongoing work focuses on further model refinement and extending the approach to other domains.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
