How Gaode Map Boosted Search Suggestions with Learning-to-Rank
Gaode Map replaced the rule-based ranking in its search suggestion service with a Learning-to-Rank model. This article covers the challenges in sample construction, feature engineering, and loss-function tuning, and the resulting performance gains across millions of queries and diverse geographic regions.
Introduction
Information retrieval is a key technology for working with large-scale location-based service (LBS) data and connecting it to users, and search suggestions are an indispensable part of a retrieval service. For example, typing “一点点” in Gaode Map quickly surfaces the matching store.
Search Suggestion Overview
Search suggestion (the suggest service) automatically completes the user's query or point of interest (POI) while the user is typing, listing candidate completions in an intelligently ranked order.
The goal is to reduce the user's input cost through intelligent prompts. The service must respond quickly and does not perform complex query retrieval, so it is essentially a simplified LBS information retrieval service.
Like general IR systems, suggest has recall and ranking stages. Ranking uses query‑doc textual relevance and doc features (weight, click) for weighted scoring.
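The weighted scoring described above can be sketched as follows. This is a minimal illustration, not Gaode's actual rule set: the Candidate fields, the COEFFS values, and the assumption that every feature is already scaled to [0, 1] are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    text_relevance: float  # query-doc textual relevance, assumed in [0, 1]
    weight: float          # static POI weight, assumed in [0, 1]
    click: float           # normalized click popularity, assumed in [0, 1]


# Hypothetical hand-tuned coefficients; a rule-based ranker keeps many such tables.
COEFFS = {"text_relevance": 0.6, "weight": 0.1, "click": 0.3}


def rule_score(c: Candidate) -> float:
    """Weighted sum of the candidate's features."""
    return sum(coef * getattr(c, field) for field, coef in COEFFS.items())


def rank(candidates):
    """Sort candidates by descending rule score."""
    return sorted(candidates, key=rule_score, reverse=True)
```

Every new feature or special case means another coefficient or branch in code like this, which is the maintenance burden that motivates a learned ranker.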
As the business and the number of features grow, rule-based ranking becomes hard to maintain and the code degenerates into a pile of patches.
Challenges
Sample construction and model tuning are the two main challenges.
Learning to Rank (LTR)
LTR uses machine learning to solve ranking problems in retrieval systems. Common model: gbrank with pairwise loss.
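To make the pairwise idea concrete, here is a small sketch of a pairwise logistic (cross-entropy) loss on a pair of scores, where s_pos belongs to the preferred document and s_neg to the other one. The function names are illustrative, and this is a generic pairwise loss rather than the exact objective used in Gaode's gbrank setup.

```python
import math


def pairwise_logistic_loss(s_pos: float, s_neg: float) -> float:
    """log(1 + exp(-(s_pos - s_neg))), computed in a numerically stable way."""
    d = s_pos - s_neg
    if d >= 0:
        return math.log1p(math.exp(-d))
    return -d + math.log1p(math.exp(d))


def pairwise_gradients(s_pos: float, s_neg: float):
    """Return (dL/ds_pos, dL/ds_neg) for the loss above."""
    d = s_pos - s_neg
    # p = sigmoid(-d): how strongly the model currently misorders the pair.
    if d >= 0:
        e = math.exp(-d)
        p = e / (1.0 + e)
    else:
        p = 1.0 / (1.0 + math.exp(d))
    return -p, p
```

The loss shrinks as the preferred document's score pulls ahead, so a boosted model trained on these gradients learns relative order rather than absolute relevance labels.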
Key difficulty: obtaining training samples.
Sample Construction
Manual labeling is infeasible given the massive number of POI candidates. Automatic labeling from click/no-click pairs runs into several issues:
Click over-fitting: the model tends to keep returning whatever was clicked before.
Clicks may not reflect true user satisfaction.
Only the top 10 results are shown, so click data exists only for those candidates.
Some users prefer typing the full query and bypass suggestions, leaving no click data.
In short, queries without clicks give the model no clear signal, and queries with clicks do not always indicate satisfaction.
Feature sparsity also challenges model learning; sparse features affect few samples and are often ignored, yet they are crucial for long‑tail cases.
Session‑Based Sample Construction
Consider the user's entire travel session rather than a single click. The session’s final click is propagated to all queries in the session, providing supervision for all candidates.
Steps:
Merge multiple server logs (suggest, search, navigation).
Split and clean sessions.
Propagate the click of the last query to all queries in the session.
Randomly sample over one million queries, recall top N POIs for each, generating tens of millions of effective samples for gbrank training.
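The steps above can be sketched roughly as follows. The record layout (session_id, timestamp, query, clicked_poi), the recall callback, and the choice to drop sessions without any click are assumptions made for illustration; the real pipeline merges and cleans several server logs first.

```python
from collections import defaultdict


def build_samples(records, recall):
    """Label every query of a session with the session's final click.

    records: iterable of (session_id, timestamp, query, clicked_poi or None)
    recall:  hypothetical function mapping a query to its top-N candidate POIs
    returns: list of (query, poi, label), label 1 for the finally clicked POI
    """
    sessions = defaultdict(list)
    for session_id, ts, query, clicked in records:
        sessions[session_id].append((ts, query, clicked))

    samples = []
    for events in sessions.values():
        events.sort(key=lambda e: e[0])  # chronological order
        # The last click observed in the session is treated as the user's goal.
        final_click = next(
            (c for _, _, c in reversed(events) if c is not None), None
        )
        if final_click is None:
            continue  # no supervision available for this session
        for _, query, _ in events:
            for poi in recall(query):
                samples.append((query, poi, 1 if poi == final_click else 0))
    return samples
```

With this labeling, short prefixes that never received a click still obtain a positive example, and candidates outside the displayed top 10 can be labeled as long as recall returns them.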
Feature Design
Four modeling needs drive feature design:
Candidates come from multiple recall pipelines (different cities, pinyin), so features must be comparable across pipelines.
The target POI changes as the user types, so features must capture query-dependent demand.
Low-frequency, long-tail queries lack click-based posterior features, so prior features are required.
LBS services show strong regional personalization, so space is sharded with geohash and statistics are computed per region.
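As an illustration of the last point, the sketch below encodes coordinates with the standard geohash algorithm and counts clicks per cell. The click tuple layout and the precision of 5 characters are assumptions; the source does not say which precision or which statistics Gaode uses.

```python
from collections import Counter, defaultdict

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 5) -> str:
    """Standard geohash: interleave longitude/latitude bisection bits."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, nbits = 0, 0
    use_lon = True  # geohash starts with a longitude bit
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        nbits += 1
        if nbits == 5:  # every 5 bits become one base32 character
            chars.append(_BASE32[bits])
            bits, nbits = 0, 0
    return "".join(chars)


def regional_click_counts(clicks, precision: int = 5):
    """clicks: iterable of (lat, lon, poi_id) -> {geohash cell: Counter of poi_id}."""
    stats = defaultdict(Counter)
    for lat, lon, poi_id in clicks:
        stats[geohash_encode(lat, lon, precision)][poi_id] += 1
    return stats
```

Per-cell counts like these can then be turned into features such as "share of clicks on this POI among users located in this cell".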
After designing features, apply feature engineering such as scaling, smoothing, position‑bias removal, and normalization.
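Two of these transformations can be sketched as below. The source does not specify which formulas Gaode uses, so these are swapped-in, commonly used stand-ins: additive (Bayesian-style) smoothing of a click-through rate, and "clicks over expected clicks" as one way to discount position bias. The prior_ctr, strength, and position_ctr values are hypothetical.

```python
import math


def smoothed_ctr(clicks: float, impressions: float,
                 prior_ctr: float = 0.05, strength: float = 20.0) -> float:
    """Shrink the raw CTR toward prior_ctr when impressions are few."""
    return (clicks + prior_ctr * strength) / (impressions + strength)


def clicks_over_expected(clicks_by_pos: dict, impressions_by_pos: dict,
                         position_ctr: dict) -> float:
    """Observed clicks divided by the clicks an average item would have
    collected at the same display positions."""
    expected = sum(n * position_ctr[pos] for pos, n in impressions_by_pos.items())
    actual = sum(clicks_by_pos.values())
    return actual / expected if expected > 0 else 0.0


def log_scale(count: float) -> float:
    """Compress heavy-tailed counts such as raw click totals."""
    return math.log1p(count)
```

Smoothing keeps a POI with one click on one impression from looking like a perfect candidate, and the position-normalized ratio avoids rewarding items merely for having been shown first.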
Model Training and Issues
The initial model removed all rules and achieved ~5 MRR points improvement, but gbrank’s feature learning was uneven; only a few features (e.g., city‑click) were selected for splits.
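For reference, MRR (mean reciprocal rank) averages 1/rank of the target item over all queries. A minimal sketch, with the function and argument names chosen for illustration and a missing target scored as 0:

```python
def mean_reciprocal_rank(ranked_lists, targets):
    """ranked_lists[i] is the ranked candidate list for query i,
    targets[i] is the POI the user actually wanted."""
    if not ranked_lists:
        return 0.0
    total = 0.0
    for ranked, target in zip(ranked_lists, targets):
        if target in ranked:
            total += 1.0 / (ranked.index(target) + 1)  # ranks are 1-based
    return total / len(ranked_lists)
```

Because the metric rewards placing the target near the top, it suits a suggestion list where users rarely look past the first few entries.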
Improving Feature Utilization
Two solutions were explored:
Oversample sparse and low‑frequency samples – simple but distorts data distribution.
Adjust the loss function: increase the gradient for under‑used features (e.g., query‑click) by adding a penalty term loss_diff, thereby raising their split gain in subsequent trees.
The adjustment is applied to the negative gradient of the cross-entropy loss, to which the loss_diff term is added.
After loss adjustment and retraining, test‑set MRR improved by another 2 points, and historical ranking case coverage rose from 40% to 70%.
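The text does not give the exact form of loss_diff, so the following is only one possible shape of such an adjustment, not Gaode's formula. It assumes a pairwise cross-entropy loss and assumes that loss_diff grows with the gap in an under-used feature (for example query-click) between the preferred and the non-preferred document; lam is a hypothetical strength parameter.

```python
import math


def _sigmoid(x: float) -> float:
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def adjusted_negative_gradient(s_pos: float, s_neg: float,
                               f_pos: float, f_neg: float,
                               lam: float = 0.5):
    """Negative gradient for a (preferred, other) pair.

    s_pos, s_neg: current model scores of the two documents
    f_pos, f_neg: values of the under-used feature (assumed form of the signal)
    Returns (update for s_pos, update for s_neg).
    """
    base = _sigmoid(-(s_pos - s_neg))            # plain pairwise cross-entropy term
    loss_diff = lam * max(0.0, f_pos - f_neg)    # assumed penalty, not from the source
    g = base + loss_diff
    return g, -g
```

Since each new tree fits these gradients, pairs that the under-used feature separates well carry larger targets, which makes splits on that feature more attractive in later trees.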
Conclusion
Applying Learning‑to‑Rank to Gaode’s search suggestion eliminated rule‑based coupling and patchwork, delivering clear performance gains. The gbrank model now satisfies ranking needs across query frequencies. Ongoing work includes personalization, deep learning, vector indexing, and user‑behavior sequence prediction.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
