
How Gaode Map Boosted Search Suggestions with Learning-to-Rank

Gaode Map rebuilt its search suggestion service by replacing rule‑based ranking with a Learning‑to‑Rank model. This article covers the challenges in sample construction, feature engineering, and loss‑function tuning, and the resulting gains in ranking quality.

Alibaba Cloud Developer

Nov 27, 2019

Introduction

Information retrieval is a key technology for processing location‑based service (LBS) data at scale and connecting users with places, and search suggestions are an indispensable part of any retrieval service. For example, typing the store name “一点点” in Gaode Map quickly surfaces the store.

Search Suggestion Overview

The search suggestion (suggest) service automatically completes the user's query or point of interest (POI) while the user is typing, lists candidate completions, and ranks them intelligently.

Its goal is to reduce the user's input cost through intelligent prompts. It must respond quickly and does not handle complex query retrieval, so it is essentially a simplified LBS information retrieval service.

Like general IR systems, suggest has a recall stage and a ranking stage. The original ranking combined query‑doc textual relevance with doc features (such as weight and clicks) in a weighted score.

As the business and the number of features grew, rule‑based ranking became hard to maintain and the code accumulated patch upon patch.
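A rule‑based weighted score of this kind can be sketched as follows. This is a minimal illustration only: the `Candidate` fields, the `RULE_WEIGHTS` values, and the assumption that all features are normalized to [0, 1] are hypothetical and do not come from Gaode's system.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    text_relevance: float  # query-doc textual relevance, assumed in [0, 1]
    weight: float          # static doc weight, assumed in [0, 1]
    click: float           # normalized click popularity, assumed in [0, 1]


# Hypothetical hand-tuned weights; the article does not give the real ones.
RULE_WEIGHTS = {"text_relevance": 0.6, "weight": 0.1, "click": 0.3}


def rule_score(c: Candidate) -> float:
    """Weighted sum of relevance and doc features."""
    return (RULE_WEIGHTS["text_relevance"] * c.text_relevance
            + RULE_WEIGHTS["weight"] * c.weight
            + RULE_WEIGHTS["click"] * c.click)


def rank(candidates):
    """Sort candidates by descending rule score."""
    return sorted(candidates, key=rule_score, reverse=True)
```

Every new feature or special case means another weight or another branch in `rule_score`, which is the maintenance problem a learned ranker is meant to remove.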

Challenges

Sample construction and model tuning are the two main challenges.

Learning to Rank (LTR)

LTR uses machine learning to solve ranking problems in retrieval systems. A common model, and the one used here, is gbrank trained with a pairwise loss.
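To make the pairwise idea concrete, here is a minimal sketch of a pairwise cross‑entropy (logistic) loss and its negative gradient, which is the residual a gradient‑boosting round would fit. The choice of the logistic form is an assumption made for illustration, and the function names are hypothetical.

```python
import math


def _sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def pairwise_logistic_loss(score_pos: float, score_neg: float) -> float:
    """log(1 + exp(-(score_pos - score_neg))) for a pair where pos should outrank neg."""
    margin = score_pos - score_neg
    # Rewritten form that avoids overflow for large |margin|.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def pairwise_negative_gradient(score_pos: float, score_neg: float) -> float:
    """Negative gradient of the loss with respect to score_pos."""
    return _sigmoid(-(score_pos - score_neg))
```

A correctly ordered pair with a large margin yields a loss and residual near zero, while a mis‑ordered pair yields a residual close to one, so the next tree concentrates on the pairs the model still gets wrong.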

Key difficulty: obtaining training samples.

Sample Construction

Manual labeling is infeasible given the massive number of POI candidates. Automatic methods that build pairs from clicked and unclicked results run into several problems:

Click over‑fitting: the model learns to return whatever was clicked in the past.

Clicks may not reflect the user's true satisfaction.

Only the top 10 results are shown, so click data exists only for those results.

Some users prefer to type the full query and bypass suggestions, leaving no click data.

In short, queries without click data give the model nothing to learn from, and even where clicks exist they are not a reliable signal of satisfaction.

Feature sparsity also challenges model learning; sparse features affect few samples and are often ignored, yet they are crucial for long‑tail cases.

Session‑Based Sample Construction

Consider the user's entire travel session rather than a single click. The session’s final click is propagated to all queries in the session, providing supervision for all candidates.

Steps:

Merge multiple server logs (suggest, search, navigation).

Split and clean sessions.

Propagate the click of the last query to all queries in the session.

Randomly sample over one million queries, recall top N POIs for each, generating tens of millions of effective samples for gbrank training.
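The propagation step can be sketched as below. This is a simplified illustration under stated assumptions: the log rows are already merged and cleaned, each row is a dict with the hypothetical keys `session_id`, `ts`, `query`, and `clicked_poi`, and `candidates_for` stands in for the recall stage.

```python
from collections import defaultdict


def build_session_samples(log_rows, candidates_for):
    """Label every (query, POI) candidate in a session with the session's final click.

    log_rows: iterable of dicts with keys session_id, ts, query, clicked_poi (or None).
    candidates_for: callable mapping a query string to a list of recalled POI ids.
    Returns a list of (query, poi_id, label) tuples, label 1 for the target POI.
    """
    sessions = defaultdict(list)
    for row in log_rows:
        sessions[row["session_id"]].append(row)

    samples = []
    for rows in sessions.values():
        rows.sort(key=lambda r: r["ts"])
        target = rows[-1]["clicked_poi"]  # click of the last query in the session
        if target is None:
            continue  # no usable supervision for this session
        for row in rows:
            for poi_id in candidates_for(row["query"]):
                samples.append((row["query"], poi_id, 1 if poi_id == target else 0))
    return samples
```

With this labeling, a short prefix that the user never clicked on still receives supervision, because the POI the user eventually chose is treated as the target for every query typed along the way.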

Feature Design

Four modeling needs drive feature design:

Multiple recall pipelines (different cities, pinyin) require comparable features.

The target POI shifts as the user types, so features must capture query‑dependent demand.

Low‑frequency long‑tail queries lack click‑based posterior features, requiring prior features.

LBS services are strongly regional, so space is sharded with geohash and statistics are computed per region.
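As an illustration of the last point, the sketch below encodes coordinates into geohash cells with the standard geohash algorithm and counts clicks per (cell, POI). The precision of 5 characters, the event layout, and the function names are assumptions; the article does not say which statistics or cell size Gaode uses.

```python
from collections import Counter

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 5) -> str:
    """Standard geohash: interleave longitude/latitude bisection bits, 5 bits per char."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, value = 0, 0
    use_lon = True  # geohash starts with a longitude bit
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value, lon_lo = value * 2 + 1, mid
            else:
                value, lon_hi = value * 2, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value, lat_lo = value * 2 + 1, mid
            else:
                value, lat_hi = value * 2, mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:
            chars.append(_BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


def regional_click_counts(click_events, precision: int = 5) -> Counter:
    """Count clicks per (geohash cell, poi_id); click_events yields (lat, lon, poi_id)."""
    counts = Counter()
    for lat, lon, poi_id in click_events:
        counts[(geohash_encode(lat, lon, precision), poi_id)] += 1
    return counts
```

Keying statistics by cell lets the same query text favor different POIs depending on where the user is.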

After designing features, apply feature engineering such as scaling, smoothing, position‑bias removal, and normalization.
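Two of these steps, smoothing and position‑bias removal, can be combined in a single click‑through‑rate feature. The sketch below is one common way to do it and is not taken from the article: impressions are discounted by a per‑position examination probability, and a prior is added so that low‑traffic POIs are not dominated by noise. The `EXAM_PROB` values and the prior parameters are hypothetical.

```python
# Hypothetical probability that a user examines each display position.
EXAM_PROB = {1: 1.0, 2: 0.7, 3: 0.5, 4: 0.35, 5: 0.25}


def smoothed_ctr(clicks, impressions_by_pos, exam_prob=EXAM_PROB,
                 prior_ctr=0.05, prior_strength=20.0):
    """Position-debiased CTR with additive smoothing toward prior_ctr.

    impressions_by_pos maps a display position to the number of impressions there.
    """
    # Impressions at rarely examined positions count for less.
    expected_views = sum(n * exam_prob[pos] for pos, n in impressions_by_pos.items())
    return (clicks + prior_ctr * prior_strength) / (expected_views + prior_strength)
```

A POI with no traffic falls back to `prior_ctr`, and the same number of clicks earned at a lower position produces a higher value than at position 1.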

Model Training and Issues

The initial model replaced all rules and improved MRR by about 5 points. However, gbrank learned the features unevenly: only a few features (e.g., city‑click) were selected as tree split nodes.
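For reference, MRR (mean reciprocal rank) averages the reciprocal of the position at which the target POI appears in each ranked list. A minimal sketch, with a hypothetical function name and the convention that a missing target contributes zero:

```python
def mean_reciprocal_rank(ranked_lists, targets):
    """Average of 1/rank of each target in its ranked list (0 if the target is absent)."""
    if not ranked_lists:
        return 0.0
    total = 0.0
    for ranked, target in zip(ranked_lists, targets):
        if target in ranked:
            total += 1.0 / (ranked.index(target) + 1)
    return total / len(ranked_lists)
```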

Improving Feature Utilization

Two solutions were explored:

Oversample sparse and low‑frequency samples – simple but distorts data distribution.

Adjust the loss function: increase the gradient for under‑used features (e.g., query‑click) by adding a penalty term loss_diff, thereby raising their split gain in subsequent trees.

[Formula: negative gradient of the cross‑entropy loss with loss_diff]

After the loss adjustment and retraining, test‑set MRR improved by another 2 points, and the share of historical ranking cases resolved rose from 40% to 70%.
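The idea of raising the gradient through loss_diff can be sketched as follows. This is only one possible reading and not the article's formula: it assumes a pairwise logistic loss, treats loss_diff as a shift of the sigmoid, and makes loss_diff proportional to the difference in the under‑used feature between the two samples. The function name, the `alpha` parameter, and the handling of missing values are all hypothetical.

```python
import math


def _sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def adjusted_negative_gradient(score_pos, score_neg, feat_pos, feat_neg, alpha=1.0):
    """Pairwise residual with an extra penalty for pairs that differ on a sparse feature.

    feat_pos / feat_neg: value of the under-used feature (e.g. query-click) for each
    sample, or None when the feature is missing.
    """
    margin = score_pos - score_neg
    if feat_pos is None or feat_neg is None:
        loss_diff = 0.0  # feature missing: fall back to the plain gradient
    else:
        loss_diff = alpha * abs(feat_pos - feat_neg)
    # Shifting the sigmoid by loss_diff enlarges the residual for these pairs.
    return _sigmoid(-margin + loss_diff)
```

Because the residuals of pairs carrying the sparse feature grow, a split on that feature explains more of the remaining error in the next tree, which raises its split gain.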

Conclusion

Applying Learning‑to‑Rank to Gaode’s search suggestion eliminated rule‑based coupling and patchwork, delivering clear performance gains. The gbrank model now satisfies ranking needs across query frequencies. Ongoing work includes personalization, deep learning, vector indexing, and user‑behavior sequence prediction.