How G‑Plan Transforms Map Recommendations with AI Agents and Multi‑Demand Planning
This article details how Gaode's G‑Plan combines large‑model AI agents, generative ranking, and spatiotemporal counterfactual DPO to model and prioritize multiple user intents on the home page. It presents the system architecture, experimental setup, online gains, and ablation results, and explains how G‑Plan moves recommendation from passive response to proactive planning.
Background
Gaode (Amap) serves over one billion users and aims to evolve from a static map with passive routing to a dynamic, AI‑driven "live map" that can think and plan for users.
Problem Statement
The home page now displays several tool cards, each representing a potential user need at the same moment. Traditional recommendation treats this as a single‑item ranking problem and cannot capture the simultaneous, interrelated demands users actually exhibit.
G‑Plan Overview
G‑Plan is a large‑model AI agent that performs holistic demand modeling and intent planning, turning fragmented needs into a coherent action sequence displayed on the home page.
Decision Challenges
1. Parallel demand prioritization: When multiple needs co‑exist, the system must decide which to show first.
2. Sequential demand planning: When needs are temporally or spatially linked, the system must order them sensibly.
Modeling Approaches
CTR‑Based Traditional Ranking
This baseline predicts a click‑through rate for each card in isolation and sorts by the predicted score, implicitly assuming the cards do not influence one another.
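For contrast with the listwise approach that follows, here is a minimal sketch of this pointwise scheme (ctr_model and the card fields are illustrative stand‑ins, not Gaode's production interfaces):

def rank_pointwise(cards, ctr_model):
    """Score each card in isolation and sort by predicted CTR.
    Because scores are computed independently, the ranking cannot account
    for how cards interact when shown together on the same page."""
    scored = [(ctr_model.predict(card.features), card) for card in cards]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [card for _, card in scored]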
Listwise Generative Card Ranking
A generative model fine‑tuned on Gaode’s global data treats the set of cards as a listwise ranking problem. For each card it extracts:
Description features: card type and textual description.
Statistical features: historical exposure, clicks, and other quality signals.
Content features: the concrete POI results returned for the current context.
The model learns to compare cards directly from user exposure‑click data, producing a full ordered list.
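One plausible way to serialize those three feature groups into a single listwise prompt, so the model can compare all candidates jointly and emit a complete ordering (field names and prompt wording are assumptions; the production serialization is not published):

def build_listwise_prompt(context, cards):
    """Serialize every candidate card's feature groups into one prompt
    so the generative ranker compares cards jointly, not one by one."""
    lines = [f"User context: {context}", "Candidate cards:"]
    for i, card in enumerate(cards):
        lines.append(
            f"[{i}] type={card['type']}; desc={card['description']}; "   # description features
            f"exposures={card['exposures']}, clicks={card['clicks']}; "  # statistical features
            f"pois={', '.join(card['poi_results'])}"                     # content features
        )
    lines.append("Return the card indices in ranked order, best first.")
    return "\n".join(lines)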
Experimental Results
The baseline sorts cards by UV‑CTR descending, ignoring personalization. Online A/B tests show:
Precision‑ranking (pointwise CTR) model: overall UV‑CTR +0.55%, pull‑up rate +0.02%, pull‑up‑state UV‑CTR +1.85%.
Listwise generative model: overall UV‑CTR +0.87%, pull‑up rate +0.11%, pull‑up‑state UV‑CTR +2.90%.
G‑Plan Intent Planning Agent
The agent follows a classic Perception‑Planning‑Action pipeline:
Memory: stores business knowledge, demand evolution patterns, and spatiotemporal context.
Planning: a deep reasoning module that generates an ordered intent sequence, providing both the recommendation and the logical rationale.
Tools: maps each planned intent to a concrete tool card with parameters (e.g., start point, tags) and triggers the tool to produce the final result.
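A high‑level control‑flow sketch of that pipeline (all module names and interfaces here are hypothetical; the source describes the stages, not their code):

def g_plan_step(context, memory, planner, tools):
    """One Perception-Planning-Action cycle.
    context: current spatiotemporal signals (time, location, recent behavior)
    memory:  business knowledge and demand-evolution patterns
    planner: deep-reasoning module returning ordered intents plus a rationale
    tools:   registry mapping each intent type to an executable tool card
    """
    knowledge = memory.retrieve(context)          # Perception: recall relevant patterns
    plan = planner.generate(context, knowledge)   # Planning: ordered intents + rationale
    cards = [tools[intent.type].invoke(**intent.parameters)  # Action: e.g., start point, tags
             for intent in plan.intents]
    return plan.rationale, cards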
The output must satisfy three constraints:
Logical consistency: consecutive intents must have causal or concurrent relationships.
Spatiotemporal feasibility: the sequence must obey physical time‑space constraints.
Executability: the result is a standard JSON structure usable by downstream services.
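The source does not publish the output schema, but a plan meeting the three constraints might look like this hypothetical structure:

import json

# Hypothetical plan for an afternoon airport arrival; every field name is
# an assumption made for illustration, not the published schema.
plan = {
    "intents": [
        {"order": 1, "type": "navigate_to_hotel",
         "parameters": {"start": "airport", "tags": ["check-in"]}},
        {"order": 2, "type": "nearby_dining",
         "parameters": {"anchor": "hotel", "tags": ["dinner"]}},
    ],
    "rationale": "Hotel check-in logically precedes dining after arrival.",
}

# Executability in the sense above: the structure must round-trip as
# standard JSON so downstream services can consume it unchanged.
assert json.loads(json.dumps(plan)) == plan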
Data Synthesis Pipeline
A human‑in‑the‑loop process creates high‑quality training data:
High‑value trigger extraction: selects scenarios (e.g., out‑of‑city travel) likely to trigger multi‑step decisions.
Spatiotemporal slicing: samples across time slots (morning, lunch, night) and locations (airport, hotel, scenic spot).
Seed data construction: annotates selected samples for few‑shot prompts.
Teacher model generation: a trillion‑parameter LLM produces intent plans and full reasoning chains.
LLM‑as‑Judge filtering: automatic scoring plus manual checks retain only high‑quality pairs.
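A skeleton of that five‑stage pipeline (every injected callable is a placeholder for a stage the source names but does not detail; none of this is real Gaode code):

def synthesize_training_data(logs, sampler, annotate, teacher_llm, judge_llm,
                             spot_check, threshold=0.8):
    """Human-in-the-loop data synthesis, mirroring the five stages above."""
    triggers = [s for s in logs if s.get("high_value")]          # 1. high-value trigger extraction
    samples = sampler(triggers)                                  # 2. spatiotemporal slicing
    seeds = annotate(samples[:100])                              # 3. seed data for few-shot prompts
    candidates = [teacher_llm.generate_plan(s, few_shot=seeds)   # 4. teacher generation with CoT
                  for s in samples]
    return [c for c in candidates                                # 5. LLM-as-Judge + manual checks
            if judge_llm.score(c) >= threshold and spot_check(c)]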
Implicit Chain‑of‑Thought (CoT) Distillation
To meet latency constraints, the explicit CoT reasoning of the teacher model is compressed into special tokens for a lightweight student model.
Two stages:
Stage 1 – Expert demonstration: the teacher outputs a structured reasoning chain consisting of <CONTEXT>, <STRATEGY>, and sequential <STEP_n> blocks, following the template below.
Stage 2 – Implicit distillation: during fine‑tuning, the natural‑language reasoning is progressively replaced by special tokens such as <T1>, <T2>, …, <Tn>; the student model learns to internalize the logical steps while emitting only a few tokens before the final intent list.
<THOUGHT>
<CONTEXT>Brief analysis of current spatiotemporal context and user needs</CONTEXT>
<STRATEGY>Core planning strategy derived from context</STRATEGY>
<STEP_1>Explain why the first intent is prioritized</STEP_1>
<STEP_2>Explain why the second intent follows</STEP_2>
...
</THOUGHT>
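A minimal sketch of the Stage‑2 replacement schedule, assuming reasoning chains in the template above and an epoch‑indexed replacement count (the mechanics below are assumptions; the production implementation is not published):

import re

SPECIAL_TOKENS = ["<T1>", "<T2>", "<T3>", "<T4>"]

def compress_thought(thought, n_replaced):
    """Swap the first n_replaced reasoning blocks for special tokens; as
    fine-tuning progresses, n_replaced grows until the student emits only
    a handful of tokens before the final intent list."""
    blocks = [m.group(0) for m in re.finditer(
        r"<(CONTEXT|STRATEGY|STEP_\d+)>.*?</\1>", thought, flags=re.S)]
    for i, block in enumerate(blocks[:n_replaced]):
        thought = thought.replace(block, SPECIAL_TOKENS[i % len(SPECIAL_TOKENS)], 1)
    return thought

Spatiotemporal Counterfactual DPO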
To improve the model’s causal awareness of time and space, counterfactual samples are generated by perturbing timestamps or locations (e.g., changing 8:30 am to 23:30). The teacher model creates new intent plans for these altered contexts, yielding positive preference pairs, while the original samples serve as hard negatives. Direct Preference Optimization (DPO) then trains the model to prefer the correct plan under each spatiotemporal condition, enhancing:
Spatiotemporal sensitivity.
Out‑of‑distribution generalization.
Robustness against over‑reliance on historical demand patterns.
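A sketch of how such preference pairs could be assembled (the perturbation values and data layout are illustrative); triples in this prompt/chosen/rejected form are the standard input format for DPO training:

import random

def make_counterfactual_pairs(samples, teacher_llm):
    """Build DPO preference triples: perturb time or place, let the teacher
    re-plan under the altered context (chosen), and keep the now-stale
    original plan as the hard negative (rejected)."""
    pairs = []
    for ctx, original_plan in samples:
        cf_ctx = dict(ctx)
        if random.random() < 0.5:
            cf_ctx["time"] = "23:30"           # e.g., morning commute -> late night
        else:
            cf_ctx["location"] = "airport"     # illustrative location swap
        pairs.append({"prompt": cf_ctx,
                      "chosen": teacher_llm.generate_plan(cf_ctx),
                      "rejected": original_plan})
    return pairs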
Case Study
Scenario: a user books a hotel in another city and arrives at the airport at 15:00. Traditional ranking would surface nearby food or attractions, but G‑Plan prioritizes the primary intent (reaching the hotel) and only then suggests nearby dining and leisure options, matching the user’s immediate travel plan.
Ablation Experiments
Four variants were evaluated offline:
Direct‑SFT: end‑to‑end mapping from context to intent sequence.
Post‑CoT‑SFT: the model outputs the ranking first, followed by a reasoning chain that is truncated at inference time.
G‑Plan‑Distill: implicit CoT distillation with special tokens.
G‑Plan‑Distill‑DPO: adds spatiotemporal counterfactual DPO to the distillation pipeline.
Metrics included first‑intent accuracy and intent‑sequence similarity (weighted edit distance). Progressive improvements were observed from Direct‑SFT to G‑Plan‑Distill‑DPO, confirming the contribution of each component.
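The exact weighting is not given in the source; one plausible instantiation penalizes edits near the head of the sequence more heavily (the decaying position weights are an assumption):

def weighted_edit_similarity(pred, gold, decay=0.8):
    """Levenshtein distance in which edits at earlier positions cost more,
    reflecting that the leading intents matter most; returns 0-1 similarity."""
    m, n = len(pred), len(gold)
    w = [decay ** i for i in range(max(m, n))]           # assumed position weights
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + w[i - 1]
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + w[j - 1]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0.0 if pred[i - 1] == gold[j - 1] else w[min(i, j) - 1]
            d[i][j] = min(d[i - 1][j] + w[i - 1],        # deletion
                          d[i][j - 1] + w[j - 1],        # insertion
                          d[i - 1][j - 1] + sub)         # substitution
    max_cost = sum(w) or 1.0
    return 1.0 - d[m][n] / max_cost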
Conclusion
G‑Plan demonstrates that integrating AI‑agent capabilities, generative listwise ranking, implicit CoT distillation, and spatiotemporal counterfactual DPO can shift map‑app recommendation from reactive pointwise ranking to proactive, coherent intent planning, achieving higher online CTR gains while respecting latency constraints.