How Gaode’s G‑Action Uses Generative AI to Predict Users’ Next Move
Gaode’s G‑Action framework combines large‑language‑model pre‑training with fine‑tuned generative recommendation to predict a user’s immediate action and destination, transforming static map services into a dynamic, context‑aware experience and delivering measurable gains in click‑through and engagement metrics.
Background and Motivation
In the mobile‑internet era, map applications have evolved from simple navigation tools into comprehensive platforms that support travel, commerce, and lifestyle services. Gaode Map serves billions of daily interactions, ranging from route planning and point‑of‑interest (POI) searches to group‑buy purchases and reviews. Accurately anticipating a user’s immediate intent at app launch is critical for improving experience and service efficiency.
Problem Definition: "Guess You Action"
The "Guess You Action" (G‑Action) task predicts the user's next action (A) and the corresponding destination POI (P) from historical behavior, current time, location, and weather. Formally:

A, P = f(User, Cur_Time, Cur_Loc, Cur_Weather)

Prediction must consider the semantic relevance between the action and the POI, not merely popularity, so the evaluation metric shifts from AUC to Top‑1 accuracy.
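In code terms, the task interface can be sketched as below. The type and field names are illustrative assumptions, not Gaode's actual API; the stub stands in for the generative model described later.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Context:
    """Inputs available at app launch (illustrative field names)."""
    check_ins: List[str]   # historical action-POI records
    cur_time: str          # e.g. "2024-05-01 18:05"
    cur_loc: str           # e.g. "Wangjing, Beijing"
    cur_weather: str       # e.g. "light rain"

def predict_next(ctx: Context) -> Tuple[str, str]:
    """A, P = f(User, Cur_Time, Cur_Loc, Cur_Weather).

    Returns the predicted (action, poi_sid) pair. The real system
    decodes both from a fine-tuned LLM; this stub only fixes the shape
    of the interface.
    """
    return ("navigate", "sid-000")  # placeholder for the generative model
```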
Baseline and Limitations
The baseline ranks cards by UV‑CTR, ignoring personalized spatio‑temporal signals. This approach suffers from:
Separate recall and ranking stages that limit accuracy improvements.
CTR‑oriented models that overlook action‑action relationships and semantic coherence.
Insufficient utilization of time‑and‑location features.
Suboptimal alignment between actions and POIs.
Proposed Solution: G‑Action Generative Recommendation Framework
G‑Action leverages a large language model (LLM) pre‑trained on massive, anonymized app usage data (the "Spacetime‑GR" model) and further fine‑tuned for Gaode’s domain. The workflow is:
Encode multimodal context as a natural‑language prompt.
Use the LLM to generate the next Action and Poi‑Sid directly.
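The two steps above can be sketched as follows. The prompt wording and the `action|sid` output convention are assumptions for illustration, not Gaode's actual serving format:

```python
# Step 1: encode the multimodal context as a natural-language prompt.
def build_prompt(check_ins, freq_checkins, cur_time, cur_loc, cur_weather):
    return (
        "You are a recommendation expert. Based on the following "
        "information, give an accurate recommendation.\n"
        f"Historical visit records: {'; '.join(check_ins)}\n"
        f"High-frequency visits in the last 90 days: {'; '.join(freq_checkins)}\n"
        f"Current time: {cur_time}\n"
        f"Current location: {cur_loc}\n"
        f"Weather: {cur_weather}\n"
        "Please recommend this user's next action and destination."
    )

# Step 2: parse the generated text back into (action, poi_sid).
# Assumes the model emits "action|sid1 sid2 sid3" (an assumed convention).
def parse_output(generated: str):
    action, _, sid = generated.partition("|")
    return action.strip(), sid.strip()
```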
The prompt template (shown below) supplies historical check‑ins, frequent periodic check‑ins, current time, location, and weather:
You are a recommendation expert. Based on the following information, give an accurate recommendation.
Historical visit records: {Check_Ins}
High-frequency visits in the last 90 days: {Freq_Checkins}
Current time: {Cur_Time}
Current location: {Cur_Loc}
Weather: {Cur_Weather}
Please recommend this user's next action and destination.
{Action}{Poi-Sid}

Each historical record is formatted as:

At {time}, {action} the {ttag}-category {name} ({sid}) located at {loc}

DPO Alignment
Positive samples are real user action‑POI pairs; negative samples are randomly sampled actions. The DPO loss emphasizes correct action prediction while keeping POI prediction unchanged, improving action accuracy without harming POI decoding.
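The alignment step can be illustrated with the standard DPO objective on sequence log-probabilities. This is a generic sketch, not Gaode's exact implementation; here the "chosen" sequence is the real (action, POI) pair and the "rejected" one swaps in a random action while keeping the POI tokens fixed, so the preference signal falls on the action tokens:

```python
import math

def dpo_loss(logp_pos, logp_neg, ref_logp_pos, ref_logp_neg, beta=0.1):
    """Standard DPO loss: -log(sigmoid(beta * (chosen margin - rejected margin))).

    logp_* are sequence log-probs under the policy being trained;
    ref_logp_* are under the frozen reference (pre-DPO) model.
    """
    margin = beta * ((logp_pos - ref_logp_pos) - (logp_neg - ref_logp_neg))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))
```

When the policy matches the reference, the margin is zero and the loss is log 2; preferring the real pair over the corrupted one drives the loss below that.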
Training and Deployment
Training uses one month of data, encoding each POI with three SIDs to balance token length and precision. The Qwen‑0.5B model serves as the backbone, achieving an average inference latency of ~50 ms, suitable for online deployment. To address severe class imbalance (e.g., few "order" or "review" actions), actions are down‑sampled or up‑sampled to obtain a balanced training distribution.
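The balancing step can be sketched as a per-action resampler: frequent actions are down-sampled without replacement, rare ones (e.g. "order", "review") up-sampled with replacement. A generic sketch under assumed data layout, not Gaode's pipeline:

```python
import random
from collections import defaultdict

def rebalance(samples, target_per_class, seed=0):
    """Down-/up-sample (action, example) pairs to a uniform per-action count."""
    rng = random.Random(seed)
    by_action = defaultdict(list)
    for action, ex in samples:
        by_action[action].append((action, ex))
    balanced = []
    for action, group in by_action.items():
        if len(group) >= target_per_class:
            balanced.extend(rng.sample(group, target_per_class))     # down-sample
        else:
            balanced.extend(rng.choices(group, k=target_per_class))  # up-sample
    return balanced
```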
Offline Evaluation
Top‑1 Accuracy is the primary metric, evaluated on a balanced test set of 20 k samples per class. Sub‑metrics include:
Action Acc – correctness of the predicted action.
Token Acc – correctness of all three SID tokens for the POI.
Join Acc – joint correctness of both action and POI.
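The three sub-metrics can be computed over top-1 predictions as below; the `(action, (sid1, sid2, sid3))` tuple layout is an assumption for illustration:

```python
def g_action_metrics(preds, labels):
    """Action Acc, Token Acc (all three SID tokens), and Join Acc.

    Each item in preds/labels is (action, (sid1, sid2, sid3)).
    """
    n = len(labels)
    action_hits = sum(p[0] == y[0] for p, y in zip(preds, labels))
    token_hits = sum(p[1] == y[1] for p, y in zip(preds, labels))  # all 3 SIDs match
    join_hits = sum(p == y for p, y in zip(preds, labels))         # both correct
    return {
        "action_acc": action_hits / n,
        "token_acc": token_hits / n,
        "join_acc": join_hits / n,
    }
```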
Results show the generative model outperforms traditional token‑based baselines, meeting the launch criteria.
Ablation Studies
Six experiments isolate the impact of input format, sample selection, and reinforcement learning:
Exp2 vs Exp1: Using natural‑language actions improves Action Acc by 8.1 %.
Exp3/Exp4 vs Exp1: Removing POI name/tag reduces Token Acc by 14 %.
Exp5 vs Exp1: Training on unbalanced data drops both Action and Token accuracy significantly.
Exp6 vs Exp1: DPO mainly boosts Action prediction, with minimal effect on POI.
Online Gains
Deploying G‑Action on "to‑store" and "invite‑review" cards yields:
+1.22 % UV‑CTR and +0.33 % pull‑up rate for to‑store cards.
+2.23 % UV‑CTR and +0.68 % pull‑up rate for invite‑review cards.
Case Study
A user at home looks up a roast‑duck restaurant at 8 am and again at noon. At 6 pm, the traditional pipeline still recommends route planning to the restaurant, while G‑Action recognizes that it is dinner time and surfaces the restaurant's group‑buy coupon. At 9 pm, the baseline suggests a route home, but G‑Action predicts the user will write a review and prompts an invitation to do so. This demonstrates superior semantic coherence.
Conclusion and Future Work
G‑Action introduces a generative, LLM‑driven framework that jointly predicts user actions and destinations, achieving higher accuracy and richer semantic alignment than token‑based methods. Future directions include MoE architectures for multi‑objective decoupling, deeper exploration of action‑POI relationships, and extending the system toward a full "G‑Plan" that orchestrates entire user journeys.
Amap Tech
Official Amap technology account showcasing all of Amap's technical innovations.