Balancing Personalization and Safety: LaD Model Boosts Real‑Time Search Query Generation

This article presents LaD, an end‑to‑end generation‑and‑detoxification framework for search query auto‑completion that hierarchically captures long‑ and short‑term user interests. It achieved large online A/B test gains at Kuaishou and has been accepted at KDD 2025.

Kuaishou Tech

Research Background

Query auto‑completion is a crucial feature of modern search systems, helping users finish their input quickly by predicting likely queries from the typed prefix. Traditional approaches retrieve candidates from a query library indexed with prefix trees, inverted indexes, or approximate‑nearest‑neighbor (ANN) structures, which suffer from limited coverage for long‑tail prefixes.
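As a concrete illustration (our own sketch, not from the paper), a minimal prefix‑tree lookup shows why library‑based completion fails on prefixes that never occurred in the logged query library:

```python
class Trie:
    """Minimal prefix tree over a fixed query library."""

    def __init__(self):
        self.children = {}
        self.queries = []  # library queries passing through this node

    def insert(self, query):
        node = self
        for ch in query:
            node = node.children.setdefault(ch, Trie())
            node.queries.append(query)

    def complete(self, prefix, k=4):
        """Return up to k library queries matching the prefix."""
        node = self
        for ch in prefix:
            if ch not in node.children:
                return []  # long-tail prefix: zero coverage
            node = node.children[ch]
        return node.queries[:k]
```

Any prefix outside the library returns nothing, which is exactly the coverage gap generative models aim to close.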

Recent advances in large generative models offer a new way to break the query‑library bottleneck, but deploying them in large‑scale online search faces two major challenges: (1) uncontrolled real‑time generation that may produce low‑quality or toxic queries, and (2) increased inference latency due to longer inputs.

Key Challenges

Short user prefixes often lack sufficient context, requiring the incorporation of historical user behavior (short‑term and long‑term interests) to improve relevance. However, adding more behavior information increases model input length and threatens the strict latency budget of search modules.

Proposed Solution: LaD Model

The LaD (Long‑Short Interests Hierarchical Capturing + Adaptive Detoxification) model addresses both challenges with two components:

Long‑Short Interests Hierarchical Capturing: Short‑term interests are modeled at the token level and directly concatenated with the prefix for fine‑grained, low‑latency inference. Long‑term interests are encoded into a single sentence‑level embedding that can be cached offline, reducing online input length while preserving rich historical signals.
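A minimal sketch of how such a hierarchical input might be assembled. The cache, function names, and token layout below are assumptions for illustration, not the paper's implementation:

```python
# Hypothetical offline cache: the long-term interest embedding is computed
# once per user, not per request, so it adds only one slot to the online input.
LONG_TERM_CACHE = {}

def get_long_term_embedding(user_id, history, encode_long_term):
    """Fetch the sentence-level long-term embedding, encoding only on a miss."""
    if user_id not in LONG_TERM_CACHE:
        LONG_TERM_CACHE[user_id] = encode_long_term(history)
    return LONG_TERM_CACHE[user_id]

def build_input(prefix_tokens, short_term_tokens, long_term_emb):
    """Assemble the model input: one cached long-term embedding slot,
    then token-level short-term behavior, then the user's prefix."""
    return [long_term_emb] + short_term_tokens + prefix_tokens
```

The design point is that online input length grows with the short prefix and recent behavior only; the long history is compressed into a single precomputed slot.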

Adaptive Detoxification (Reject Preference Optimization): A special [Reject] token is learned to separate high‑quality from low‑quality queries. During inference, beam search generates multiple candidates; those ranked above [Reject] are kept, while those below are filtered out, enabling end‑to‑end generation and filtering without a separate discriminator.
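The keep‑above‑[Reject] step can be sketched as follows; the score semantics (higher is better, with the model's [Reject] score acting as a per‑request threshold) are a simplified assumption:

```python
def filter_candidates(candidates, reject_score):
    """Keep only beam-search candidates scored above the [Reject] token.

    candidates: list of (query, score) pairs from beam search
    reject_score: the score the model assigns to the [Reject] token
    """
    ranked = sorted(candidates, key=lambda pair: -pair[1])
    return [query for query, score in ranked if score > reject_score]
```

Because the threshold is the model's own [Reject] score, filtering strictness adapts per request instead of using a fixed global cutoff.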

The detoxification expert is trained with an online Direct Preference Optimization (DPO) loss that directly optimizes the preference ordering good query > [Reject] > bad query.
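A toy version of such a pairwise preference loss; the function names and exact form here are assumptions, and the paper's online DPO objective may differ in detail:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def reject_dpo_loss(lp_good, lp_reject, lp_bad, beta=1.0):
    """Pairwise logistic loss enforcing good > [Reject] > bad.

    lp_good, lp_reject, lp_bad: model log-probabilities of the good query,
    the [Reject] token, and the bad query; beta scales preference sharpness.
    """
    return (-math.log(sigmoid(beta * (lp_good - lp_reject)))
            - math.log(sigmoid(beta * (lp_reject - lp_bad))))
```

The loss is small when the model ranks good queries above [Reject] and [Reject] above bad queries, and grows when either ordering is violated.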

Experiments

Offline ablation studies on Kuaishou search data and the public AOL dataset show that:

Hierarchical interest modeling (SL‑37) improves R@4 from 26.11% to 31.70% compared to a non‑personalized baseline.

Adaptive detoxification reduces toxic query rate (UAmaxT) compared to prior DAC methods.
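For reference, R@k in query auto‑completion is typically the fraction of sessions in which the user's actually submitted query appears among the top‑k suggestions; a minimal computation under that assumption (our own sketch):

```python
def recall_at_k(sessions, k=4):
    """sessions: list of (suggestion list, true submitted query) pairs.

    Returns the fraction of sessions where the true query is in the top k.
    """
    hits = sum(1 for suggestions, truth in sessions if truth in suggestions[:k])
    return hits / len(sessions)
```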

Online A/B tests on the Kuaishou app demonstrate substantial gains: click‑through rate, search page views (PV), and playback duration all increase markedly, the largest single‑experiment A/B metric improvement in the past two years. The model is fully deployed, serving billions of users daily.

Deployment Options

Three deployment strategies were evaluated:

Nearline: offline generation to expand the query library, allowing larger models and offline quality filtering.

Gen + AD: online generation with adaptive detoxification, but no personalization.

Gen + AD + LS: online generation with both adaptive detoxification and long‑short interest modeling, delivering the best performance.

Conclusion and Outlook

The LaD model is the first end‑to‑end generative search suggestion (SUG) system successfully deployed at Kuaishou, delivering large A/B improvements while ensuring content safety. Future work will explore reinforcement‑learning reward models to further enhance ranking, and real‑time training to continuously incorporate fresh knowledge, aiming to replace the traditional recall‑ranking cascade with a fully end‑to‑end SUG pipeline.

Paper: https://arxiv.org/abs/2505.20966
Code: https://github.com/JXZe/LaD


Tags: generative models, online inference, personalized search, detoxification, KDD 2025, LaD, query auto-completion