Generative Retrieval Based on Yuan Large Model: Implementation and Practice in Tencent Advertising
This paper presents the implementation and practice of generative retrieval based on the Yuan large model in Tencent Advertising, addressing three key challenges: capturing user intent, aligning the model to the advertising domain, and designing a high-performance platform under ROI constraints.
The overall architecture consists of three key components: intent capture based on user behavior and ad feedback, structured text expression of marketing objects with semantic ID indexing, and a high-performance platform and engineering architecture. The system has been deployed in Tencent Advertising's new ad placement system (3.0), achieving a 0.69% GMV lift and a 0.52% increase in exposure conversion rate in WeChat Channels scenarios.
For intent capture, the paper explores prompt engineering, supervised fine-tuning (SFT), and direct preference optimization (DPO). The prompt design incorporates basic user attributes, interest summaries, and commercial behavior sequences, validated through offline feature-gain analysis using the HitRatio@K metric. The SFT training strategy combines a main task with auxiliary tasks, while DPO introduces commercial-value and user-interest preferences.
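HitRatio@K, as used here for offline feature-gain analysis, is typically computed as the fraction of users for whom at least one ground-truth item appears among the top-K generated results. A minimal sketch (the paper does not specify the matching granularity, so item-level set intersection is an assumption):

```python
def hit_ratio_at_k(generated, ground_truth, k):
    """Fraction of users whose top-K generated items contain at least
    one ground-truth positive (item-level matching is an assumption)."""
    if not generated:
        return 0.0
    hits = sum(
        1 for preds, truth in zip(generated, ground_truth)
        if set(preds[:k]) & set(truth)
    )
    return hits / len(generated)

# two users: the first is a hit at K=2, the second is a miss
score = hit_ratio_at_k([["a", "b"], ["c", "d"]], [["b"], ["x"]], k=2)  # 0.5
```

Comparing this score with and without a candidate prompt feature gives the "feature gain" signal used to decide what goes into the prompt.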
The paper introduces marketing objects as bridges between the LLM and the advertising system, leveraging their structured information and stable lifecycle. Semantic ID indexing with an RQ-VAE structure enables efficient retrieval, with a four-layer codebook encoding achieving a 4.06% collision rate. Constrained beam search with diversity strategies ensures that generated results hit active marketing objects.
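The semantic ID encoding follows the residual quantization idea behind RQ-VAE: each codebook layer quantizes the residual left by the previous layer, turning an item embedding into a short tuple of discrete codes (here, one code per layer across four layers). A minimal NumPy sketch; the codebook sizes, dimensions, and nearest-neighbor rule are illustrative, and the paper's RQ-VAE learns its codebooks end-to-end rather than using random ones:

```python
import numpy as np

def rq_encode(x, codebooks):
    """Greedy residual quantization: each layer picks its nearest
    codeword, then passes the remaining residual to the next layer."""
    codes, residual = [], x.astype(np.float64)
    for cb in codebooks:                       # cb has shape (K, d)
        idx = int(np.argmin(np.linalg.norm(cb - residual, axis=1)))
        codes.append(idx)
        residual = residual - cb[idx]          # quantization error so far
    return codes

# toy setup: 4 layers of 8 codewords over 16-dim item embeddings
rng = np.random.default_rng(0)
codebooks = [rng.normal(size=(8, 16)) for _ in range(4)]
semantic_id = rq_encode(rng.normal(size=16), codebooks)  # 4 codes, one per layer
```

Two items with identical code tuples collide, which is what the reported 4.06% collision rate measures; constrained beam search then decodes only along code prefixes that lead to active marketing objects.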
For platform and engineering, the paper details optimizations including TensorRT-LLM kernel improvements (the softmax and finalize kernels), quantization techniques (SmoothQuant w8a8c8 and FP8 w8a8c8), and global load balancing through Redis-based request distribution. The training platform supports two-phase fine-tuning for semantic index tokens, while offline inference and training engineering address the LLM's high latency and resource requirements.
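The Redis-based global load balancing can be illustrated as least-loaded dispatch over shared in-flight counters. In the sketch below a plain dict stands in for Redis; in a real deployment the per-instance counts would live in Redis (e.g. atomic INCR/DECR) so that every gateway sees one global view of GPU load. The class, instance names, and routing rule are all illustrative:

```python
class LoadBalancer:
    """Least-loaded dispatch. A dict stands in for the shared Redis
    counters that would give all gateways a global view of load."""

    def __init__(self, instances):
        self.inflight = {inst: 0 for inst in instances}

    def acquire(self):
        # route to the instance with the fewest in-flight requests
        inst = min(self.inflight, key=self.inflight.get)
        self.inflight[inst] += 1
        return inst

    def release(self, inst):
        self.inflight[inst] -= 1

lb = LoadBalancer(["gpu-0", "gpu-1"])
a = lb.acquire()   # "gpu-0" (both idle; min returns the first minimum)
b = lb.acquire()   # "gpu-1" (gpu-0 now has one request in flight)
lb.release(a)
```

Because LLM inference requests vary widely in decode length, balancing on live in-flight counts rather than round-robin keeps slow requests from piling up on one instance.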
The implementation includes near-line inference for pre-estimating user interests, feature engineering for text-based prompts, and comprehensive diagnostic tools for model analysis. The system demonstrates significant improvements in advertising effectiveness while maintaining high performance and meeting ROI constraints.
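The near-line pattern here means interests are pre-computed asynchronously by the LLM and read back from a cache at serving time, keeping expensive generation off the synchronous ad-request path. A minimal TTL-cache sketch; the class name, TTL value, and fallback behavior are all assumptions, not details from the paper:

```python
import time

class NearlineInterestCache:
    """Serve pre-computed user interests at request time; stale or
    missing entries trigger a miss (caller falls back and enqueues
    the user for an asynchronous near-line refresh)."""

    def __init__(self, ttl_seconds=3600):
        self.ttl = ttl_seconds
        self.store = {}  # user_id -> (expires_at, interests)

    def put(self, user_id, interests):
        self.store[user_id] = (time.time() + self.ttl, interests)

    def get(self, user_id):
        entry = self.store.get(user_id)
        if entry is None or entry[0] < time.time():
            return None  # cache miss
        return entry[1]
```

This trades freshness for latency: the serving path only ever pays a cache lookup, while interest generation runs on its own schedule.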
Tencent Advertising Technology
Official hub of Tencent Advertising Technology, sharing the team's latest cutting-edge achievements and advertising technology applications.