Generative Retrieval Based on Yuan Large Model: Implementation and Practice in Tencent Advertising
This paper presents the implementation and practice of generative retrieval based on the Yuan large model in Tencent Advertising, addressing three key challenges: capturing user intent, aligning the model to the advertising domain, and designing a high-performance platform under ROI constraints.
The overall architecture consists of three key components: intent capture based on user behavior and ad feedback, structured text expression of marketing objects with semantic ID indexing, and a high-performance platform and engineering architecture. The system has been deployed in Tencent Advertising's new ad placement system (3.0), achieving a 0.69% GMV lift and a 0.52% increase in exposure conversion rate in WeChat Channels scenarios.
For intent capture, the paper explores prompt engineering, supervised fine-tuning (SFT), and direct preference optimization (DPO). The prompt design incorporates basic user attributes, interest summaries, and commercial behavior sequences, validated through offline feature-gain analysis using the HitRatio@K metric. The SFT training strategy combines a main task with auxiliary tasks, while DPO introduces commercial-value and user-interest preferences.
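HitRatio@K, as used here for offline feature-gain analysis, is typically computed as the fraction of users for whom at least one ground-truth item appears among the top-K generated results. A minimal sketch (the paper does not specify the matching granularity, so item-level set intersection is an assumption):

```python
def hit_ratio_at_k(generated, ground_truth, k):
    """Fraction of users whose top-K generated items contain at least
    one ground-truth positive (item-level matching is an assumption)."""
    if not generated:
        return 0.0
    hits = sum(
        1 for preds, truth in zip(generated, ground_truth)
        if set(preds[:k]) & set(truth)
    )
    return hits / len(generated)

# two users: the first is a hit at K=2, the second is a miss
score = hit_ratio_at_k([["a", "b"], ["c", "d"]], [["b"], ["x"]], k=2)  # 0.5
```

Comparing this score with and without a candidate prompt feature gives the "feature gain" signal used to decide what goes into the prompt.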
The paper introduces marketing objects as bridges between the LLM and the advertising system, leveraging their structured information and stable lifecycle. Semantic ID indexing with an RQ-VAE structure enables efficient retrieval, with a four-layer codebook encoding achieving a 4.06% collision rate. Constrained beam search with diversity strategies ensures that generated results hit active marketing objects.
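The semantic ID encoding follows the residual quantization idea behind RQ-VAE: each codebook layer quantizes the residual left by the previous layer, turning an item embedding into a short tuple of discrete codes (here, one code per layer across four layers). A minimal NumPy sketch; the codebook sizes, dimensions, and nearest-neighbor rule are illustrative, and the paper's RQ-VAE learns its codebooks end-to-end rather than using random ones:

```python
import numpy as np

def rq_encode(x, codebooks):
    """Greedy residual quantization: each layer picks its nearest
    codeword, then passes the remaining residual to the next layer."""
    codes, residual = [], x.astype(np.float64)
    for cb in codebooks:                       # cb has shape (K, d)
        idx = int(np.argmin(np.linalg.norm(cb - residual, axis=1)))
        codes.append(idx)
        residual = residual - cb[idx]          # quantization error so far
    return codes

# toy setup: 4 layers of 8 codewords over 16-dim item embeddings
rng = np.random.default_rng(0)
codebooks = [rng.normal(size=(8, 16)) for _ in range(4)]
semantic_id = rq_encode(rng.normal(size=16), codebooks)  # 4 codes, one per layer
```

Two items with identical code tuples collide, which is what the reported 4.06% collision rate measures; constrained beam search then decodes only along code prefixes that lead to active marketing objects.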
For platform and engineering, the paper details optimizations including TensorRT-LLM kernel improvements (the softmax and finalize kernels), quantization techniques (SmoothQuant w8a8c8 and FP8 w8a8c8), and global load balancing through Redis-based request distribution. The training platform supports two-phase fine-tuning for semantic index tokens, while offline inference and training engineering address the LLM's high latency and resource requirements.
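The Redis-based global load balancing can be illustrated as least-loaded dispatch over shared in-flight counters. In the sketch below a plain dict stands in for Redis; in a real deployment the per-instance counts would live in Redis (e.g. atomic INCR/DECR) so that every gateway sees one global view of GPU load. The class, instance names, and routing rule are all illustrative:

```python
class LoadBalancer:
    """Least-loaded dispatch. A dict stands in for the shared Redis
    counters that would give all gateways a global view of load."""

    def __init__(self, instances):
        self.inflight = {inst: 0 for inst in instances}

    def acquire(self):
        # route to the instance with the fewest in-flight requests
        inst = min(self.inflight, key=self.inflight.get)
        self.inflight[inst] += 1
        return inst

    def release(self, inst):
        self.inflight[inst] -= 1

lb = LoadBalancer(["gpu-0", "gpu-1"])
a = lb.acquire()   # "gpu-0" (both idle; min returns the first minimum)
b = lb.acquire()   # "gpu-1" (gpu-0 now has one request in flight)
lb.release(a)
```

Because LLM inference requests vary widely in decode length, balancing on live in-flight counts rather than round-robin keeps slow requests from piling up on one instance.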
The implementation includes near-line inference for pre-estimating user interests, feature engineering for text-based prompts, and comprehensive diagnostic tools for model analysis. The system demonstrates significant improvements in advertising effectiveness while maintaining high performance and meeting ROI constraints.
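The near-line pattern here means interests are pre-computed asynchronously by the LLM and read back from a cache at serving time, keeping expensive generation off the synchronous ad-request path. A minimal TTL-cache sketch; the class name, TTL value, and fallback behavior are all assumptions, not details from the paper:

```python
import time

class NearlineInterestCache:
    """Serve pre-computed user interests at request time; stale or
    missing entries trigger a miss (caller falls back and enqueues
    the user for an asynchronous near-line refresh)."""

    def __init__(self, ttl_seconds=3600):
        self.ttl = ttl_seconds
        self.store = {}  # user_id -> (expires_at, interests)

    def put(self, user_id, interests):
        self.store[user_id] = (time.time() + self.ttl, interests)

    def get(self, user_id):
        entry = self.store.get(user_id)
        if entry is None or entry[0] < time.time():
            return None  # cache miss
        return entry[1]
```

This trades freshness for latency: the serving path only ever pays a cache lookup, while interest generation runs on its own schedule.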
Tencent Advertising Technology
Official hub of Tencent Advertising Technology, sharing the team's latest cutting-edge achievements and advertising technology applications.