Applying Large Language Models to Search Advertising Satisfaction: From DNN to ERNIE and Prompt Learning
The article details how Baidu's Fengchao search-advertising team applies large language models: the transition from DNN embeddings to ERNIE, the introduction of multi-level tokenization and discrete core-word inputs, and the use of prompt learning and AIGC techniques to improve search advertising satisfaction and industry-specific relevance modeling.
The presentation explains the gap between industrial and research practice, emphasizing that technology choices must directly address real business problems such as search advertising satisfaction, which measures how well ads meet user intent and client service quality.
It describes the traditional advertising CTR pipeline—log parsing, DNN embedding of massive sparse features, and training of a dense prediction model—and how the team migrated from DNN to the ERNIE language model to handle long, noisy landing‑page texts.
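To make the traditional pipeline concrete, here is a minimal sketch of the "massive sparse features → embedding → dense prediction" pattern. Everything here (feature names, vocabulary size, the single logistic layer) is illustrative, not Baidu's actual architecture; a production DNN would have many embedding slots, deep hidden layers, and learned rather than random weights.

```python
import math
import random

random.seed(0)

EMB_DIM = 8
VOCAB = 1000  # hashed sparse-feature vocabulary size (illustrative)

# Embedding table for hashed sparse features (e.g. query terms, ad IDs).
emb = [[random.uniform(-0.1, 0.1) for _ in range(EMB_DIM)] for _ in range(VOCAB)]
w = [random.uniform(-0.1, 0.1) for _ in range(EMB_DIM)]

def embed(features):
    """Hash each sparse feature to a slot and sum-pool the embeddings."""
    pooled = [0.0] * EMB_DIM
    for f in features:
        row = emb[hash(f) % VOCAB]
        pooled = [p + r for p, r in zip(pooled, row)]
    return pooled

def predict_ctr(features):
    """Dense head: a single logistic layer over the pooled embedding."""
    z = sum(x * wi for x, wi in zip(embed(features), w))
    return 1.0 / (1.0 + math.exp(-z))

p = predict_ctr(["query:flights", "ad:travel_123", "ua:mobile"])
```

The key property the talk builds on is that this embedding-of-ID-features view struggles with long, noisy landing-page text, which is what motivates the move to a language model such as ERNIE.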
Key challenges include high-noise, fragmented landing-page content and the quadratic growth of Transformer self-attention compute with input length; conventional remedies (GPU migration, model distillation, pruning) proved insufficient, leading to two efficiency measures: feeding discrete core-word sets, adapted for sequence models, in place of full text, and designing a multi-level tokenization hierarchy that shortens token sequences while preserving semantics.
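The core-word idea can be sketched as follows: instead of feeding an entire noisy page to the model, keep only a small set of salient words. The salience criterion below (frequency after stopword removal) is a stand-in assumption; the talk's actual core-word extraction is not specified at this level of detail.

```python
from collections import Counter

STOPWORDS = {"the", "a", "and", "of", "to", "in", "for", "is", "your"}  # illustrative

def core_words(text, k=5):
    """Keep the k most salient tokens (here: by frequency after stopword
    removal) as a discrete core-word set, shortening the model's input."""
    tokens = [t.lower() for t in text.split() if t.isalpha()]
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return [word for word, _ in counts.most_common(k)]

page = ("Book cheap flights and cheap hotels for your trip trip trip "
        "to the best destinations in the world")
top = core_words(page, 3)
```

The sequence-length saving is the point: a page of hundreds of tokens collapses to a handful of core words, which sidesteps much of the quadratic attention cost.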
Prompt learning is introduced to achieve industry isolation: soft prompt tokens representing industry IDs are added to the model, masked during pre‑training, and injected during fine‑tuning, enabling the model to adapt to changing industry standards without sacrificing overall performance.
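A minimal sketch of the soft-prompt mechanism described here: each industry ID maps to a few trainable prompt vectors that are prepended to the token-embedding sequence. The industry names, dimensions, and random initialization below are illustrative assumptions; in the real system these vectors would be learned during pre-training (where they are masked) and fine-tuning.

```python
import random

random.seed(1)

EMB_DIM = 4
N_SOFT = 2  # soft-prompt tokens per industry (illustrative)

# One set of trainable soft-prompt vectors per industry ID. Tuning only
# these (with a shared frozen backbone) is what isolates industries:
# one industry's standards can change without disturbing the others.
industry_prompts = {
    "finance": [[random.gauss(0, 0.02) for _ in range(EMB_DIM)] for _ in range(N_SOFT)],
    "travel":  [[random.gauss(0, 0.02) for _ in range(EMB_DIM)] for _ in range(N_SOFT)],
}

def with_industry_prompt(token_embs, industry):
    """Prepend the industry's soft-prompt vectors to the input sequence."""
    return industry_prompts[industry] + token_embs

tokens = [[0.1] * EMB_DIM, [0.2] * EMB_DIM]  # stand-in token embeddings
seq = with_industry_prompt(tokens, "travel")
```

The backbone then attends over prompt vectors and content tokens alike, so industry-specific behavior is carried entirely by the prepended vectors rather than by separate per-industry models.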
The talk also covers model architecture choices—single‑tower versus dual‑tower relevance models—and how virtual prompts can align dual‑tower training with single‑tower pre‑training objectives.
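The single-tower/dual-tower distinction can be illustrated with a toy dual tower: each side is encoded independently and compared with a cheap similarity, which is what lets ad vectors be precomputed offline. The hashed bag-of-words "encoder" below is a deliberate stand-in for an ERNIE/BERT tower, not the system's actual encoder.

```python
import math

def encode(text, dim=8):
    """Toy encoder: normalized hashed bag-of-words vector
    (a stand-in for one ERNIE/BERT tower)."""
    v = [0.0] * dim
    for tok in text.lower().split():
        v[hash(tok) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def dual_tower_score(query, ad):
    """Dual tower: encode each side independently, then score with
    cosine similarity; ad vectors can be cached ahead of time."""
    q, a = encode(query), encode(ad)
    return sum(x * y for x, y in zip(q, a))

s = dual_tower_score("cheap flights paris", "book cheap flights to paris today")
```

A single-tower model would instead feed query and ad jointly through one encoder with full cross-attention, which is more accurate but cannot precompute either side; the virtual-prompt trick mentioned in the talk is a way to narrow that gap.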
Finally, the potential of AIGC is discussed, including automated ad‑material generation, explainable debugging tools, and system‑level LLM reward models, all aimed at creating a virtuous loop that improves ad quality and business outcomes.
A Q&A segment clarifies implementation details such as how industry isolation is achieved, the role of prompts in training data, handling of long texts, and the comparative benefits of tokenization optimizations.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.