Technical Practices and Productization of Intelligent Advertising Title Generation for Bilibili
We built an LLM‑powered system for Bilibili that automatically generates ad titles from user‑supplied keywords. It combines fluency, style, and quality classifiers, mixed‑domain data construction and cleaning, and alignment methods such as SFT, DPO, and KTO. The resulting product now generates roughly ten percent of newly created daily ad titles and drives daily ad spend in the tens of thousands of yuan.
Background
The rapid development of large language models (LLMs) is reshaping the workflow of advertisers and agencies in creating ad copy. To improve the efficiency of advertisers and the quality of ad titles on Bilibili, we leverage LLM technology and Bilibili commercial data to generate virtually unlimited creative titles from a few user‑provided keywords. The generated titles match Bilibili’s style, enhancing both efficiency and effectiveness.
Technical Practice
2.1 Evaluation Metric Design
To quantify model iteration direction and quality, we built a comprehensive evaluation system consisting of three dimensions: fluency, style score, and quality score. Fluency measures linguistic smoothness; style score assesses similarity to native Bilibili ad titles; quality score reflects whether a title meets general good‑title criteria (keyword relevance, click‑attractiveness, etc.). Each metric is modeled by a separate binary classifier trained on Bilibili data.
Fluency
Goal: Ensure generated titles are linguistically smooth.
Training data: 50k real Bilibili titles (balanced positive/negative). Negative samples are created by corrupting fluent sentences (character replacement, deletion, swapping).
Result: AUC > 0.98 on a 2k‑sample test set.
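The negative‑sample corruption step can be sketched in a few lines. This is a minimal illustration of the three perturbations named above (character replacement, deletion, swapping); the operation mix, sampling policy, and example title are our assumptions, not the production pipeline:

```python
import random

def corrupt(title: str, rng: random.Random) -> str:
    """Corrupt a fluent title into a negative (non-fluent) sample
    via one of: character replacement, deletion, or adjacent swap.
    Single-character titles are returned unchanged by delete/swap."""
    chars = list(title)
    op = rng.choice(["replace", "delete", "swap"])
    i = rng.randrange(len(chars))
    if op == "replace":
        # For simplicity, reuse a character from the same title;
        # a real pipeline would sample from a corpus-wide vocabulary.
        chars[i] = chars[rng.randrange(len(chars))]
    elif op == "delete" and len(chars) > 1:
        del chars[i]
    elif len(chars) > 1:
        j = (i + 1) % len(chars)
        chars[i], chars[j] = chars[j], chars[i]
    return "".join(chars)

rng = random.Random(0)
positive = "十大必看番剧推荐"
negative = corrupt(positive, rng)
```

Paired with its source title, each corrupted string becomes a (negative, positive) training example for the fluency classifier.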
Style Score
Goal: Capture the subtle style differences between Qwen zero‑shot titles and authentic Bilibili ad titles.
Training data: 25k real Bilibili titles (positive) vs. 25k Qwen‑72B generated titles (negative).
Result: AUC > 0.95.
Quality Score
Goal: Identify high‑click‑rate titles using a pairwise labeling approach.
Positive samples: High‑CTR titles annotated by GPT‑4 for reasons (curiosity, emotional resonance, brevity, demand focus).
Negative samples: Titles judged poor by both GPT‑4 and Qwen‑72B.
Training data: ~10k positive/negative pairs.
Result: AUC > 0.88.
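All three metrics are reported as AUC on held‑out Bilibili data. For a small test set, AUC reduces to the probability that a randomly chosen positive outscores a randomly chosen negative, with ties counting half; a minimal sketch of that pairwise definition:

```python
def auc(scores_pos, scores_neg):
    """AUC = P(pos > neg) + 0.5 * P(tie), computed exhaustively
    over all positive/negative score pairs."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in scores_pos for n in scores_neg
    )
    return wins / (len(scores_pos) * len(scores_neg))
```

For example, a classifier that scores every positive above every negative gets AUC 1.0; one that assigns everything the same score gets 0.5.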
2.2 Dataset Construction and Cleaning
We construct a mixed dataset with a ratio of proprietary task data : commercial domain data : open‑domain data = 1 : 1 : N (5 ≤ N ≤ 10). Open‑domain data preserve zero‑shot capability; commercial data (Bilibili titles, video ASR, search queries) strengthen domain generalization. Proprietary task data consist of keyword‑to‑generated‑title and keyword‑to‑original‑title pairs, with strict keyword limits (≤2, ≤3, or unlimited). Cleaning follows the MoDS framework (quality, diversity, necessity filtering) and includes specialized steps such as quantitative digit replacement using Qwen‑72B and manual post‑processing to guarantee fluency.
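The 1 : 1 : N mixing step can be sketched as a simple subsampling routine. This is illustrative only: the pools here are placeholders, the real pipeline also applies MoDS filtering, and we assume the proprietary task set is the smallest pool (upsampling of larger-than-needed pools is omitted):

```python
import random

def mix_datasets(task, commercial, open_domain, n=5, seed=0):
    """Assemble a training mix at ratio
    task : commercial : open-domain = 1 : 1 : n, with 5 <= n <= 10.
    The proprietary task pool sets the unit size; the other two
    pools are randomly subsampled to match the ratio."""
    assert 5 <= n <= 10
    rng = random.Random(seed)
    k = len(task)
    mixed = (
        list(task)
        + rng.sample(commercial, k)
        + rng.sample(open_domain, n * k)
    )
    rng.shuffle(mixed)
    return mixed
```

With 10k task samples and N = 5, this yields a 70k-example mix: 10k task, 10k commercial, 50k open-domain.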
2.3 Alignment Algorithm Exploration and Optimization
2.3.1 Supervised Fine‑Tuning (SFT)
Initial SFT improves business logic learning. Iterative prompt diversification (multiple prompts for the same task) further enhances robustness and generalization.
2.3.2 Direct Preference Optimization (DPO)
DPO reparameterizes the reward in terms of the policy's own log‑probabilities, eliminating the need to train a separate reward model. It raises the likelihood of preferred completions while lowering that of dispreferred ones, reducing alignment to a simple supervised‑style objective.
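The per‑example DPO objective can be sketched in plain Python. This is a minimal illustration: variable names and the β value are ours, and a real implementation would batch this over sequence‑level log‑probabilities in a framework like PyTorch:

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Per-example DPO loss: -log sigmoid of the scaled,
    reference-adjusted log-prob margin between the preferred (w)
    and dispreferred (l) completions.

    logp_*     : policy log-prob of each completion
    ref_logp_* : log-prob under the frozen reference model
    """
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

At initialization the policy equals the reference, the margin is zero, and the loss is log 2; training drives the margin positive, i.e. the policy prefers the chosen completion more strongly than the reference does.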
2.3.3 DPO Variants and Optimizations
Prompt‑density optimization: In multi‑instruction scenarios, increasing the information density of prompts mitigates premature convergence and improves performance.
IPO (Identity Preference Optimization): Adds an L2‑style regularization term to DPO's loss, yielding modest gains.
KTO (Kahneman–Tversky Optimization): Removes the need for paired samples, allowing weighted negative‑sample training and further quality improvements.
KTON: An enhanced KTO variant that strictly constrains negative‑sample behavior, improving quality scores while preserving fluency.
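To make the "L2‑style regularization" gloss concrete: DPO and IPO both operate on the same reference‑adjusted log‑probability margin, but IPO pulls that margin toward a fixed target 1/(2τ) with a squared penalty rather than pushing it without bound through a log‑sigmoid. A sketch of the per‑example IPO objective from Azar et al. (symbol names are ours):

```python
def preference_margin(logp_w, logp_l, ref_logp_w, ref_logp_l):
    """Reference-adjusted log-prob margin shared by DPO and IPO."""
    return (logp_w - ref_logp_w) - (logp_l - ref_logp_l)

def ipo_loss(margin, tau=0.1):
    """IPO: squared-error pull of the margin toward 1/(2*tau).
    The quadratic term bounds how far the policy can drift from
    the reference, acting as the regularizer described above."""
    return (margin - 1.0 / (2.0 * tau)) ** 2
```

Unlike DPO's loss, which keeps shrinking as the margin grows, the IPO loss is minimized exactly at the target margin, which is what curbs overfitting to the preference data.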
3 Product Entry
3.1 Keyword‑Based Generation
Users input one or more keywords to generate titles; the UI supports regenerating until the user is satisfied.
3.2 Title Association Feature
We added an associative‑title function that, based on ANN + Elasticsearch dual‑recall, suggests completions for short queries and semantically similar titles for long queries, dramatically increasing user efficiency.
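One way the dual‑recall merge could look is sketched below. This is entirely illustrative: the query‑length threshold, the merge policy, and the recall callables are our assumptions, standing in for the production ANN index and Elasticsearch cluster:

```python
def dual_recall(query, ann_recall, es_recall, short_len=4, k=10):
    """Hypothetical merge policy for the title-association feature.

    ann_recall / es_recall: callables returning ranked candidate
    titles. Short queries favour lexical completion (the
    Elasticsearch-style leg); long queries favour semantically
    similar titles (the ANN leg over embeddings). Results from the
    primary leg come first; duplicates are dropped."""
    primary, secondary = (
        (es_recall, ann_recall) if len(query) <= short_len
        else (ann_recall, es_recall)
    )
    seen, merged = set(), []
    for title in primary(query) + secondary(query):
        if title not in seen:
            seen.add(title)
            merged.append(title)
    return merged[:k]
```

The design choice here is ordering rather than filtering: both recall legs always contribute, but query length decides which leg leads the ranking.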
4 Business‑Side Iterations
4.1 First‑Screen Recommendation Optimization
After launch, we refined the first‑screen recommendations by injecting product entities linked to the advertiser's account and leveraging user‑entered queries, raising first‑screen adoption from roughly 20% to 45%.
4.2 Enhancing Title Novelty
Incorporate community‑generated titles to capture hot trends.
Weekly generation of new titles from recent creations.
RAG‑based meme extraction: Retrieve high‑frequency memes from comments and titles, then use few‑shot prompting to embed them into generated titles.
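The few‑shot step of the meme pipeline amounts to assembling retrieved memes and exemplar keyword‑title pairs into a prompt. A hypothetical template (the production prompt wording is not public; the structure and examples below are ours):

```python
def build_meme_prompt(keywords, memes, examples):
    """Assemble a few-shot prompt asking the model to weave a
    retrieved hot meme into a generated ad title.

    keywords : keywords for the title to generate
    memes    : high-frequency memes retrieved from comments/titles
    examples : (keyword list, title) exemplar pairs for few-shot
    """
    lines = [
        "Write a Bilibili-style ad title from the keywords, "
        "working in one of the hot memes."
    ]
    lines.append("Hot memes: " + ", ".join(memes))
    for kw, title in examples:
        lines.append(f"Keywords: {', '.join(kw)}\nTitle: {title}")
    # Leave the final title blank for the model to complete.
    lines.append(f"Keywords: {', '.join(keywords)}\nTitle:")
    return "\n\n".join(lines)
```

The exemplars anchor both the title style and the way memes are embedded, so retrieval only has to supply fresh memes, not fresh instructions.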
5 Online Results
The system now contributes to ~10% of daily newly created ad titles on Bilibili, with daily ad spend reaching tens of thousands of yuan, positioning the solution as industry‑leading.
6 Future Plans
Align offline evaluation metrics more closely with online CTR.
Increase generation diversity via temperature tuning and multi‑prompt strategies; explore “thousands‑of‑faces” personalized title generation.
Integrate Retrieval‑Augmented Generation (RAG) and Chain‑of‑Thought (CoT) to mitigate hallucinations and improve timeliness.
Build a commercial‑domain continued pre‑training foundation model and a Bilibili‑specific benchmark covering factual, reasoning, and marketing tasks.
Investigate agent‑driven data engineering to automate task‑specific data pipeline construction.
References
MoDS: Model‑oriented Data Selection for Instruction Tuning (arXiv:2311.15653).
Direct Preference Optimization: Your Language Model is Secretly a Reward Model (arXiv:2305.18290).
IPO: A General Theoretical Paradigm to Understand Learning from Human Preferences (arXiv:2310.12036).
KTO: Model Alignment as Prospect Theoretic Optimization (arXiv:2402.01306).
Value‑Incentivized Preference Optimization (arXiv:2405.19320).
GitHub – netease‑youdao/QAnything.
Bilibili Tech