How Large Language Models Are Revolutionizing Ad Recommendation and Solving Cold‑Start Problems

This article explains how advertising recommendation is evolving from traditional feature‑engineered models to LLM‑driven pipelines, detailing data‑infrastructure challenges, semantic upgrades with multimodal embeddings, case studies in short‑video ads, user cold‑start prompt engineering, and future directions for generative recommendation systems.

DataFunSummit

Background

Advertising recommendation faces a dual challenge of efficiency bottlenecks and user‑experience optimization. Traditional models rely on downstream user‑behavior data, leading to information cocoons, cold‑start issues, and an arms race of material creation among advertisers.

Three Topics Covered

Traditional recommendation models – meticulous data infrastructure.

LLM4REC – a semantic leap in data infrastructure.

Future planning for large‑model‑driven recommendation.

Traditional Recommendation Model

The classic ad‑delivery pipeline consists of three stages: pre‑delivery (creative selection and indexing), delivery (recall → coarse ranking → fine ranking → mixing → auction), and post‑delivery (closed‑loop data feedback). Models such as DIN and DIEN learn from user‑item interaction histories and perform well when behavior data is abundant, but they operate in self‑reinforcing closed loops that exacerbate content and user cold‑start problems.

Pre‑delivery: advertisers upload assets, which are filtered and indexed through a multi‑stage selection process.

Delivery: algorithms match ads to users via recall, coarse ranking, fine ranking, mixing, and the final auction.

Post‑delivery: data is fed back to training pipelines, influencing earlier stages.
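The delivery funnel above can be sketched as a chain of narrowing stages. This is a minimal, illustrative sketch, not the production system: the scoring rules (tag overlap for recall, a stored prior CTR for coarse ranking, CTR × bid for fine ranking and the auction) and all field names are assumptions standing in for real models.

```python
# Hypothetical sketch of the delivery funnel: each stage narrows the
# candidate set before the final auction. All scores are stand-ins.

def recall(ads, user, k=1000):
    # Cheap retrieval: keep ads whose tags overlap the user's interests.
    scored = [(len(ad["tags"] & user["interests"]), ad) for ad in ads]
    return [ad for s, ad in sorted(scored, key=lambda x: -x[0])[:k] if s > 0]

def coarse_rank(candidates, k=100):
    # Lightweight model score (here: a stored prior CTR).
    return sorted(candidates, key=lambda ad: -ad["prior_ctr"])[:k]

def fine_rank(candidates, k=10):
    # Expensive model score (here: prior CTR x bid, standing in for pCTR * bid).
    return sorted(candidates, key=lambda ad: -(ad["prior_ctr"] * ad["bid"]))[:k]

def auction(candidates):
    # Winner is the highest effective bid among the finalists.
    return max(candidates, key=lambda ad: ad["prior_ctr"] * ad["bid"])

ads = [
    {"id": "a1", "tags": {"game"}, "prior_ctr": 0.05, "bid": 2.0},
    {"id": "a2", "tags": {"game", "app"}, "prior_ctr": 0.03, "bid": 5.0},
    {"id": "a3", "tags": {"food"}, "prior_ctr": 0.10, "bid": 1.0},
]
user = {"interests": {"game"}}
winner = auction(fine_rank(coarse_rank(recall(ads, user))))
print(winner["id"])  # a2: 0.03 * 5.0 = 0.15 beats 0.05 * 2.0 = 0.10
```

The point of the funnel shape is cost control: each stage applies a progressively more expensive model to a progressively smaller candidate set.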

These closed loops cause content cold‑start (new items lack interaction data) and user cold‑start (inactive users lack interest tags), leading to a Matthew effect that narrows the audience pool.

LLM4REC – Semantic Leap

Large language models (LLMs) break the data‑closed loop by providing multimodal understanding and cross‑domain knowledge transfer. They enable semantic matching of ads and items, alleviating material duplication and improving cold‑start success rates.
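The semantic-matching idea can be illustrated with a toy example: a cold-start ad with no interaction history is linked to semantically similar existing ads by embedding similarity. The vectors below are hand-made stand-ins; in practice they would come from a multimodal LLM encoder.

```python
# Toy sketch: match a cold-start creative to existing catalog items by
# cosine similarity of (illustrative) embedding vectors.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

catalog = {
    "drama_ep1":  [0.90, 0.10, 0.00],
    "drama_ep2":  [0.88, 0.12, 0.05],
    "cooking_ad": [0.00, 0.20, 0.95],
}
new_ad = [0.87, 0.15, 0.02]  # cold-start creative, no behavior data yet

# Nearest catalog item in embedding space inherits as the semantic neighbor.
best = max(catalog, key=lambda k: cosine(new_ad, catalog[k]))
```

Because the match is computed from content semantics rather than logged clicks, it works on day one for a brand-new creative, which is exactly what breaks the closed loop.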

Case Studies in Short‑Video Advertising

Case 1: Different titles for the same drama are identified as identical by the LLM.

Case 2: Re‑ordered episode clips are still recognized as the same storyline.

Case 3: Identical short‑video titles are correctly distinguished as different dramas.

The technical implementation extracts text and audio via OCR and ASR, generates embeddings, and uses a graph‑computation framework (GraphFrames) to unify content IDs. This approach achieved 97% accuracy, a 13% improvement over previous versions, and is applied in Kuaishou’s risk‑control and duplicate‑detection pipelines.
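The ID-unification step can be sketched with a union-find in place of the distributed GraphFrames job: videos whose OCR/ASR-derived embeddings are similar enough get an edge, and each connected component becomes one unified content ID. The threshold and toy vectors below are assumptions.

```python
# Sketch of graph-based ID unification: similarity edges + connected
# components (via union-find), each component = one unified content ID.
import math

class UnionFind:
    def __init__(self, items):
        self.parent = {x: x for x in items}
    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x
    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def cos(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def unify_ids(embeddings, threshold=0.9):
    uf = UnionFind(embeddings)
    ids = list(embeddings)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if cos(embeddings[a], embeddings[b]) >= threshold:
                uf.union(a, b)  # add an edge: same underlying content
    groups = {}
    for v in ids:
        groups.setdefault(uf.find(v), []).append(v)
    return list(groups.values())

videos = {
    "title_A": [1.00, 0.00],   # same drama under two different titles
    "title_B": [0.98, 0.05],
    "other":   [0.00, 1.00],
}
clusters = unify_ids(videos)
```

This is the mechanism behind Case 1: two uploads with different titles land in the same component and therefore share one ID, while unrelated content stays separate.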

User Cold‑Start Solutions

To address sparse user behavior in advertising, LLM4REC constructs richer semantic features from multimodal signals (OCR, ASR, video embeddings) and expands them with similar high‑frequency item representations. These enriched features are fed into a three‑level prompt hierarchy:

System prompt: Role‑play as a commercial recommendation expert.

User prompt: Assemble all available user signals (platform activity, e‑commerce behavior, search logs).

Instruction prompt: Output ten likely product names (product‑level granularity) for the user.
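The three-level hierarchy can be assembled into a standard chat-message list. The role wording, signal field names, and output format below are illustrative assumptions, not the production prompts.

```python
# Illustrative assembly of the system / user / instruction prompt hierarchy.

def build_prompt(user_signals, n_items=10):
    # Level 1 — system prompt: role-play as a commercial recommendation expert.
    system = "You are a commercial recommendation expert for a short-video ad platform."
    # Level 2 — user prompt: assemble all available user signals.
    user = (
        "User signals:\n"
        f"- Platform activity: {user_signals.get('activity', 'unknown')}\n"
        f"- E-commerce behavior: {user_signals.get('ecommerce', 'none')}\n"
        f"- Search logs: {user_signals.get('search', 'none')}\n"
    )
    # Level 3 — instruction prompt: ask for product-level candidates.
    instruction = (
        f"Output the {n_items} product names (product-level granularity) "
        "this user is most likely to engage with, one per line."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user + "\n" + instruction},
    ]

messages = build_prompt(
    {"activity": "daily, evenings", "search": ["running shoes", "protein powder"]}
)
```

Keeping the three levels as separate pieces makes it easy to swap signal sources per scenario without touching the role or output-format contracts.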

The resulting semantic IDs are crossed with user behavior IDs, hashed, and concatenated to produce fine‑grained representations that improve AB‑test metrics in sparse‑signal scenarios.
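The crossing-hashing-concatenation step can be sketched as follows. The hash-bucket count, the `&` separator, and the ID formats are assumptions; the production feature space would be defined by the model's embedding tables.

```python
# Sketch of the feature-crossing step: LLM-derived semantic IDs are crossed
# with behavior IDs, hashed into a fixed vocabulary, and concatenated.
import hashlib

HASH_BUCKETS = 2 ** 20  # illustrative embedding-table size

def hash_id(feature: str) -> int:
    # Stable hash of any string feature into the shared vocabulary.
    return int(hashlib.md5(feature.encode()).hexdigest(), 16) % HASH_BUCKETS

def cross_features(semantic_ids, behavior_ids):
    # Cartesian cross of semantic x behavior IDs, e.g. "prod:x&click:y".
    crossed = [f"{s}&{b}" for s in semantic_ids for b in behavior_ids]
    # Concatenate raw IDs and crossed IDs, all mapped into one hash space.
    return [hash_id(f) for f in semantic_ids + behavior_ids + crossed]

feats = cross_features(
    ["prod:running_shoes"],
    ["click:item_123", "cart:item_456"],
)
print(len(feats))  # 1 semantic + 2 behavior + 2 crossed = 5
```

The crossed IDs give the downstream ranker fine-grained signals ("this semantic interest co-occurring with this behavior") even when either signal alone is sparse.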

Future Planning

The roadmap envisions a new workflow called ForRec that moves from raw data preparation to LLM inference, semantic retrieval, feature generalization, and model deployment. This pipeline aims to lower data‑collection costs, increase feature freshness, and automate evaluation loops.
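The ForRec stages can be pictured as a simple function chain. Every stage body below is a stub standing in for the corresponding system (the real LLM inference and vector retrieval are replaced by string operations), so this shows only the shape of the pipeline, not its contents.

```python
# The ForRec workflow as a chain of stages. All stage bodies are stubs.

def prepare_data(raw_logs):
    # Data preparation: keep only well-formed records.
    return [r for r in raw_logs if "text" in r]

def llm_inference(records):
    # LLM inference (stubbed): attach a semantic label to each record.
    return [{**r, "semantic": r["text"].lower()} for r in records]

def semantic_retrieval(records, query):
    # Semantic retrieval: substring match standing in for vector search.
    return [r for r in records if query in r["semantic"]]

def generalize_features(records):
    # Feature generalization: turn semantic labels into model features.
    return [{"features": {"semantic": r["semantic"]}} for r in records]

raw = [{"text": "Running Shoes Ad"}, {"bad": True}, {"text": "Cooking Show"}]
features = generalize_features(
    semantic_retrieval(llm_inference(prepare_data(raw)), "running")
)
print(len(features))  # 1
```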

In the generative recommendation era, models like OneRec will generate top‑N items directly from instruction prompts, potentially integrating new‑product promotion while adhering to ecosystem‑health constraints. Achieving this requires breakthroughs in throughput (QPS), response time (RT), and real‑time data pipelines.

Overall, leveraging LLMs for both item and user cold‑start dramatically expands the reachable audience, improves conversion efficiency, and paves the way for next‑generation, knowledge‑driven recommendation systems.

Tags: LLM, Recommendation Systems, Multimodal, Cold Start, Ad Tech
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
