How Baidu’s GRAB Model Uses Scaling Laws to Transform Ad Ranking

This article explains Baidu's generative ranking model GRAB, detailing how scaling laws from large language models inspire a new recommendation paradigm, the model's architecture, custom attention mechanisms, training strategies, deployment optimizations, and the resulting business gains in CTR and revenue.

DataFunSummit

Sep 9, 2025

Introduction

Recent breakthroughs in generative AI, especially large language models (LLMs), have demonstrated scaling laws: model performance improves predictably as parameters, data, and compute grow. In the demanding ad recommendation setting, traditional deep learning recommendation models (DLRMs) face performance bottlenecks.

To overcome these limits, the Baidu Commercial Technology team designed and fully deployed a generative ranking model called GRAB (Generative Ranking for Ads at Baidu). This article covers problem diagnosis, paradigm exploration, framework design, technical challenges, and business outcomes.

1. Trend of Large‑Model Recommendation

Before GRAB, Baidu's ad recommendation relied on classic DLRMs that combine massive discrete features with MLPs. While effective, this paradigm hits a ceiling due to diminishing returns from feature engineering, lossy compression of sequential representations, weak reasoning, and low activation rates for dynamic ad scenarios.

2. "Scaling Law": A Breakthrough Insight

LLM research shows that loss decreases predictably as model size grows, following an approximate power law (a straight line on log-log axes). This suggests that scaling up recommendation models could yield continuous gains, and it motivated the exploration of large‑model approaches for recommendation.
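As a rough illustration of what a scaling-law fit looks like, the sketch below fits a power law of the form loss ≈ a · size^(−alpha) by ordinary least squares in log-log space. The data points are synthetic and the function name `fit_power_law` is hypothetical; nothing here comes from Baidu's experiments.

```python
import math

def fit_power_law(sizes, losses):
    """Fit loss ~= a * size ** (-alpha) by least squares in log-log space."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(v) for v in losses]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                     # slope of the log-log line
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope    # (a, alpha)

# Synthetic data generated from loss = 5.0 * size ** -0.1 (illustrative only).
sizes = [1e6, 1e7, 1e8, 1e9]
losses = [5.0 * n ** -0.1 for n in sizes]
a, alpha = fit_power_law(sizes, losses)
print(f"a={a:.3f}, alpha={alpha:.3f}")
```

A fitted exponent like `alpha` is what lets a team extrapolate the expected gain from a larger model before paying for the training run.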

3. Three Paths Explored for Large‑Model Recommendation

Path 1 – Direct LLM Recommendation: Directly applying a generic LLM to ad data failed, with performance dropping by more than one percentage point.

Path 2 – LLM‑Enhanced Representations: Using LLMs to generate high‑quality feature embeddings improved generalization but offered limited short‑term gains.

Path 3 – Generative Sequential Modeling: Adapting LLM techniques (Transformer, long‑context) to model user behavior sequences end‑to‑end proved effective and led to the GRAB framework.

GRAB Overall Design

1. Core Design Philosophy

From "Separate" to "Unified": Model history behavior and target ad in a shared representation space, similar to LLM token modeling.

From "Flat" to "Structured": Transform user behavior into structured sequences handling variable length and hierarchy.

From "Manual" to "Adaptive": Feed raw user sequences directly, letting the model learn without handcrafted features.

From "Sequence Retrieval" to "Efficient Attention": Replace traditional hard‑search with causal Transformer attention for full‑sequence modeling.

2. Framework

GRAB treats the concatenated user history and candidate ad as a unified event sequence. Each event is tokenized via a GATE + MLP layer, then processed by a causal attention Transformer. The Transformer output passes through an MLP and Sigmoid to predict click‑through rate (CTR) for each ad slot.
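The data flow described above can be sketched in a few lines. This is a minimal toy, not GRAB's implementation: it assumes event embeddings are already available (the GATE + MLP tokenizer is not modeled), uses a single attention head with identity Q/K/V projections, and replaces the output MLP with one linear layer. The names `causal_attention`, `predict_ctr`, `head_w`, and `head_b` are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def causal_attention(tokens):
    """Single-head causal self-attention with identity Q/K/V projections."""
    d = len(tokens[0])
    outputs = []
    for i, query in enumerate(tokens):
        # Causal mask: position i attends only to positions 0..i.
        scores = [dot(query, tokens[j]) / math.sqrt(d) for j in range(i + 1)]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        outputs.append([
            sum(w * tokens[j][k] for j, w in enumerate(weights)) / total
            for k in range(d)
        ])
    return outputs

def predict_ctr(history, candidate, head_w, head_b):
    """Append the candidate ad to the history and read CTR off the last position."""
    hidden = causal_attention(history + [candidate])
    logit = dot(hidden[-1], head_w) + head_b   # linear stand-in for the output MLP
    return 1.0 / (1.0 + math.exp(-logit))      # sigmoid

history = [[1.0, 0.0], [0.0, 1.0]]             # toy event embeddings
candidate = [0.5, 0.5]                         # toy candidate-ad embedding
ctr = predict_ctr(history, candidate, head_w=[0.3, -0.2], head_b=0.0)
print(ctr)
```

Because the mask is causal, the hidden states of history events do not depend on the candidate appended after them, which is the property that makes history-side caching possible at serving time.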

3. Comparison with LLM and DLRM

GRAB shares the Transformer backbone with LLMs but focuses on user‑behavior tokens and discriminative learning rather than generative language objectives. Compared to DLRM, GRAB replaces handcrafted feature tables with end‑to‑end sequence modeling, achieving a full‑pipeline innovation.

Challenges and Solutions

1. Customized Attention Mechanism

Standard Transformer attention cannot directly handle recommendation’s complex interaction and temporal signals. The solution is Q‑Aware RAB (Query‑aware Relative Attention Bias) which combines causal masking, dual sliding windows (time and length), and query‑dependent relative biases.
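One way to picture how these three ingredients combine is the sketch below, which computes attention logits under a causal mask, a length window, and a time window, then adds a bias that depends on both the query and the relative distance. The exact form of the bias in GRAB is not given in the source; the choice here (dot product of the query with a relative-position embedding) and the names `rel_emb`, `max_len`, and `max_age` are assumptions for illustration.

```python
import math

NEG_INF = float("-inf")

def attention_logits(queries, keys, timestamps, rel_emb, max_len, max_age):
    """Masked logits: causal + length window + time window + query-aware bias."""
    n = len(queries)
    d = len(queries[0])
    logits = [[NEG_INF] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):                            # causal: only j <= i
            if i - j >= max_len:                          # length sliding window
                continue
            if timestamps[i] - timestamps[j] > max_age:   # time sliding window
                continue
            content = sum(a * b for a, b in zip(queries[i], keys[j])) / math.sqrt(d)
            bucket = min(i - j, len(rel_emb) - 1)         # clip distant offsets
            # Assumed bias form: the query interacts with a relative-position embedding.
            bias = sum(a * b for a, b in zip(queries[i], rel_emb[bucket]))
            logits[i][j] = content + bias
    return logits

events = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
timestamps = [0, 10, 20, 100]                             # e.g. seconds
rel_emb = [[0.1, 0.0], [0.0, 0.1], [0.0, 0.0]]
logits = attention_logits(events, events, timestamps, rel_emb, max_len=3, max_age=50)
```

In this toy input the last event is 80 or more time units away from everything before it, so the time window leaves it attending only to itself even though the length window would have admitted two earlier events.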

2. Training Efficiency & Over‑fitting

Variable‑Length Zero‑Redundancy Packing: Pack multiple user sequences together with masks to improve GPU utilization.

Two‑Stage Training (STS): First stage learns end‑to‑end sequence autoregression; second stage trains sparse discrete representations, mitigating over‑fitting caused by user‑interest locality.
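The packing idea can be sketched as follows. The source does not say which packing heuristic GRAB uses, so this example assumes a simple first-fit-decreasing strategy; the helper names and the `capacity` parameter are hypothetical. Segment IDs are then used to build a mask that keeps attention causal and confined to each user's own sequence.

```python
def pack_sequences(lengths, capacity):
    """First-fit-decreasing packing of sequences into rows of `capacity` tokens."""
    rows, free = [], []            # sequence indices per row, remaining room per row
    for idx in sorted(range(len(lengths)), key=lambda i: -lengths[i]):
        n = lengths[idx]
        if n > capacity:
            raise ValueError("sequence longer than row capacity")
        for r in range(len(rows)):
            if free[r] >= n:
                rows[r].append(idx)
                free[r] -= n
                break
        else:                      # no existing row has room: open a new one
            rows.append([idx])
            free.append(capacity - n)
    return rows

def segment_ids(row, lengths):
    """Label each token in a packed row with the position of its sequence in the row."""
    ids = []
    for seg, idx in enumerate(row):
        ids.extend([seg] * lengths[idx])
    return ids

def packed_causal_mask(ids):
    """Token i may attend to token j only if j <= i and both share a segment."""
    n = len(ids)
    return [[j <= i and ids[i] == ids[j] for j in range(n)] for i in range(n)]

lengths = [5, 3, 2, 4, 1]          # toy user-sequence lengths
rows = pack_sequences(lengths, capacity=8)
ids = segment_ids(rows[0], lengths)
mask = packed_causal_mask(ids)
print(rows, ids)
```

Here 15 tokens fit into two rows of 8 with a single unused slot, whereas padding each of the five sequences to the longest length would occupy 25 slots.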

3. Inheriting the "Old Soup" (Legacy) Model

To warm‑start GRAB, static user attributes are encoded as heterogeneous tokens and combined only when needed, reducing redundancy. A dual‑loss training (original DLRM loss + GRAB sequence loss) enables smooth migration.
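A dual-loss objective of this kind might be written as below. The source only states that the two losses are combined; the convex weighting and the `grab_weight` parameter are assumptions made for this sketch, and binary cross-entropy is assumed for both heads since both predict clicks.

```python
import math

def bce(p, y, eps=1e-12):
    """Binary cross-entropy for one example, with clipping for numerical safety."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def dual_loss(p_dlrm, p_grab, label, grab_weight=0.5):
    """Weighted sum of the legacy DLRM loss and the GRAB sequence loss."""
    return (1.0 - grab_weight) * bce(p_dlrm, label) + grab_weight * bce(p_grab, label)

# Toy predictions from the two heads for one clicked impression.
loss = dual_loss(p_dlrm=0.30, p_grab=0.45, label=1)
print(loss)
```

Keeping the legacy loss in the objective lets the shared parameters stay useful to the old head while the new sequence head learns; how the weight should change over the course of training is a tuning decision this sketch does not address.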

4. Efficient Online Inference

KV‑Cache: Cache key/value vectors of user history for fast per‑request inference.

System & Algorithm Optimizations: Use M‑Falcon packing, operator fusion, low‑precision computation, and cache‑aware serving to keep inference cost comparable to traditional models.
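The KV-cache idea can be illustrated with a toy serving loop: the user's history is encoded once, and each candidate ad is then scored against the cached keys and values. This is a sketch under simplifying assumptions, not Baidu's serving stack: `encode_history` uses identity projections, `UserKVCache` is a hypothetical in-memory dict with no invalidation when the history grows, and the CTR head is a plain sum followed by a sigmoid.

```python
import math

def encode_history(history):
    """Toy stand-in for the Transformer's K/V projections (identity here)."""
    return {"keys": [list(e) for e in history], "values": [list(e) for e in history]}

class UserKVCache:
    """Caches per-user history keys/values so they are computed once."""
    def __init__(self):
        self._store = {}
        self.encode_calls = 0

    def get(self, user_id, history):
        if user_id not in self._store:
            self._store[user_id] = encode_history(history)
            self.encode_calls += 1
        return self._store[user_id]

def score_candidate(candidate, kv):
    """The candidate attends to the cached history plus itself, then predicts CTR."""
    d = len(candidate)
    keys = kv["keys"] + [candidate]
    values = kv["values"] + [candidate]
    logits = [sum(a * b for a, b in zip(candidate, k)) / math.sqrt(d) for k in keys]
    top = max(logits)
    weights = [math.exp(x - top) for x in logits]
    total = sum(weights)
    pooled = [sum(w * v[k] for w, v in zip(weights, values)) / total for k in range(d)]
    return 1.0 / (1.0 + math.exp(-sum(pooled)))   # toy head: sum, then sigmoid

cache = UserKVCache()
history = [[0.2, 0.1], [0.4, -0.3], [0.0, 0.5]]
candidates = [[0.1, 0.2], [0.3, 0.3], [-0.2, 0.4]]
scores = [score_candidate(c, cache.get("user_42", history)) for c in candidates]
print(scores, cache.encode_calls)
```

The history is encoded once for all three candidates, so the per-candidate cost depends on one query against the cached sequence rather than a full forward pass over it.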

Business Impact

GRAB was fully deployed in Baidu’s ad ranking, delivering:

~0.003 % AUC lift.

~4 % revenue increase.

~5 % click‑through‑rate improvement.

Experiments also confirmed the scaling law: extending user sequence length from 64 to over 1024 yields near‑linear AUC growth, validating the long‑term potential of generative recommendation.

Future Outlook

The next generation of recommendation systems should combine broader knowledge, multimodal inputs, and rapid adaptation. Baidu envisions a path from rule‑based to hybrid to fully generative systems, where "recommendation large‑modelization" and "large‑model recommendation" converge.

Q&A Highlights

Key takeaways include the distinction between emergent phenomena and capabilities, the applicability of scaling laws to recommendation, practical training pipeline changes, handling heterogeneous tokens, and deployment strategies.
