
Sequence Optimization, Context-Aware CTR Re-Estimation, and Session-Level Auction for JD Advertising Ranking

The article presents JD's technical evolution for advertising ranking, covering technology selection for recommendation ad sorting, context‑aware CTR re‑estimation, reinforcement‑learning‑based sequence optimization, and a session‑level auction mechanism that together improve monetization efficiency and long‑term user value.

DataFunSummit

Guest Speaker: Zhao Xin, Ph.D., Algorithm Engineer at JD, presented by DataFunTalk.

Overview: Improving ad monetization hinges on accurate traffic value estimation, especially click-through rate (CTR). Because CTR varies with user, ad, and context, precise prediction across the entire ad sequence is essential.

The talk is organized around four topics:

Recommendation ad ranking technology selection

Context‑aware CTR re‑estimation

Reinforcement‑learning‑based sequence optimization

Session‑level ad auction mechanism optimization

01 Recommendation Ranking Status & Technology Selection

JD's ranking pipeline evolved from single‑item sorting to request‑level sequence optimization and finally to session‑level auction. Request‑level optimization moved from forward greedy search to generating and evaluating whole sequences.

Two main approaches for sequence optimization were discussed:

Forward greedy search (e.g., beam search) selects the most valuable item at each step using the previous item as context, but requires a model call per decision, leading to high latency.

Global sequence evaluation filters a candidate set of permutations using heuristic rules or a learned generator, then evaluates them with a global model, reducing online latency by evaluating many candidates in a single pass.

Comparative latency diagrams illustrate the trade‑offs between these methods.
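The latency trade-off can be made concrete with a toy model. The sketch below is illustrative only: the base values, the pairwise context adjustment, and all function names are made-up stand-ins for a real CTR model, not JD's implementation. Forward greedy search makes one model call per candidate per slot, while the global approach scores whole candidate sequences in single passes.

```python
import itertools

# Illustrative base values and a pairwise context effect (not real data).
BASE = {"a": 0.30, "b": 0.28, "c": 0.25, "d": 0.20}
PAIR = {("a", "b"): -0.05, ("b", "a"): 0.03}  # adjustment given the previous item

def step_value(item, prev):
    """Value of placing `item` right after `prev` (one 'model call')."""
    return BASE[item] + PAIR.get((prev, item), 0.0)

def greedy_sequence(candidates, k):
    """Forward greedy: pick the best next item at each slot,
    using the previously placed item as context."""
    seq, prev, pool = [], None, set(candidates)
    for _ in range(k):
        best = max(pool, key=lambda it: step_value(it, prev))
        seq.append(best)
        pool.remove(best)
        prev = best
    return seq

def evaluate_sequence(seq):
    """Global evaluation: score an entire candidate sequence in one pass."""
    total, prev = 0.0, None
    for it in seq:
        total += step_value(it, prev)
        prev = it
    return total

def best_of_permutations(candidates, k):
    """Generate candidate sequences (here: brute-force permutations,
    in practice heuristics or a learned generator) and keep the best."""
    return max(itertools.permutations(candidates, k), key=evaluate_sequence)
```

With these numbers, greedy picks ["a", "c"] while global evaluation finds ("b", "a"), whose total value is higher because the context bonus of "a" after "b" outweighs "a"'s higher standalone value; this is exactly the kind of interaction greedy search misses.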

02 Context‑Aware CTR Re‑Estimation

Context‑aware CTR estimation treats each newly placed SKU as fresh context and greedily re‑predicts CTR for the remaining candidates. The model architecture resembles a standard CTR model but incorporates the items already placed ahead of the current position as context.
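A minimal sketch of this greedy re-estimation loop, assuming a toy re-prediction rule (each already-shown item of the same category discounts CTR by 20%; the discount, the feature names, and both functions are hypothetical illustrations, not the production model):

```python
def re_estimate_ctr(item, placed):
    """Re-predict CTR of `item` given the items already placed above it.
    Toy rule: each prior item of the same category discounts CTR by 20%."""
    ctr = item["base_ctr"]
    for prev in placed:
        if prev["category"] == item["category"]:
            ctr *= 0.8
    return ctr

def greedy_rank(candidates, k):
    """Place k items greedily; each newly placed SKU becomes
    context for the next round of CTR re-estimation."""
    placed, pool = [], list(candidates)
    for _ in range(k):
        best = max(pool, key=lambda it: re_estimate_ctr(it, placed))
        placed.append(best)
        pool.remove(best)
    return placed
```

Note how a second phone's re-estimated CTR drops below a shoe's once one phone is already placed, so the greedy loop naturally diversifies the sequence.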

03 Reinforcement‑Learning‑Based Sequence Optimization

The solution consists of two stages:

Train a sequence‑evaluation model using ranking‑based samples, then run a small‑traffic online selection that mixes random, heuristic, and generated sequences to collect shuffled‑order data for retraining.

Deploy an actor‑learning loop where a sequence‑generation model (trained with Monte‑Carlo sampling) learns to produce sequences favored by the evaluator. The generation model uses a PointDNN backbone to extract dense item features, applies attention over the whole sequence, and outputs per‑item CTR predictions that are combined with bid, diversity, and other business metrics.

Sequence generation details:

Candidate set is max‑pooled into a set‑level vector.

Each item vector is concatenated with the set vector and passed through several DNN layers.

Training uses a 2‑D softmax cross‑entropy loss; inference samples positions with a temperature‑controlled softmax, producing multiple candidate sequences.
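The generation steps above can be sketched as follows. This is a simplified, assumption-laden illustration: the score matrix stands in for the PointDNN-plus-attention output, the exact 2-D normalization and the function names are guesses at the described loss, and no real network weights are involved.

```python
import numpy as np

rng = np.random.default_rng(0)

def set_context_features(item_vecs):
    """Max-pool the candidate set into a set-level vector and concatenate
    it to every item vector (the DNN layers that follow are omitted)."""
    set_vec = item_vecs.max(axis=0)
    return np.concatenate(
        [item_vecs, np.broadcast_to(set_vec, item_vecs.shape)], axis=1
    )

def sample_sequences(scores, n_seq=4, temperature=1.0):
    """Sample candidate sequences from a (positions x items) score matrix.
    At each position, draw one still-unplaced item from a
    temperature-controlled softmax over that position's row."""
    n_pos, n_items = scores.shape
    seqs = []
    for _ in range(n_seq):
        available = list(range(n_items))
        seq = []
        for p in range(n_pos):
            logits = scores[p, available] / temperature
            probs = np.exp(logits - logits.max())
            probs /= probs.sum()
            choice = rng.choice(len(available), p=probs)
            seq.append(available.pop(choice))
        seqs.append(seq)
    return seqs

def softmax_2d_loss(scores, target_seq):
    """2-D softmax cross-entropy: normalize over the whole (position, item)
    matrix and sum the negative log-probability of each chosen cell."""
    flat = scores.ravel()
    logz = np.log(np.exp(flat - flat.max()).sum()) + flat.max()
    return sum(logz - scores[pos, item] for pos, item in enumerate(target_seq))
```

Lowering the temperature concentrates sampling on the highest-scoring items, so a single trained model can emit several diverse candidate sequences per request for the evaluator to score.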

RL iteratively improves both generator and evaluator. Three online monitoring metrics are tracked:

Accuracy of predicting the selected item for a given position.

Accuracy of predicting whether a specific item appears at a given position (row/column accuracy).

Online win‑rate of the sequence‑generation strategy.

All metrics show upward trends, confirming continuous improvement.

04 Session‑Level Ad Auction Mechanism Optimization

To address limitations of traditional second‑price bidding, JD introduces a learning‑based scoring system in which an item's score is proportional to both the advertiser's bid and a platform‑wide composite score. A per‑slot auction selects the winning SKU based on these scores and settles with a second‑price‑style charge.
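One plausible reading of this mechanism can be sketched as below. The product form `score = bid * quality` and the generalized second-price payment rule are illustrative assumptions, not JD's published formulas, and the field names are hypothetical.

```python
# Hypothetical per-slot auction sketch: score = bid * quality, where
# `quality` stands in for the learning-based platform composite score.

def slot_auction(candidates):
    """Rank candidates by bid * quality; the winner takes the slot and
    pays the lowest bid that would still have won (second-price style)."""
    ranked = sorted(candidates, key=lambda c: c["bid"] * c["quality"], reverse=True)
    winner, runner_up = ranked[0], ranked[1]
    payment = runner_up["bid"] * runner_up["quality"] / winner["quality"]
    return winner, payment
```

Because the score is monotone in the bid, raising one's bid can only increase the probability of winning the slot, which is the incentive-compatibility property listed below; and the winner's payment never exceeds its own bid.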

Key design principles:

Incentive compatibility – higher bids lead to higher display probabilities.

Sensitivity to platform multi‑objective and long‑term value.

Implementation simplicity for ranking and billing.

The fine‑grained auction uses a Mixer‑MLP backbone (suitable for set modeling) to produce per‑item, per‑position scores. Business rewards (immediate revenue, long‑term revenue, diversity, etc.) are fused into the loss function, turning the problem into a reward‑based 2‑D softmax optimization.
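The reward-based 2-D softmax idea can be illustrated with a small sketch. The Mixer-MLP backbone is omitted; `scores` stands in for its per-item, per-position output, and the exact rule for fusing business rewards into the loss is an assumption on my part.

```python
import numpy as np

def reward_weighted_loss(scores, rewards):
    """Turn per-(item, position) scores into a 2-D softmax distribution and
    weight each cell's log-probability by its fused business reward
    (immediate revenue, long-term revenue, diversity, ...)."""
    flat = scores.ravel() - scores.max()          # stabilized logits
    probs = (np.exp(flat) / np.exp(flat).sum()).reshape(scores.shape)
    return -(rewards * np.log(probs + 1e-12)).sum()
```

Minimizing this loss pushes probability mass toward the (item, position) cells carrying the highest fused reward, which is how multi-objective business value is traded off inside a single differentiable objective.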

Challenges addressed:

Balancing the expressive power of a single‑actor model versus a generator‑plus‑evaluator pipeline.

Incorporating business priors into the training reward.

Solutions include scaling the actor to a larger Mixer‑MLP and integrating business‑derived rewards directly into the loss.

05 Q&A Highlights

• JD's rerank approach is similar to Alibaba's and Baidu's but places extra emphasis on a smooth, production-ready engineering pipeline.

• A "virtual bid" concept treats organic results as ads so they can be ranked in a unified auction.

• Multi-metric fusion is handled via reward shaping in the auction mechanism.

• RL is used for auction optimization, not for dynamic bidding; bids remain unchanged while the learning-based score reallocates traffic.

• Online impact: the sequence generation & evaluation framework raised RPM by ~15%; the session-level auction added another 5-6%, with further gains expected.

Thank you for listening.

Tags: advertising, CTR, ranking, reinforcement learning, auction, sequence optimization
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
