
End-to-End Consistency Testing Solution for Click-Through Rate Models in Advertising Systems

The article describes Baidu's end-to-end consistency testing framework for advertising click-through-rate models. The framework uses a five-stream verification pipeline, implemented in six phases, to compare Q-values across feature extraction, table conversions, and DNN computation, so that data and model inconsistencies in production can be detected and localized precisely.

Baidu Geek Talk

This article presents a comprehensive solution for testing the consistency between online and offline prediction in click-through rate (CTR) models used in advertising systems at Baidu.

Background: CTR models play a crucial role in advertising retrieval: they estimate click probabilities, which drive ad ranking and truncation. Each model has an online prediction component and an offline training component. Because of system complexity, including feature engineering, training loss, and storage precision loss, online and offline model behavior frequently diverges, so a model that evaluates well offline can still perform poorly online.

Problem Definition: Consistency issues fall into two categories: data consistency (which covers sample consistency and model consistency) and processing logic consistency. Data consistency means that the same input dataset produces consistent values at every processing stage. Model consistency concerns the precision of the conversion from the offline trained model to the online prediction model.

Technical Solution: The article proposes a five-stream verification approach covering feature extraction, large model lookup, and DNN computation. By comparing Q-values (predicted click probabilities) at different stages (q1 through q5), the system can identify inconsistencies and locate their exact positions:

q2 vs q3: Feature extraction consistency

q3 vs q4: Offline table to mid_pb conversion

q4 vs q5: mid_pb to online table conversion

q5 vs q1: DNN computation consistency
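The pairwise comparisons above can be sketched as a small lookup that maps each mismatching pair to the component it checks. This is a minimal illustration, not the article's implementation: the function name `locate_inconsistencies`, the tolerance `tol`, and the sample Q-values are all assumptions, since the article does not state how close two Q-values must be to count as consistent.

```python
from typing import Dict, List

# Stage pairs and the component each comparison checks, as listed above.
CHECKS = [
    ("q2", "q3", "feature extraction"),
    ("q3", "q4", "offline table to mid_pb conversion"),
    ("q4", "q5", "mid_pb to online table conversion"),
    ("q5", "q1", "DNN computation"),
]


def locate_inconsistencies(q: Dict[str, float], tol: float = 1e-6) -> List[str]:
    """Return the components whose two Q-values differ by more than tol.

    `tol` is a hypothetical threshold chosen for illustration only.
    """
    return [name for a, b, name in CHECKS if abs(q[a] - q[b]) > tol]


# Hypothetical Q-values for one request: the value shifts between q3 and q4.
sample = {"q1": 0.0298, "q2": 0.0312, "q3": 0.0312, "q4": 0.0298, "q5": 0.0298}
print(locate_inconsistencies(sample))  # ['offline table to mid_pb conversion']
```

Because each pair isolates one step, a single flagged pair points directly at the step where the value first diverges.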

Implementation Steps: The solution includes six phases: traffic diversion, log formatting, log stitching, online parsing, Q-value replacement and calculation, and report generation. The system supports platform-based task submission and provides both statistical analysis and detailed diff reports.
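As a rough illustration of the log stitching and report generation phases, the sketch below joins two logs on a shared request key and produces both summary statistics and a detailed diff list. The key format, the dictionary-based log representation, the tolerance, and the function names `stitch_logs` and `build_report` are hypothetical; the article does not describe the actual log schema or report layout.

```python
from typing import Dict, List, Tuple

Pair = Tuple[float, float]


def stitch_logs(online: Dict[str, float], offline: Dict[str, float]) -> Dict[str, Pair]:
    """Join two logs on a shared request key, keeping keys present in both."""
    return {k: (online[k], offline[k]) for k in online.keys() & offline.keys()}


def build_report(pairs: Dict[str, Pair], tol: float = 1e-6) -> dict:
    """Summarize mismatches (statistics) and list them per request (detailed diff)."""
    diffs: List[Tuple[str, float, float]] = sorted(
        (k, a, b) for k, (a, b) in pairs.items() if abs(a - b) > tol
    )
    total = len(pairs)
    return {
        "total": total,
        "mismatched": len(diffs),
        "mismatch_rate": len(diffs) / total if total else 0.0,
        "diffs": diffs,
    }


# Hypothetical logs keyed by request id; r3 and r4 appear on one side only.
online_log = {"r1": 0.10, "r2": 0.20, "r3": 0.30}
offline_log = {"r1": 0.10, "r2": 0.25, "r4": 0.40}
report = build_report(stitch_logs(online_log, offline_log))
print(report["mismatched"], report["mismatch_rate"])  # 1 0.5
```

Requests that appear in only one log are dropped here for simplicity; a real pipeline would likely want to count and report them separately.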

Results: The solution has been deployed within Baidu's advertising platform, where it has supported investigations into multiple model-strategy issues and identified a range of inconsistencies, including feature inconsistencies and network structure misalignments.

Tags: Feature Engineering, CTR prediction, Deep neural networks, Baidu, Advertising Systems, Machine learning testing, model validation, Online-offline consistency