What Makes a Good CTR Benchmark? Lessons from Huawei’s FuxiCTR

The article analyzes the shortcomings of current click‑through‑rate benchmarks, explains why leaderboards are valuable, and proposes concrete criteria—including online evaluation, sequential test data, leakage prevention, and read‑only submissions—to build a more realistic and robust CTR benchmarking platform.

Baobao Algorithm Notes

Mar 23, 2022

This article examines what makes an effective benchmark for click-through-rate (CTR) prediction. It was prompted by Huawei’s open-source framework FuxiCTR and the accompanying paper, “Open Benchmarking for Click-Through Rate Prediction”.

Why Leaderboards Matter

Uniform test‑set split with hidden labels prevents participants from over‑fitting to custom validation splits.

Engineering tricks are open to every participant, so models are compared on equal footing rather than one entry benefiting from undisclosed optimisations.

Participation by thousands of teams puts the reproducibility, openness, and generalisation of reported results to the test.

Leaderboard datasets are usually recent and large (often millions of instances), reducing the risk of label leakage or over‑optimization.

Evaluating on multiple datasets captures model variance across different domains.
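As a rough illustration of the first and last points, the sketch below shows how a leaderboard might keep labels private and report only aggregate metrics per dataset, plus the spread across datasets. The class name `HiddenLabelScorer`, the dataset keys, and the choice of AUC and log loss as metrics are assumptions made for this example; the article does not prescribe them.

```python
import math
from statistics import mean, pstdev


def log_loss(labels, probs, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)


def auc(labels, probs):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5.

    Quadratic in the number of rows, which is fine for a sketch only.
    """
    pos = [p for y, p in zip(labels, probs) if y == 1]
    neg = [p for y, p in zip(labels, probs) if y == 0]
    wins = sum((pp > pn) + 0.5 * (pp == pn) for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))


class HiddenLabelScorer:
    """Hypothetical scorer: holds test labels privately, returns metrics only."""

    def __init__(self, labels_by_dataset):
        self._labels = labels_by_dataset  # never returned to participants

    def score(self, preds_by_dataset):
        report = {}
        for name, labels in self._labels.items():
            probs = preds_by_dataset[name]
            if len(probs) != len(labels):
                raise ValueError(f"{name}: expected {len(labels)} predictions")
            report[name] = {
                "auc": auc(labels, probs),
                "logloss": log_loss(labels, probs),
            }
        aucs = [m["auc"] for m in report.values()]
        # The spread across datasets hints at how domain-dependent a model is.
        report["_summary"] = {"mean_auc": mean(aucs), "auc_spread": pstdev(aucs)}
        return report


# Toy labels and predictions, invented purely to exercise the code.
scorer = HiddenLabelScorer({"dataset_a": [1, 0, 1, 0], "dataset_b": [1, 0]})
report = scorer.score({"dataset_a": [0.9, 0.1, 0.8, 0.2], "dataset_b": [0.3, 0.7]})
```

Because participants only ever see the returned report, they cannot tune directly against individual test labels.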

CTR vs. CV/NLP Benchmarks

Academic CTR research typically relies on static offline datasets, whereas industrial CTR systems process billions of rows from dozens of tables and serve predictions online. Consequently, practitioners often find data quality and feature engineering more decisive than sophisticated model architectures, while many papers focus on adding attention or transformer components that provide limited practical gain.

Limitations of Existing CTR Benchmarks

Huawei’s benchmark evaluates several models on the Criteo and Avazu datasets. Both datasets come from Kaggle competitions held in 2014 and 2015, and they no longer reflect the scale, feature diversity, or freshness of modern advertising data, which weakens any conclusions drawn from them.

Desired Characteristics of a Good CTR Benchmark

Provide an online evaluation environment together with extensive offline training data (multiple tables, heterogeneous features, and large sample size).

Deliver test data sequentially by time slice, mimicking real‑world traffic.

Within each time slice, enforce strict leakage prevention (e.g., filter consecutive user actions or allocate them to separate inference windows).

Make each test‑batch submission read‑only; participants cannot modify previously submitted results.
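The time-slice and leakage-prevention criteria above could be sketched roughly as follows. The field names `ts` and `user`, the fixed-width windows, and the "keep only the first event per user" rule are all assumptions for illustration. The sketch implements the filtering option; moving repeat actions into a later inference window would be the alternative.

```python
from collections import defaultdict


def time_slices(events, slice_seconds):
    """Group events into consecutive fixed-width time windows, oldest first."""
    buckets = defaultdict(list)
    for ev in sorted(events, key=lambda e: e["ts"]):
        buckets[ev["ts"] // slice_seconds].append(ev)
    return [buckets[k] for k in sorted(buckets)]


def drop_repeat_user_actions(batch):
    """Keep only each user's first event within a slice.

    Later events from the same user in the same window could reveal how the
    earlier impression turned out, so this sketch simply filters them out.
    """
    seen, kept = set(), []
    for ev in batch:
        if ev["user"] not in seen:
            seen.add(ev["user"])
            kept.append(ev)
    return kept


# Toy events, invented for the example (timestamps in seconds).
events = [
    {"ts": 5, "user": "u1"},
    {"ts": 70, "user": "u2"},
    {"ts": 20, "user": "u1"},
    {"ts": 30, "user": "u3"},
    {"ts": 65, "user": "u2"},
]
batches = [drop_repeat_user_actions(b) for b in time_slices(events, 60)]
```

With 60-second windows, the second `u1` event and the second `u2` event are dropped, so each slice contains at most one row per user.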

Recent advertising competitions, such as Tencent’s yearly ad contests, supply richer, more up‑to‑date data than the legacy Criteo/Avazu sets.

Illustrative Evaluation Loop

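# Batches arrive in time order; their labels are never exposed to the participant.
for test_batch in all_batch_test:
    test_feature = get_feature(test_batch)  # build features for the current time slice
    result = model.predict(test_feature)    # score the slice
    env.commit(result)                      # a committed result can no longer be changed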
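The loop above leaves `all_batch_test`, `get_feature`, `model`, and `env` undefined. A minimal runnable version, under stated assumptions, might look like the sketch below. `ReadOnlyEnv`, its `iter_test` method, and `ConstantModel` are hypothetical stand-ins written for this example, not part of FuxiCTR or any real platform; here the environment itself serves the batches so that it can enforce ordering.

```python
class ReadOnlyEnv:
    """Hypothetical environment: serves batches in order, one commit per batch."""

    def __init__(self, batches):
        self._batches = batches
        self._committed = {}  # batch index -> predictions
        self._current = None

    def iter_test(self):
        for i, batch in enumerate(self._batches):
            self._current = i
            yield batch
            # Runs when the caller asks for the next batch.
            if i not in self._committed:
                raise RuntimeError(f"batch {i} must be committed before the next one")

    def commit(self, result):
        if self._current in self._committed:
            raise RuntimeError(f"batch {self._current} is read-only once committed")
        if len(result) != len(self._batches[self._current]):
            raise ValueError("one prediction per row is required")
        self._committed[self._current] = list(result)


class ConstantModel:
    """Stand-in model that predicts a fixed click probability."""

    def __init__(self, p):
        self.p = p

    def predict(self, features):
        return [self.p for _ in features]


def get_feature(batch):
    # Placeholder feature extraction: pass rows through unchanged.
    return batch


# Toy batches, invented for the example.
env = ReadOnlyEnv([[{"ad": 1}, {"ad": 2}], [{"ad": 3}]])
model = ConstantModel(0.1)

for test_batch in env.iter_test():
    test_feature = get_feature(test_batch)
    result = model.predict(test_feature)
    env.commit(result)
```

Calling `env.commit` a second time for the same batch raises an error, which is one simple way to enforce the read-only rule.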

Benefits of This Setup

Simulates industrial online inference scenarios, including latency constraints and streaming data.

Prevents models from exploiting feature or temporal leakage.

Keeps the test set hidden, discouraging test‑set tuning.

Creates a shared benchmark that reduces self‑validation bias and facilitates fair comparison.

A concrete example of an online-style competition is Kaggle’s “Riiid Answer Correctness Prediction” challenge, which provides a simulated online prediction environment.

References: https://arxiv.org/pdf/2009.05794.pdf; https://github.com/xue-pai/FuxiCTR; https://www.kaggle.com/c/riiid-test-answer-prediction
