Big Data 10 min read

How to Use ClickHouse Sampling and Materialized Views for Real‑Time Monitoring of Billion‑Scale Ad Traffic

This article explains how to handle high‑volume advertising monitoring by storing raw request logs in ClickHouse, enabling sampling and materialized views, and using TP999 metrics, aggregating tables, and Grafana queries to achieve fast, flexible, and low‑impact real‑time analytics on billions of events.

dbaplus Community
dbaplus Community
dbaplus Community
How to Use ClickHouse Sampling and Materialized Views for Real‑Time Monitoring of Billion‑Scale Ad Traffic

Introduction

In large‑scale advertising monitoring, engineers often need to store and query multi‑dimensional data at second‑level granularity, such as interface TP999 latency, request volume per scenario, or minute‑level revenue trends. As traffic grows, the monitoring engine faces increasing pressure.

TP999 Performance Metric

Average latency can hide performance outliers; for example, an interface may show 50 ms average while 10 % of requests take 90‑100 ms. Switching to TP999 (the 99.9th percentile) reveals these tail latencies but increases query load because every request must be retained.

Enabling Sampling in ClickHouse

ClickHouse’s high‑performance read/write and horizontal scalability allow storing every request in a detail table while using sampling to reduce query pressure. Sampling must be enabled at table creation with SAMPLE BY and the sampled column must also appear in the ORDER BY clause.

CREATE TABLE default.ods_nginx_access_log ON CLUSTER clickhouse (
    date Date,
    datetime DateTime,
    host String,
    port String,
    status String,
    cost_time UInt32,
    request_number UInt32
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{layer}-{shard}/ods_nginx_access_log', '{replica}')
PARTITION BY date
ORDER BY (date, intHash64(datetime), host, port, status)
SAMPLE BY intHash64(datetime)
SETTINGS index_granularity = 8192;

Querying TP999 with Sampling

After enabling sampling, a 10 % sample can be queried to compute the 99.9th percentile latency:

SELECT
    datetime,
    quantile(0.999)(cost_time) AS tp999_latency
FROM default.ods_nginx_access_log
SAMPLE 0.1
GROUP BY datetime
ORDER BY datetime;

Materialized Views and AggregatingMergeTree for Multi‑Dimensional Queries

For high‑cardinality, multi‑dimensional analysis, a materialized view backed by AggregatingMergeTree pre‑aggregates data by chosen dimensions (e.g., host, port, status) and time granularity, dramatically reducing query load.

CREATE MATERIALIZED VIEW default.dw_nginx_access_log_1s ON CLUSTER clickhouse 
ENGINE = ReplicatedAggregatingMergeTree('/clickhouse/tables/{layer}-{shard}/dw_nginx_access_log_1s', '{replica}') 
PARTITION BY date 
ORDER BY (date, intHash64(datetime), host, port, status) 
SAMPLE BY intHash64(datetime) 
SETTINGS index_granularity = 8192 
AS SELECT 
    date,
    datetime,
    host,
    port,
    status,
    sumState(request_number) AS request_number 
FROM default.ods_nginx_access_log 
GROUP BY date, datetime, host, port, status;

Aggregating Query Example

Querying the materialized view yields per‑second request counts without scanning the raw table:

SELECT
    datetime,
    host,
    port,
    status,
    sumMerge(request_number) AS request_number
FROM default.dw_nginx_access_log_1s
GROUP BY datetime, host, port, status
ORDER BY datetime;

Grafana QPS Dashboard Queries

To visualize QPS per host in Grafana, the following queries can be used (today’s QPS, yesterday’s QPS, and seven‑day QPS). They rely on the materialized view and time‑bucket calculations.

-- Today QPS
SELECT (intDiv(toUInt32(datetime), $interval) * $interval) * 1000 AS t,
       sumMerge(request_number) AS "Today QPS"
FROM default.dw_nginx_access_log_1s
GROUP BY t
ORDER BY t;

-- Yesterday QPS
SELECT t + 86400000 AS t,
       count() AS "Yesterday QPS"
FROM (
    SELECT (intDiv(toUInt32(datetime), $interval) * $interval) * 1000 AS t,
           count()
    FROM default.dw_nginx_access_log_1s
    WHERE datetime >= toDateTime($from - 86400) AND datetime < toDateTime($to - 86400)
    GROUP BY t
) 
GROUP BY t
ORDER BY t;

-- Seven‑Day QPS
SELECT t + 86400000 * 7 AS t,
       count() AS "7‑Day QPS"
FROM (
    SELECT (intDiv(toUInt32(datetime), $interval) * $interval) * 1000 AS t,
           count()
    FROM default.dw_nginx_access_log_1s
    WHERE datetime >= toDateTime($from - 86400 * 7) AND datetime < toDateTime($to - 86400 * 7)
    GROUP BY t
) 
GROUP BY t
ORDER BY t;

Daily and Weekly Comparison

ClickHouse can also compute day‑over‑day and week‑over‑week request volume trends with simple SQL that groups by date and applies the same aggregation functions, enabling quick visual comparisons in dashboards.

Conclusion

ClickHouse’s fast read/write performance, rich function set, and flexible SQL—including sampling, materialized views, and AggregatingMergeTree —allow the Weibo advertising monitoring system to handle hundred‑billion‑scale traffic, provide low‑latency TP999 latency analysis, multi‑dimensional QPS dashboards, and seamless daily/weekly trend comparisons, greatly simplifying operational monitoring.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

monitoringClickHouseSamplingmaterialized viewbig-datatp999
dbaplus Community
Written by

dbaplus Community

Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.