How to Use ClickHouse Sampling and Materialized Views for Real‑Time Monitoring of Billion‑Scale Ad Traffic
This article explains how to handle high‑volume advertising monitoring by storing raw request logs in ClickHouse, enabling sampling and materialized views, and using TP999 metrics, aggregating tables, and Grafana queries to achieve fast, flexible, and low‑impact real‑time analytics on billions of events.
Introduction
In large‑scale advertising monitoring, engineers often need to store and query multi‑dimensional data at second‑level granularity, such as interface TP999 latency, request volume per scenario, or minute‑level revenue trends. As traffic grows, the monitoring engine faces increasing pressure.
TP999 Performance Metric
Average latency can hide performance outliers; for example, an interface may show 50 ms average while 10 % of requests take 90‑100 ms. Switching to TP999 (the 99.9th percentile) reveals these tail latencies but increases query load because every request must be retained.
Enabling Sampling in ClickHouse
ClickHouse’s high‑performance read/write and horizontal scalability allow storing every request in a detail table while using sampling to reduce query pressure. Sampling must be enabled at table creation with SAMPLE BY and the sampled column must also appear in the ORDER BY clause.
CREATE TABLE default.ods_nginx_access_log ON CLUSTER clickhouse (
date Date,
datetime DateTime,
host String,
port String,
status String,
cost_time UInt32,
request_number UInt32
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{layer}-{shard}/ods_nginx_access_log', '{replica}')
PARTITION BY date
ORDER BY (date, intHash64(datetime), host, port, status)
SAMPLE BY intHash64(datetime)
SETTINGS index_granularity = 8192;Querying TP999 with Sampling
After enabling sampling, a 10 % sample can be queried to compute the 99.9th percentile latency:
SELECT
datetime,
quantile(0.999)(cost_time) AS tp999_latency
FROM default.ods_nginx_access_log
SAMPLE 0.1
GROUP BY datetime
ORDER BY datetime;Materialized Views and AggregatingMergeTree for Multi‑Dimensional Queries
For high‑cardinality, multi‑dimensional analysis, a materialized view backed by AggregatingMergeTree pre‑aggregates data by chosen dimensions (e.g., host, port, status) and time granularity, dramatically reducing query load.
CREATE MATERIALIZED VIEW default.dw_nginx_access_log_1s ON CLUSTER clickhouse
ENGINE = ReplicatedAggregatingMergeTree('/clickhouse/tables/{layer}-{shard}/dw_nginx_access_log_1s', '{replica}')
PARTITION BY date
ORDER BY (date, intHash64(datetime), host, port, status)
SAMPLE BY intHash64(datetime)
SETTINGS index_granularity = 8192
AS SELECT
date,
datetime,
host,
port,
status,
sumState(request_number) AS request_number
FROM default.ods_nginx_access_log
GROUP BY date, datetime, host, port, status;Aggregating Query Example
Querying the materialized view yields per‑second request counts without scanning the raw table:
SELECT
datetime,
host,
port,
status,
sumMerge(request_number) AS request_number
FROM default.dw_nginx_access_log_1s
GROUP BY datetime, host, port, status
ORDER BY datetime;Grafana QPS Dashboard Queries
To visualize QPS per host in Grafana, the following queries can be used (today’s QPS, yesterday’s QPS, and seven‑day QPS). They rely on the materialized view and time‑bucket calculations.
-- Today QPS
SELECT (intDiv(toUInt32(datetime), $interval) * $interval) * 1000 AS t,
sumMerge(request_number) AS "Today QPS"
FROM default.dw_nginx_access_log_1s
GROUP BY t
ORDER BY t;
-- Yesterday QPS
SELECT t + 86400000 AS t,
count() AS "Yesterday QPS"
FROM (
SELECT (intDiv(toUInt32(datetime), $interval) * $interval) * 1000 AS t,
count()
FROM default.dw_nginx_access_log_1s
WHERE datetime >= toDateTime($from - 86400) AND datetime < toDateTime($to - 86400)
GROUP BY t
)
GROUP BY t
ORDER BY t;
-- Seven‑Day QPS
SELECT t + 86400000 * 7 AS t,
count() AS "7‑Day QPS"
FROM (
SELECT (intDiv(toUInt32(datetime), $interval) * $interval) * 1000 AS t,
count()
FROM default.dw_nginx_access_log_1s
WHERE datetime >= toDateTime($from - 86400 * 7) AND datetime < toDateTime($to - 86400 * 7)
GROUP BY t
)
GROUP BY t
ORDER BY t;Daily and Weekly Comparison
ClickHouse can also compute day‑over‑day and week‑over‑week request volume trends with simple SQL that groups by date and applies the same aggregation functions, enabling quick visual comparisons in dashboards.
Conclusion
ClickHouse’s fast read/write performance, rich function set, and flexible SQL—including sampling, materialized views, and AggregatingMergeTree —allow the Weibo advertising monitoring system to handle hundred‑billion‑scale traffic, provide low‑latency TP999 latency analysis, multi‑dimensional QPS dashboards, and seamless daily/weekly trend comparisons, greatly simplifying operational monitoring.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
