How to Build a Scalable Event Analytics Platform with ClickHouse
This article explains the design of a high‑performance event analysis platform that ingests billions of daily logs, supports event, funnel, and retention queries, and leverages ClickHouse for storage, efficient writes, and fast analytical queries across massive datasets.
Background
At Banfu, the servers collect billions of user‑behavior logs each day. The platform aims to answer business questions such as which channel generated the most registrations in the last three months, the age distribution of users who browsed picture books in Beijing last week, 7‑day retention of picture‑book users, and conversion rates at each step of the order path.
To answer these questions, an event analysis platform was built. The article first introduces its functions and then discusses architectural considerations.
Features
The platform supports event‑based metric aggregation, attribute grouping, and conditional filtering. Events are user actions like logging in, browsing a picture book, or purchasing a paid book. Three analysis types are provided: event analysis, funnel analysis, and retention analysis.
Event Analysis
Event analysis lets users specify a set of conditions and a target metric to answer a specific question. The conditions include:
Event type – the user action collected from custom‑instrumented logs (e.g., login, purchase).
Metric – basic metrics (total count, unique users, per‑user average) or custom metrics (event attribute combined with sum/average/max).
Filter – criteria to select the user segment of interest.
Dimension grouping – to compare groups.
Time range – the period during which the events occurred.
Example: "Average order amount for one‑on‑one courses in Beijing last week, broken down by age group". The parameters are event type = one‑on‑one course order, metric = average order amount, filter = Beijing, dimension = age, time range = last week.
Figure: Event analysis creation flow
Figure: Event analysis interface
Funnel Analysis
Funnel analysis examines conversion and drop‑off across multiple steps, such as login → browse picture book → purchase paid book. It also requires a window period during which the whole sequence must occur, and supports dimension grouping and time range.
Figure: Funnel analysis creation flow
Figure: Funnel analysis interface
Retention Analysis
Retention analysis defines an initial event and a subsequent event, then calculates the proportion of users who perform the subsequent event N days after the initial one, measuring user stickiness.
Example: set the initial event to app login, the subsequent event to picture‑book view, and the retention period to 7 days.
Figure: Retention analysis creation flow
Figure: Retention analysis interface
Architecture
The platform consists of two modules:
Data ingestion: client or server logs are sent to Kafka, processed by Flink for ETL, and written into ClickHouse.
Analysis query: the front‑end builds event, condition, and dimension selections, the back‑end assembles them into SQL, queries ClickHouse, and returns results to the front‑end.
Figure: Overall architecture diagram
How to Store Event Data in ClickHouse?
Data sources are event logs and user‑profile attributes. Core requirements are massive storage (billions of rows per day) and real‑time aggregation. ClickHouse was chosen for its columnar storage, efficient aggregation, and active community.
Tests show simple PV/UV queries on billions of rows return within 1 second, while complex retention or funnel queries finish within 10 seconds.
Because custom event attributes vary per event (e.g., order amount only appears on order events), storing each as a separate column would be costly. The solution is to keep stable attributes as columns and store frequently changing custom attributes in a Map column, combining the benefits of wide‑table performance and flexible schema.
How to Write Efficiently to ClickHouse?
The deployment uses a read‑write separation pattern: writes go to local tables, reads from distributed tables. There are three shards, each with double replication. Data is sharded by user_id to improve query performance.
Writing directly to distributed tables caused “Too many parts” errors and write amplification, so the platform avoids that.
Figure: Writing event logs into ClickHouse
How to Query ClickHouse Efficiently?
Event analysis can be expressed with plain SQL. Example:
SELECT
count(1) AS cnt,
toDate(toStartOfDay(toDateTime(event_ms))) AS date,
age
FROM event_analytics
WHERE event = "view_picture_book_home_page"
AND city = "beijing"
AND event_ms >= 1613923200000 AND event_ms <= 1614528000000
GROUP BY date, age;Retention analysis uses the retention function. Example:
SELECT
sum(ret[1]) AS original,
sum(ret[2]) AS next_day_ret,
sum(ret[3]) AS seven_day_ret
FROM (
SELECT user_id, retention(
event = "register_picture_book" AND toDate(event_ms) = toDate('2021-03-01'),
event = "view_picture_book" AND toDate(event_ms) = toDate('2021-03-02'),
event = "view_picture_book" AND toDate(event_ms) = toDate('2021-03-08')
) AS ret
FROM event_analytics
WHERE event_ms >= 1614528000000 AND event_ms <= 1615132800000
GROUP BY user_id
);Funnel analysis uses the windowFunnel function. Example:
SELECT
array(
sumIf(count, level >= 1),
sumIf(count, level >= 2)
) AS funnel_uv
FROM (
SELECT level, count() AS count
FROM (
SELECT
uid,
windowFunnel(172800000)(
event_ms,
event = "view_picture_book" AND event_ms >= 1613923200000 AND event_ms <= 1614009600000,
event = "buy_picture_book"
) AS level
FROM event_analytics
WHERE event_ms >= 1613923200000 AND event_ms <= 1614182400000
GROUP BY uid
)
GROUP BY level
);Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
ITFLY8 Architecture Home
ITFLY8 Architecture Home - focused on architecture knowledge sharing and exchange, covering project management and product design. Includes large-scale distributed website architecture (high performance, high availability, caching, message queues...), design patterns, architecture patterns, big data, project management (SCRUM, PMP, Prince2), product design, and more.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
