Real-time Customer Service Dashboard: Architecture and Implementation with Flink and ClickHouse
The article describes a real‑time customer‑service dashboard built on Flink for streaming MySQL changes captured via Kafka, which cleans and aggregates ~60 operational metrics before writing them to ClickHouse’s MergeTree/ReplacingMergeTree tables, enabling sub‑second queries and exactly‑once guarantees while separating offline and live pipelines.
The customer service team operates multiple systems (ticketing, call, IM) and requires real‑time visibility of operational metrics, which cannot be satisfied by offline T+1 reports.
About 60 key indicators have been identified and are categorized by their presentation form (fixed‑period, current‑time, table‑grouped) and by calculation complexity (real‑time, base, derived).
Key technical challenges include ensuring data accuracy, achieving low latency, handling a large number of diverse metrics, and meeting high backend performance requirements.
The guiding principle is to let the backend prepare data in advance so the front‑end does not need to recompute on each request.
Technical selection: For real‑time processing, Flink is chosen over Spark and Storm due to its true streaming model and exactly‑once guarantees. For storage, ClickHouse is used as an OLAP database offering sub‑second query latency.
Real‑time data flow: MySQL changes are captured via Kafka binlog, processed and cleaned by Flink, then written to ClickHouse. The data‑dashboard service queries ClickHouse to serve metrics to the front‑end.
ClickHouse table design uses MergeTree for offline data and ReplacingMergeTree for real‑time updates. Example DDL:
CREATE TABLE dashboard.local_table_test on cluster default (
`key` UInt32,
`order` UInt32,
`other` UInt32,
`part` DateTime,
`shard` UInt32,
INDEX order_idx(order) TYPE minmax GRANULARITY 10
) ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(part)
PRIMARY KEY key
ORDER BY (key,order)
SETTINGS index_granularity=2;
create table dashboard.dis_local_table_test on cluster default as dashboard.local_table_test ENGINE = Distributed(default, dashboard, local_table_test, shard);Flink jobs implement data cleaning using streams, windows (tumbling, sliding, session), watermarks, and state (operator and keyed). Checkpointing provides exactly‑once processing, and two‑phase commit ensures downstream sinks also preserve this guarantee.
Metric computation combines offline (T+1) aggregates with real‑time increments. Periodic metrics use Flink windows, cumulative metrics may be hybrid (offline + real‑time), and derived metrics are calculated from base metrics.
Storage strategy separates offline pipelines (using MergeTree) from real‑time pipelines (using ReplacingMergeTree) to optimize query performance and handle duplicate updates.
The solution has been applied to several dashboards (ticket, IM, phone) providing operators with up‑to‑second metric updates. Future work includes componentizing the data service layer and standardizing metric definitions.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
DeWu Technology
A platform for sharing and discussing tech knowledge, guiding you toward the cloud of technology.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
