How to Build a Trillion-Scale Real-Time Data Platform: Lessons from DTCC 2019
In a DTCC 2019 keynote, Zhao Qun, director of big‑data platform at Percent Point, outlines the challenges of trillion‑scale real‑time analytics and presents a transparent, fine‑grained architecture built on Kafka, Spark Streaming, ClickHouse, HBase, Ceph and Elasticsearch, detailing design principles, component sizing, multi‑center deployment, performance testing and operational safeguards.
Design Philosophy and Service Transparency
The platform adopts a transparent‑service, fine‑grained‑management approach. Each component’s storage, processing and query capacity is quantified, visualised and monitored against predefined thresholds. Alerts trigger proactive scaling or throttling, enabling stable operation under trillion‑scale workloads.
For example, Kafka is modelled with four operational stages: normal read/write, throttled flow with backlog, contention requiring scaling, and overload where service degrades. Thresholds for each stage are configured in the monitoring system.
Five Technical Dimensions
Dual‑center deployment : Cross‑data‑center data access is fully transparent to users; queries are routed without exposing the physical location of data.
Data storage : Structured logs exceed 100 TB/day with write throughput >2 M records/s; unstructured files reach 2 TB/day. End‑to‑end latency from ingestion to query must stay <30 s.
Offline processing : Heavy batch jobs must not impact real‑time query latency or write throughput.
Data query : Low‑latency, ad‑hoc queries across centers with full performance guarantees.
System operations : Rapid fault detection and automated recovery for hardware or network failures.
Typical Architecture for Ultra‑Large‑Scale Real‑Time Analytics
The solution consists of two layers:
Big‑data technology platform : Open‑source components are integrated as follows: Kafka – high‑throughput ingestion, buffering and peak‑load handling. Spark Streaming – real‑time processing with flow‑control and resource throttling. ClickHouse – columnar core storage enabling cross‑center trillion‑scale queries. Elasticsearch – distributed full‑text search. HBase + Ceph – custom OSS service for high‑concurrency file storage.
Data‑asset management platform : SaaS‑style visual development, metadata governance, data quality, lifecycle management and data‑product services (datasets, tags, knowledge graphs).
Component Selection – ClickHouse as Core Store
Benchmarking against Presto and HAWQ showed ClickHouse delivering superior query performance, higher concurrency handling and stable write throughput (up to 600 k rows/s per node). The following design choices were made:
Dual‑center ClickHouse cluster : Distributed tables span two data centers, providing transparent access with only ~25‑33 % performance overhead.
Prohibit distributed writes : Writes are directed to local tables via client‑side load balancing to avoid write stalls.
Parameter tuning : Adjust ZooKeeper‑managed replica settings, part‑merge thresholds and other knobs to keep the number of active Part objects within safe limits (≈150 for delay, ≈300 for failure).
Nginx load balancing : Centralized query entry points, query pre‑warming and PageCache caching for hot data.
Comprehensive query logging : Nginx logs are collected, visualised in Grafana and fed to Prometheus for alerting.
Stability is ensured by defining “danger” and “safe” zones based on write and query concurrency. ClickHouse creates a Part for each write; exceeding the part count thresholds triggers automatic throttling, preventing write stalls and ensuring long‑term reliability.
Similar flow‑control logic is applied to Spark Streaming, allowing dynamic scaling of executor counts to match required processing capacity.
Monitoring and Operations Stack
Component deployment is managed with Ambari. Service, network and hardware health are monitored by Zabbix. ClickHouse metrics (data distribution, write throughput, query concurrency, latency) are exported to Prometheus and visualised in Grafana. Alert thresholds align with the transparent‑service design, enabling rapid detection of overload or failure conditions.
Performance Summary
The platform processes >2 000 billion records per day, with peak loads exceeding 5 million operations per second. Real‑time ingestion, processing and query latency remain under the 30 s target, while write throughput stays stable at up to 600 k rows/s per ClickHouse node.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
