Big Data 17 min read

How to Build a Trillion-Scale Real-Time Data Platform: Lessons from DTCC 2019

In a DTCC 2019 keynote, Zhao Qun, director of big‑data platform at Percent Point, outlines the challenges of trillion‑scale real‑time analytics and presents a transparent, fine‑grained architecture built on Kafka, Spark Streaming, ClickHouse, HBase, Ceph and Elasticsearch, detailing design principles, component sizing, multi‑center deployment, performance testing and operational safeguards.

ITPUB
ITPUB
ITPUB
How to Build a Trillion-Scale Real-Time Data Platform: Lessons from DTCC 2019

Design Philosophy and Service Transparency

The platform adopts a transparent‑service, fine‑grained‑management approach. Each component’s storage, processing and query capacity is quantified, visualised and monitored against predefined thresholds. Alerts trigger proactive scaling or throttling, enabling stable operation under trillion‑scale workloads.

For example, Kafka is modelled with four operational stages: normal read/write, throttled flow with backlog, contention requiring scaling, and overload where service degrades. Thresholds for each stage are configured in the monitoring system.

Five Technical Dimensions

Dual‑center deployment : Cross‑data‑center data access is fully transparent to users; queries are routed without exposing the physical location of data.

Data storage : Structured logs exceed 100 TB/day with write throughput >2 M records/s; unstructured files reach 2 TB/day. End‑to‑end latency from ingestion to query must stay <30 s.

Offline processing : Heavy batch jobs must not impact real‑time query latency or write throughput.

Data query : Low‑latency, ad‑hoc queries across centers with full performance guarantees.

System operations : Rapid fault detection and automated recovery for hardware or network failures.

Typical Architecture for Ultra‑Large‑Scale Real‑Time Analytics

The solution consists of two layers:

Big‑data technology platform : Open‑source components are integrated as follows: Kafka – high‑throughput ingestion, buffering and peak‑load handling. Spark Streaming – real‑time processing with flow‑control and resource throttling. ClickHouse – columnar core storage enabling cross‑center trillion‑scale queries. Elasticsearch – distributed full‑text search. HBase + Ceph – custom OSS service for high‑concurrency file storage.

Data‑asset management platform : SaaS‑style visual development, metadata governance, data quality, lifecycle management and data‑product services (datasets, tags, knowledge graphs).

Architecture diagram
Architecture diagram

Component Selection – ClickHouse as Core Store

Benchmarking against Presto and HAWQ showed ClickHouse delivering superior query performance, higher concurrency handling and stable write throughput (up to 600 k rows/s per node). The following design choices were made:

Dual‑center ClickHouse cluster : Distributed tables span two data centers, providing transparent access with only ~25‑33 % performance overhead.

Prohibit distributed writes : Writes are directed to local tables via client‑side load balancing to avoid write stalls.

Parameter tuning : Adjust ZooKeeper‑managed replica settings, part‑merge thresholds and other knobs to keep the number of active Part objects within safe limits (≈150 for delay, ≈300 for failure).

Nginx load balancing : Centralized query entry points, query pre‑warming and PageCache caching for hot data.

Comprehensive query logging : Nginx logs are collected, visualised in Grafana and fed to Prometheus for alerting.

ClickHouse tuning diagram
ClickHouse tuning diagram

Stability is ensured by defining “danger” and “safe” zones based on write and query concurrency. ClickHouse creates a Part for each write; exceeding the part count thresholds triggers automatic throttling, preventing write stalls and ensuring long‑term reliability.

Similar flow‑control logic is applied to Spark Streaming, allowing dynamic scaling of executor counts to match required processing capacity.

Monitoring and Operations Stack

Component deployment is managed with Ambari. Service, network and hardware health are monitored by Zabbix. ClickHouse metrics (data distribution, write throughput, query concurrency, latency) are exported to Prometheus and visualised in Grafana. Alert thresholds align with the transparent‑service design, enabling rapid detection of overload or failure conditions.

Performance Summary

The platform processes >2 000 billion records per day, with peak loads exceeding 5 million operations per second. Real‑time ingestion, processing and query latency remain under the 30 s target, while write throughput stays stable at up to 600 k rows/s per ClickHouse node.

Monitoring dashboard
Monitoring dashboard
Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

architectureBig DataReal-time analyticsKafkaSpark Streaming
ITPUB
Written by

ITPUB

Official ITPUB account sharing technical insights, community news, and exciting events.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.