Scaling Baidu’s TSDB to Trillions of Points: Elastic, High‑Performance Architecture
Baidu’s TSDB processes over 20 million data points per second per node and tens of thousands of queries per second cluster‑wide by employing a stateless read/write‑separated elastic architecture, multi‑layer storage across Redis, HBase and Hadoop, minute‑level geo‑redundant self‑healing, and a modified Gorilla compression that cuts storage by 80% with minimal CPU overhead.
Background
Baidu’s monitoring TSDB processes >20 million data points per second on a single machine, handles tens of thousands of queries per second across the cluster, and moves tens of trillions of points daily. HBase provides the underlying storage performance, while the system architecture adds scalability and reliability.
Elastic Scalability
The service uses a read/write‑separated, stateless design. The Query‑engine (read) and Saver (write) components are identical instances that are load‑balanced by round‑robin or consistent hashing. Deployment runs on Baidu’s internal container platform Matrix , which allocates CPU/memory per instance and can launch or retire instances in minutes, enabling linear capacity growth simply by adding nodes.
Performance Optimizations
Horizontal partitioning slices HBase tables by time into separate slices . Older slices receive fewer queries, reducing load on the HBase cluster. The newest slice, which remains hot, is cached in Redis to lower query latency and off‑load HBase.
Cold historical data is periodically copied from the Saver pipeline into an independent Hadoop cluster. Queries for this data are routed to Hadoop, while Redis serves hot metrics, and HBase handles recent trend queries and cache‑miss traffic.
Cost‑Effective Compression
The system adopts and extends Facebook’s Gorilla time‑series compression algorithm. Key features:
Delta‑of‑Delta encoding for timestamps.
XOR‑based compression for floating‑point values.
Extended support for integer values and a StatisticsValue type that stores max, min, sum, and count per series.
Compressed byte streams are stored as key‑value pairs in Redis.
In production the adapted algorithm achieves roughly a 10× compression ratio, saving about 80 % of storage space while adding less than 10 % CPU overhead.
High Availability
TSDB achieves geo‑redundancy by deploying two full clusters in separate data centers. Writes are dual‑written to both clusters; reads are directed to one cluster via a dynamic routing table or Baidu Naming Service (BNS). A minute‑level self‑healing mechanism detects a single‑datacenter failure and automatically switches traffic, ensuring continuous service.
Future Extensibility
While HBase is the current storage engine, the modular architecture allows replacement with other back‑ends such as Cassandra, Elasticsearch, or MySQL. This flexibility enables the platform to adapt to diverse workload characteristics and to integrate with downstream analytics, anomaly detection, or AI pipelines.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
