How Alibaba Tackles the Massive Challenges of Time‑Series Data Storage
This article describes how Alibaba's middleware team analyzed the characteristics of time‑series data and its real‑world monitoring scenarios, where traditional databases fell short, and how the team's custom HiTSDB solution evolved to combine inverted indexing, high‑compression algorithms, and distributed aggregation to meet massive write and query demands.
Time Series Data Overview
Time‑series data consists of a series of numeric values distributed over time, where each record must include a value, not just a timestamp. Common examples include stock prices, advertising metrics (PV, UV), sensor readings, and IoT data, which together can dominate overall data volume.
Alibaba Use Cases
Alibaba's internal monitoring system "Eagle Eye" collects metrics such as CPU, memory, and per‑application QPS for millions of servers, generating peaks of 5.7 million points/second and averaging 3 million points/second. The system must handle both steady‑state sampling and service‑degradation scenarios that change sampling intervals.
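To put those rates in perspective, the arithmetic below converts the quoted average into a daily point count and compares peak to average. It uses only the two figures stated above; the constant names are illustrative.

```python
AVG_POINTS_PER_SEC = 3_000_000    # average ingest rate quoted above
PEAK_POINTS_PER_SEC = 5_700_000   # peak ingest rate quoted above
SECONDS_PER_DAY = 24 * 60 * 60

# Points written per day if the average rate held for 24 hours.
points_per_day = AVG_POINTS_PER_SEC * SECONDS_PER_DAY

# Headroom the storage layer needs above its steady-state load.
peak_to_avg = PEAK_POINTS_PER_SEC / AVG_POINTS_PER_SEC

print(f"{points_per_day:,} points/day at the average rate")
print(f"peak is {peak_to_avg:.1f}x the average")
```

At the average rate this comes to roughly 259 billion points per day, with peaks close to twice the steady-state load.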
Storage Challenges
Traditional relational databases (e.g., MySQL with the InnoDB storage engine) cannot sustain the required write throughput because of B‑tree index maintenance overhead and rapid storage growth (over 1 GB per second). Time‑series specific operations such as interpolation and down‑sampling are also inefficient in SQL because they require point‑by‑point processing.
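The two operations named above are easy to state procedurally, which is part of why they map poorly onto set-oriented SQL. The following is a minimal sketch, assuming points are `(timestamp, value)` tuples sorted by strictly increasing timestamp; the function names are hypothetical and are not taken from any Alibaba system.

```python
from collections import defaultdict

def downsample_avg(points, interval):
    """Average (timestamp, value) points into fixed-width time buckets."""
    buckets = defaultdict(list)
    for ts, value in points:
        # Align each point to the start of its bucket.
        buckets[ts - ts % interval].append(value)
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]

def interpolate(points, ts):
    """Linearly interpolate a value at ts between its two neighbouring points."""
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if t0 <= ts <= t1:
            return v0 + (v1 - v0) * (ts - t0) / (t1 - t0)
    raise ValueError("timestamp outside the sampled range")

raw = [(0, 10.0), (10, 20.0), (20, 30.0), (70, 40.0)]
per_minute = downsample_avg(raw, 60)   # one averaged point per 60-unit bucket
estimate = interpolate(raw, 45)        # fills the gap between ts=20 and ts=70
```

Both functions walk the series one point at a time and depend on neighbouring points, which is the access pattern the paragraph above describes as awkward in SQL.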
Evaluated Solutions
Various alternatives were tested:
RocksDB‑based storage achieved ~200 k points/second but suffered from multi‑index overhead.
Elasticsearch provided fast indexing for small‑scale workloads but struggled with massive dimension cardinality.
Columnar stores (e.g., Druid) offered high compression but performed poorly for long‑range queries due to file‑level scanning.
Stream processing engines delivered high write speed but lacked flexible post‑hoc query capabilities.
Ultimately, a purpose‑built time‑series database was needed.
HiTSDB Evolution
Alibaba first adopted OpenTSDB for its time‑partitioned storage, which achieved ~20 bytes per data point. However, OpenTSDB exhibited several drawbacks: a large in‑memory metadata footprint, inefficient row‑scan queries, extra qualifier overhead, and aggregation confined to a single node.
Improvements introduced include:
Inverted indexing on time‑series identifiers to accelerate lookups.
Sharding with binlog replication to HDFS for high availability.
Pre‑aggregation of down‑sampled data to reduce query latency.
Integration of Facebook’s Gorilla compression algorithm, reducing a timestamped point to ~1.4 bytes, enabling several million points per second on modest hardware.
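The Gorilla approach rests on two transformations: timestamps are stored as delta‑of‑delta values, and each floating‑point value is XORed with its predecessor. The sketch below shows only those transformations, not the variable‑length bit packing that produces the final byte counts, and its function names are illustrative rather than HiTSDB's.

```python
import struct

def delta_of_delta(timestamps):
    """Return [first timestamp, first delta, then each delta minus the previous delta]."""
    if len(timestamps) < 2:
        return list(timestamps)
    out = [timestamps[0], timestamps[1] - timestamps[0]]
    prev_delta = out[1]
    for prev, cur in zip(timestamps[1:], timestamps[2:]):
        delta = cur - prev
        out.append(delta - prev_delta)
        prev_delta = delta
    return out

def float_bits(x):
    """Reinterpret a 64-bit float as an unsigned integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def xor_values(values):
    """Return the first value's bits, then each value XORed with its predecessor."""
    if not values:
        return []
    bits = [float_bits(v) for v in values]
    return [bits[0]] + [a ^ b for a, b in zip(bits, bits[1:])]

timestamps = [1000, 1060, 1120, 1180, 1241]   # mostly regular 60-unit sampling
values = [12.0, 12.0, 12.0, 12.5, 12.5]       # slowly changing metric

ts_encoded = delta_of_delta(timestamps)
val_encoded = xor_values(values)
```

With regular sampling and slowly changing metrics, most entries after the header become zero, and a zero can be written as a single bit. That is where the large reduction in bytes per point comes from.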
These enhancements, combined with a distributed aggregation engine, formed HiTSDB, which remains compatible with OpenTSDB protocols but features a completely redesigned storage and query layer.
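As a rough illustration of how an inverted index and distributed aggregation can cooperate on a query, the sketch below resolves tag filters to series IDs and then merges per‑shard partial results. The source does not describe HiTSDB's internal data structures, so every class, function, and tag name here is a hypothetical stand‑in.

```python
from collections import defaultdict

class InvertedIndex:
    """Maps each (tag key, tag value) pair to the set of series IDs carrying it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, series_id, tags):
        for pair in tags.items():
            self.postings[pair].add(series_id)

    def lookup(self, **filters):
        """Intersect the posting sets of every requested tag filter."""
        sets = [self.postings.get(pair, set()) for pair in filters.items()]
        return set.intersection(*sets) if sets else set()

def partial_aggregate(shard, series_ids):
    """Runs on one shard: (sum, count) over the matching series it holds."""
    values = [v for sid in series_ids for v in shard.get(sid, [])]
    return sum(values), len(values)

def merge_average(partials):
    """Runs on the coordinator: combine per-shard (sum, count) pairs."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None

index = InvertedIndex()
index.add(1, {"app": "cart", "host": "h1"})
index.add(2, {"app": "cart", "host": "h2"})
index.add(3, {"app": "search", "host": "h1"})

# Each shard maps series ID -> values; the layout is an assumption for illustration.
shards = [{1: [10.0, 20.0]}, {2: [30.0], 3: [99.0]}]

matched = index.lookup(app="cart")
partials = [partial_aggregate(shard, matched) for shard in shards]
average = merge_average(partials)
```

Shipping `(sum, count)` pairs instead of raw points keeps the data returned by each shard small, and the coordinator can still compute an exact average from them.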
Future Directions
Remaining challenges include handling divergent time‑series bursts, event‑driven versus periodic sampling imbalance, sub‑second sampling, SQL‑table interoperability, and efficient group‑by/top‑N queries. Plans involve adding configurable pre‑aggregation, cloud‑based archival storage, dual‑engine support for event‑driven and periodic data, and exploring FPGA‑based hardware acceleration for ultra‑high‑throughput ingestion.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
