How openGemini’s New Columnar Engine Solves High‑Cardinality Time‑Series Challenges
This article explains why time-series databases suit massive telemetry data, describes the high-cardinality problem that degrades their performance, and shows how openGemini's new columnar engine, combined with sorting and a clustered index, mitigates that problem while keeping write and query speeds high.
Why Choose a Time‑Series Database for Massive Telemetry Data?
Telemetry data can reach billions of records per day. For example, nationwide smart meters generate 500 billion rows daily, and a fleet of 100,000 vehicles can produce about 1 PB of data. Traditional databases struggle at this scale, which has driven the rise of time-series databases that specialize in storing and analyzing high-volume data.
High Cardinality: Definition and Impact
Cardinality is the number of unique values in a column. A high-cardinality column contains a very large number of distinct values, such as IP addresses numbering in the hundreds of millions or timestamps sampled at a high rate. In high-cardinality scenarios, the number of tag combinations, and therefore of series IDs (SIDs), explodes. This inflates the inverted index and raises its maintenance and query overhead. Two effects follow:
Increased memory usage
Reduced read/write throughput
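A minimal sketch of why tag combinations inflate an inverted index. The tag names, their value counts, and the dictionary-based index are hypothetical illustrations, not openGemini's internal structures. Every distinct tag combination becomes one series ID, and every series ID is recorded once per tag it carries.

```python
from itertools import product


def series_count(tag_cardinalities):
    """Upper bound on distinct series: the product of per-tag distinct counts."""
    total = 1
    for distinct_values in tag_cardinalities.values():
        total *= distinct_values
    return total


def build_inverted_index(points):
    """Assign a series ID per tag combination and map each tag pair to its IDs."""
    series_ids = {}
    index = {}
    for tags in points:
        key = tuple(sorted(tags.items()))
        sid = series_ids.setdefault(key, len(series_ids))
        for pair in key:
            index.setdefault(pair, set()).add(sid)
    return series_ids, index


# Hypothetical tag values: 100 hosts x 3 regions x 50 client IPs.
hosts = [f"host-{i}" for i in range(100)]
regions = ["eu", "us", "ap"]
client_ips = [f"10.0.0.{i}" for i in range(50)]

points = [
    {"host": h, "region": r, "client_ip": ip}
    for h, r, ip in product(hosts, regions, client_ips)
]
series_ids, index = build_inverted_index(points)
postings = sum(len(sids) for sids in index.values())

print(len(series_ids))  # number of distinct series
print(postings)         # total series-ID entries held by the index
```

Adding one more tag with N distinct values multiplies the series count by up to N, and the index grows along with it, regardless of how many rows each series holds.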
openGemini’s Approach to High Cardinality
openGemini addresses the problem with a columnar storage engine that combines sorting with a sparse (clustered) index. Data is no longer organized by timeline (series). Instead, it is sorted by selected tags and columns, stored column-wise, and indexed sparsely, so the index size is independent of the number of time series. This design improves both write and query performance in high-cardinality workloads.
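The idea of sorting rows by selected columns and then storing them column-wise can be sketched as follows. The row layout, column names, and the `to_sorted_columns` helper are assumptions made for illustration. They do not reflect openGemini's on-disk format.

```python
def to_sorted_columns(rows, sort_key):
    """Sort rows by the given columns, then pivot them into per-column lists."""
    if not rows:
        return {}
    ordered = sorted(rows, key=lambda row: tuple(row[name] for name in sort_key))
    return {name: [row[name] for row in ordered] for name in rows[0]}


# Hypothetical telemetry rows arriving in time order.
rows = [
    {"time": 1, "region": "eu", "host": "a", "cpu": 0.2},
    {"time": 2, "region": "us", "host": "a", "cpu": 0.5},
    {"time": 3, "region": "us", "host": "b", "cpu": 0.7},
    {"time": 4, "region": "eu", "host": "a", "cpu": 0.9},
]

columns = to_sorted_columns(rows, sort_key=("region", "host", "time"))
for name, values in columns.items():
    print(name, values)
```

After sorting, equal tag values sit next to each other in each column, so no per-series bookkeeping is needed to find them.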
Key Differences Between Columnar and Traditional Time‑Series Engines
Traditional time-series engines cluster data by timeline and rely on inverted indexes, which grow with the number of series. The columnar engine instead sorts data by specified columns, independent of the timeline, and builds a clustered index that stores only the first record of each block. Filtering stays efficient because the index cost no longer depends on the number of timelines.
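A sparse index that keeps only the first key of each block can be sketched with a binary search. The block size, the key values, and both helper functions are hypothetical. The sketch shows the general technique rather than openGemini's implementation.

```python
from bisect import bisect_left, bisect_right

BLOCK_SIZE = 4  # hypothetical number of rows per block


def build_sparse_index(sorted_keys, block_size=BLOCK_SIZE):
    """Keep only the first key of each block of sorted rows."""
    return [sorted_keys[i] for i in range(0, len(sorted_keys), block_size)]


def candidate_blocks(sparse_index, low, high):
    """Return the block numbers that may contain keys in [low, high]."""
    # The block before the first block starting at or after `low` may still
    # end with `low`, so step back one block (but never below zero).
    start = max(bisect_left(sparse_index, low) - 1, 0)
    # Blocks whose first key is greater than `high` cannot match.
    end = bisect_right(sparse_index, high)
    return range(start, end)


# Twelve sorted keys split into three blocks of four rows each.
keys = [f"h{i:02d}" for i in range(1, 13)]
sparse_index = build_sparse_index(keys)

print(sparse_index)
print(list(candidate_blocks(sparse_index, "h06", "h07")))
```

The index holds one entry per block, so its size depends on the row count and block size rather than on how many distinct series the data contains. Only the candidate blocks need to be read and filtered.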
Practical Guide to Using the Columnar Engine
openGemini has supported the columnar engine and the Arrow Flight protocol since v1.1.0. Documentation is available at https://docs.opengemini.org/zh/guide/features/high_series_cardinality.html. Configuration involves setting up Arrow Flight, creating tables, and enabling columnar storage mode.
Performance Validation
Benchmark results show that openGemini's performance is not limited by the number of timelines, with writes reaching up to 600,000 rows per second per CPU core. In four test scenarios that queried data spanning billions of timelines, latency was as low as 0.012 s, lower than that of the competing products tested. Query latency stays low even for full-data aggregation.
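As a rough back-of-the-envelope check, the peak write rate can be set against the smart-meter volume quoted earlier. This sketch assumes the 500 billion rows arrive evenly over the day and that the peak per-core rate is sustained, neither of which the benchmark states. The result is therefore a lower bound on the cores needed, not a sizing recommendation.

```python
import math

ROWS_PER_DAY = 500_000_000_000        # smart-meter example from this article
PEAK_ROWS_PER_SEC_PER_CORE = 600_000  # best-case write rate from the benchmark
SECONDS_PER_DAY = 24 * 60 * 60

# Average ingest rate if the load were spread evenly across the day.
rows_per_second = ROWS_PER_DAY / SECONDS_PER_DAY

# Minimum cores if every core sustained the peak rate with no other overhead.
min_cores = math.ceil(rows_per_second / PEAK_ROWS_PER_SEC_PER_CORE)

print(f"{rows_per_second:,.0f} rows/s on average")
print(f"at least {min_cores} cores at the peak write rate")
```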
Huawei Cloud Developer Alliance
The Huawei Cloud Developer Alliance creates a tech sharing platform for developers and partners, gathering Huawei Cloud product knowledge, event updates, expert talks, and more. Together we continuously innovate to build the cloud foundation of an intelligent world.
