How openGemini’s New Columnar Engine Solves High‑Cardinality Time‑Series Challenges

This article explains why time‑series databases suit massive telemetry data, describes the high‑cardinality problem that degrades their performance, and shows how openGemini’s newly introduced columnar engine, combined with sorting and a clustered index, mitigates that problem while sustaining high write and query speeds.

Huawei Cloud Developer Alliance

Nov 17, 2023

Why Choose a Time‑Series Database for Massive Telemetry Data?

Telemetry data can reach billions of records per day. For example, nationwide smart meters generate 500 billion rows daily, and a fleet of 100,000 vehicles can produce about 1 PB of data. Traditional general‑purpose databases struggle at this scale, which has driven the rise of time‑series databases that specialize in storing and analyzing high‑volume data.

High Cardinality: Definition and Impact

Cardinality refers to the number of unique values in a column. High cardinality means a column contains a very large number of distinct values, such as IP addresses numbering in the hundreds of millions or timestamps sampled at a high rate. In high‑cardinality scenarios, the number of tag combinations, and therefore of series IDs (SIDs), explodes. The inverted index grows with it, and so does the cost of maintaining and querying that index. The two main effects are increased memory usage and reduced read/write throughput.
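To make the explosion concrete, here is a minimal Python sketch of a toy inverted index. It is not openGemini's implementation: the tag names (`region`, `host`, `client_ip`), the value counts, and the helper functions are all hypothetical, and the sketch assumes every tag combination actually occurs in the data.

```python
from itertools import product


def build_inverted_index(tag_values):
    """Assign one series ID per tag combination and index each tag=value pair.

    tag_values maps a tag name to the list of values it can take.
    Returns (series_count, index), where index maps (tag, value) -> set of SIDs.
    """
    names = list(tag_values)
    index = {}
    series_count = 0
    for sid, combo in enumerate(product(*(tag_values[n] for n in names))):
        series_count += 1
        for name, value in zip(names, combo):
            index.setdefault((name, value), set()).add(sid)
    return series_count, index


def posting_entries(index):
    """Total number of SIDs stored across all posting lists."""
    return sum(len(sids) for sids in index.values())


# Hypothetical tags: 4 regions x 50 hosts, then the same with 1,000 client IPs added.
low = {"region": [f"r{i}" for i in range(4)],
       "host": [f"h{i}" for i in range(50)]}
high = dict(low, client_ip=[f"10.0.{i // 256}.{i % 256}" for i in range(1000)])

low_series, low_index = build_inverted_index(low)
high_series, high_index = build_inverted_index(high)

print(low_series, posting_entries(low_index))    # 200 400
print(high_series, posting_entries(high_index))  # 200000 600000
```

Adding a single tag with 1,000 values multiplies the series count by 1,000, and the index has to hold every one of those series IDs in memory or on disk.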

openGemini’s Approach to High Cardinality

openGemini addresses the problem with a columnar storage engine, sorting, and a sparse (clustered) index. Instead of organizing data by time line (that is, by series), the engine sorts rows by selected tags and fields, stores them column‑wise, and indexes them sparsely, so the index size no longer depends on the number of time series. This design substantially improves both write and query performance in high‑cardinality workloads.

Key Differences Between Columnar and Traditional Time‑Series Engines

Traditional time‑series engines cluster data by time line (series) and maintain inverted indexes, which grow with the number of series. The columnar engine instead sorts data by specific columns, independent of the time line, and builds a clustered index that stores only the first record of each block. This gives efficient filtering without any overhead that depends on the number of series.
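The idea of a clustered index that keeps only the first record of each block can be sketched in a few lines of Python. This is an illustration under stated assumptions, not openGemini's actual data layout: `BLOCK_SIZE`, `build_columnar`, `lookup`, and the sample rows are hypothetical, and a single column stands in for the sort key.

```python
from bisect import bisect_left, bisect_right

BLOCK_SIZE = 4  # rows per block (hypothetical; real engines use far larger blocks)


def build_columnar(rows, sort_key):
    """Sort rows by sort_key, then store each field as its own list (column)."""
    ordered = sorted(rows, key=lambda r: r[sort_key])
    columns = {name: [r[name] for r in ordered] for name in ordered[0]}
    # Sparse index: the sort-key value of the first row in every block.
    sparse = columns[sort_key][::BLOCK_SIZE]
    return columns, sparse


def lookup(columns, sparse, sort_key, value):
    """Return (matching row positions, rows scanned) using the sparse index."""
    keys = columns[sort_key]
    # The block just before the first block whose first key is >= value
    # may still end with `value`, so scanning starts there.
    first_block = max(bisect_left(sparse, value) - 1, 0)
    # The last block whose first key is <= value is the last candidate.
    last_block = bisect_right(sparse, value) - 1
    if last_block < 0:
        return [], 0  # value sorts before every stored key
    start = first_block * BLOCK_SIZE
    end = min((last_block + 1) * BLOCK_SIZE, len(keys))
    hits = [i for i in range(start, end) if keys[i] == value]
    return hits, end - start


rows = [{"device": f"d{i % 5}", "time": i, "value": i * 1.5} for i in range(10)]
columns, sparse = build_columnar(rows, "device")
hits, scanned = lookup(columns, sparse, "device", "d3")

print(sparse)                                       # ['d0', 'd2', 'd4']
print([columns["time"][i] for i in hits], scanned)  # [3, 8] 4
```

The index holds one entry per block, so its size follows the row count divided by the block size rather than the number of distinct devices. The trade-off is that a lookup scans whole candidate blocks instead of jumping straight to a series.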

Practical Guide to Using the Columnar Engine

openGemini has supported the columnar engine and the Arrow Flight protocol since v1.1.0. Documentation is available at https://docs.opengemini.org/zh/guide/features/high_series_cardinality.html. Configuration involves setting up Arrow Flight, creating tables, and enabling the columnar storage mode.

Performance Validation

Benchmark results show that openGemini places no limit on the number of time lines and reaches write rates of up to 600 k rows/s per CPU core. In four test scenarios running concurrent queries against billions of time lines, latency was as low as 0.012 s, which the benchmark reports as better than competing products. Query latency remains low even for aggregations over the full data set.
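As a rough sense of scale, the quoted peak write rate can be set against the smart‑meter example from earlier in the article. The calculation below is back‑of‑the‑envelope arithmetic, not a benchmark result: it assumes ingest is spread evenly over the day, that the peak per‑core rate is sustained, and that throughput scales linearly with cores, none of which the article states.

```python
import math

ROWS_PER_DAY = 500_000_000_000        # smart-meter example from the article
PEAK_ROWS_PER_SEC_PER_CORE = 600_000  # peak write rate quoted in the article
SECONDS_PER_DAY = 24 * 60 * 60

# Average ingest rate needed if rows arrive evenly over 24 hours (assumption).
required_rate = ROWS_PER_DAY / SECONDS_PER_DAY

# Lower bound on cores, assuming ideal linear scaling at the peak rate.
cores = math.ceil(required_rate / PEAK_ROWS_PER_SEC_PER_CORE)

print(f"{required_rate:,.0f} rows/s -> at least {cores} cores")
# 5,787,037 rows/s -> at least 10 cores
```

Real deployments would need headroom for bursts, replication, compaction, and queries, so this figure is a floor rather than a sizing recommendation.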

