How to Speed Up High‑Cardinality GroupBy Queries by Up to 8× in SLS
This article explains why high‑cardinality GroupBy queries are slow, describes SLS's underlying aggregation pipeline, and shows how adjusting session parameters and enabling high‑cardinality optimizations can reduce query times from dozens of seconds to just a few seconds across three real‑world test scenarios.
When analyzing data with an extremely large number of distinct values (high cardinality), traditional GroupBy operations can become a performance bottleneck, especially in operational analytics such as e‑commerce sales distribution, game player behavior tracking, or market trend monitoring.
Why High‑Cardinality GroupBy Is Slow
In most OLAP engines, data is hashed and distributed across nodes for parallel aggregation. The pipeline typically follows four stages: DataSource → PartialAgg → FinalAgg → Output. DataSource and PartialAgg run together, but the parallelism of FinalAgg is limited by the number of shards (default maximum 20). With few distinct keys, PartialAgg collapses most rows before they reach FinalAgg. When the distinct key count reaches hundreds of millions or billions, it collapses almost nothing, and FinalAgg becomes a severe bottleneck.
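The effect of cardinality on this pipeline can be illustrated with a toy simulation. The sketch below is not SLS code: the function names, the CRC32 hash, and the shard sizes are all assumptions chosen for illustration. It counts how many (key, count) pairs PartialAgg hands over to FinalAgg for a low-cardinality key and for a unique key.

```python
from collections import Counter
import zlib


def stable_hash(key: str) -> int:
    """Deterministic hash so a key always maps to the same partition."""
    return zlib.crc32(key.encode("utf-8"))


def partial_agg(rows):
    """PartialAgg: pre-aggregate one shard's rows into key -> count."""
    return Counter(rows)


def shuffle(partials, partition_count):
    """Send every (key, count) pair to the partition chosen by the key's hash."""
    inboxes = [[] for _ in range(partition_count)]
    for partial in partials:
        for key, count in partial.items():
            inboxes[stable_hash(key) % partition_count].append((key, count))
    return inboxes


def final_agg(inbox):
    """FinalAgg: merge the partial counts that arrived at one partition."""
    merged = Counter()
    for key, count in inbox:
        merged[key] += count
    return merged


def group_by_count(shards, partition_count):
    """Run the whole pipeline; return the result and the number of shuffled pairs."""
    partials = [partial_agg(rows) for rows in shards]
    inboxes = shuffle(partials, partition_count)
    result = {}
    for inbox in inboxes:
        result.update(final_agg(inbox))  # partitions hold disjoint key sets
    shuffled = sum(len(inbox) for inbox in inboxes)
    return result, shuffled


# Four shards of 1000 rows each, grouped on a low- and a high-cardinality key.
low = [[f"status-{i % 5}" for i in range(1000)] for _ in range(4)]
high = [[f"req-{s}-{i}" for i in range(1000)] for s in range(4)]

_, low_shuffled = group_by_count(low, partition_count=2)
_, high_shuffled = group_by_count(high, partition_count=2)
print(low_shuffled, high_shuffled)  # 20 4000
```

With five distinct keys, each shard forwards only five pairs. With unique keys, every row survives PartialAgg, so the whole dataset lands on the FinalAgg partitions.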
Implementation Details in SLS
SLS (Log Service) adopts the same distributed hash-based GroupBy but adds two tunable session parameters:
hash_partition_count: controls the parallelism of the FinalAgg stage (default maximum 20, can be raised up to 200).
high_cardinality_agg: switches to a special aggregation path optimized for extremely high-cardinality datasets.
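A rough way to see why FinalAgg parallelism matters is to divide the distinct keys evenly across partitions. The sketch below assumes uniform hashing and uses an illustrative figure of one billion distinct keys. The helper name is hypothetical, and only the limits of 20 and 200 come from the text above.

```python
DEFAULT_PARTITIONS = 20   # default maximum stated in the text
MAX_PARTITIONS = 200      # upper limit stated in the text


def keys_per_partition(distinct_keys: int, partition_count: int) -> int:
    """Average distinct keys per FinalAgg partition, assuming uniform hashing."""
    if not 1 <= partition_count <= MAX_PARTITIONS:
        raise ValueError(f"partition_count must be in 1..{MAX_PARTITIONS}")
    return -(-distinct_keys // partition_count)  # ceiling division


distinct_keys = 1_000_000_000  # illustrative figure, not a measurement
for count in (DEFAULT_PARTITIONS, 40, 64, 128, MAX_PARTITIONS):
    print(f"hash_partition_count={count:>3}: "
          f"{keys_per_partition(distinct_keys, count):,} keys per partition")
```

Under these assumptions each partition holds 50 million keys at the default of 20 and 5 million at 200, a tenfold reduction in per-partition hash table size.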
Test Setup
Three test cases were built using simulated Nginx access logs stored in an SLS Logstore with a 5000 CU allocation.
{
RequestId: varchar, /* each request ID is globally unique */
ClientIP: varchar,
Method: varchar,
Latency: int,
Status: int,
...
}
The cases cover:
High‑cardinality single‑column aggregation: 2.8 billion rows grouped by RequestId (≈2.8 billion distinct values).
High‑cardinality multi‑column aggregation: 4.5 billion rows grouped by ClientIP, Status, Latency (≈1.5 billion distinct combinations).
Low‑cardinality value aggregation: 1.5 trillion rows grouped by Latency (≈7.35 million distinct values) to compute Top‑100 frequencies.
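For readers who want to reproduce a similar setup at small scale, the following sketch generates synthetic records for the five named fields of the schema. It is not the generator used in the tests: the value ranges, the method and status lists, and the names AccessLog and make_logs are all assumptions.

```python
import random
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessLog:
    RequestId: str   # unique per request, so grouping on it is high cardinality
    ClientIP: str
    Method: str
    Latency: int
    Status: int


def make_logs(n: int, seed: int = 0):
    """Return n synthetic access-log records, reproducible for a given seed."""
    rng = random.Random(seed)
    logs = []
    for _ in range(n):
        logs.append(AccessLog(
            RequestId=str(uuid.UUID(int=rng.getrandbits(128))),
            ClientIP=".".join(str(rng.randint(1, 254)) for _ in range(4)),
            Method=rng.choice(["GET", "POST", "PUT", "DELETE"]),  # assumed values
            Latency=rng.randint(1, 5000),                         # assumed range
            Status=rng.choice([200, 301, 404, 500]),              # assumed values
        ))
    return logs


logs = make_logs(1000)
print(len({log.RequestId for log in logs}), len({log.Status for log in logs}))
```

Comparing the number of distinct RequestId values with the number of distinct Status values shows the gap between a high-cardinality and a low-cardinality grouping key on the same data.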
Performance Results
Case 1 – Single‑Column High Cardinality
Baseline (plain SQL) took ~17 s. Enabling enhanced SQL with hash_partition_count=40 reduced it to 10 s. Raising the count to 64/128/200 further cut the time to 7 s, 4.5 s, and 3.7 s respectively. Finally, turning on high_cardinality_agg=true brought the query down to ~2.1 s, an 8× speedup.
Case 2 – Multi‑Column High Cardinality
Baseline took ~24 s. With hash_partition_count=40 it fell to 11 s; increasing the count to 64/128/200 yielded 7.3 s, 5.9 s, and 5.8 s. Enabling high_cardinality_agg=true reduced the time to ~2.9 s, again achieving roughly an 8× improvement.
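The timings reported for Cases 1 and 2 can be turned into speedup factors relative to each baseline. The short sketch below only restates the numbers above; the dictionary layout and the speedups helper are illustrative.

```python
# Query times in seconds, as reported for Cases 1 and 2.
timings = {
    "Case 1": {
        "baseline": 17.0,
        "hash_partition_count=40": 10.0,
        "hash_partition_count=64": 7.0,
        "hash_partition_count=128": 4.5,
        "hash_partition_count=200": 3.7,
        "high_cardinality_agg=true": 2.1,
    },
    "Case 2": {
        "baseline": 24.0,
        "hash_partition_count=40": 11.0,
        "hash_partition_count=64": 7.3,
        "hash_partition_count=128": 5.9,
        "hash_partition_count=200": 5.8,
        "high_cardinality_agg=true": 2.9,
    },
}


def speedups(case):
    """Speedup of every configuration relative to the baseline, one decimal."""
    base = case["baseline"]
    return {name: round(base / secs, 1)
            for name, secs in case.items() if name != "baseline"}


for name, case in timings.items():
    print(name, speedups(case))
```

In Case 1 the factor grows steadily from 1.7× to 4.6× as hash_partition_count rises, while in Case 2 it flattens at about 4.1× between 128 and 200. In both cases high_cardinality_agg=true pushes it past 8×.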
Case 3 – Low Cardinality Aggregation
The baseline query returned in 4.3 s, but its result was truncated because of the data size and was therefore inaccurate. Enhanced SQL produced an accurate result in 23.4 s. Raising hash_partition_count had little effect (22-23 s) because the bottleneck had shifted to PartialAgg. Enabling high_cardinality_agg=true caused a timeout, indicating that the optimization is unsuitable for low-cardinality workloads.
Conclusions & Recommendations
For massive high‑cardinality GroupBy workloads, users should:
Enable enhanced SQL mode.
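Raise hash_partition_count above the default of 20. A value of 64 is a reasonable starting point; values up to the maximum of 200 may help further but yield diminishing returns (in Case 2, 128 and 200 performed almost identically).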
Turn on high_cardinality_agg=true when the distinct‑value count is very large; avoid it for low‑cardinality scenarios.
When data volume is modest, the default mode is sufficient. When data is huge but sharding is limited, increasing parallelism via the session parameters can cut query times from tens of seconds to a few seconds without sacrificing accuracy.
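These recommendations can be condensed into a small decision helper. This is only a sketch: the 100 million cutoff is a hypothetical threshold loosely based on the "hundreds of millions" figure mentioned earlier, and SessionSettings and recommend are illustrative names, not part of any SLS API.

```python
from dataclasses import dataclass

# Hypothetical cutoff; the article gives no exact threshold.
HIGH_CARDINALITY_THRESHOLD = 100_000_000


@dataclass(frozen=True)
class SessionSettings:
    enhanced_sql: bool
    hash_partition_count: int
    high_cardinality_agg: bool


def recommend(distinct_keys: int, large_data: bool) -> SessionSettings:
    """Map the article's recommendations onto a set of session settings."""
    if not large_data:
        # Modest data volume: the default mode is sufficient.
        return SessionSettings(False, 20, False)
    if distinct_keys >= HIGH_CARDINALITY_THRESHOLD:
        # Huge, high-cardinality data: widen FinalAgg and use the special path.
        return SessionSettings(True, 64, True)
    # Huge but low-cardinality data: enhanced SQL only (see Case 3).
    return SessionSettings(True, 20, False)


print(recommend(2_800_000_000, large_data=True))
print(recommend(7_350_000, large_data=True))
```

The first call corresponds to Case 1 and enables both optimizations. The second corresponds to Case 3 and leaves high_cardinality_agg off, since enabling it there led to a timeout.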
Alibaba Cloud Native
We publish cloud-native tech news, curate in-depth content, host regular events and live streams, and share Alibaba product and user case studies. Join us to explore and share the cloud-native insights you need.