
How to Speed Up High‑Cardinality GroupBy Queries by Up to 8× in SLS

This article explains why high‑cardinality GroupBy queries are slow, describes SLS's underlying aggregation pipeline, and shows how adjusting session parameters and enabling high‑cardinality optimizations can reduce query times from dozens of seconds to just a few seconds across three real‑world test scenarios.

Alibaba Cloud Native

Sep 4, 2024


When analyzing data with an extremely large number of distinct values (high cardinality), traditional GroupBy operations can become a performance bottleneck, especially in operational analytics such as e‑commerce sales distribution, game player behavior tracking, or market trend monitoring.

Why High‑Cardinality GroupBy Is Slow

In most OLAP engines, rows are hashed on the grouping key and distributed across nodes for parallel aggregation. The pipeline typically has four stages: DataSource → PartialAgg → FinalAgg → Output. PartialAgg runs together with DataSource, but the parallelism of FinalAgg is limited by the number of shards (default max 20). When every key is nearly unique, PartialAgg can do little to shrink the data, so almost all of it reaches FinalAgg, which slows down severely once the distinct key count reaches hundreds of millions or billions.
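The two aggregation stages can be illustrated with a small in-memory simulation. This is only a sketch of the general technique, not SLS's actual implementation: the function names, the use of crc32 as the hash, and the tiny data sizes are all assumptions made for illustration.

```python
from collections import Counter
from zlib import crc32


def partial_agg(rows):
    """PartialAgg: pre-aggregate one data source's rows into key -> count."""
    return Counter(rows)


def final_agg(partials, partition_count):
    """FinalAgg: route each key to one hash partition and merge counts there."""
    partitions = [Counter() for _ in range(partition_count)]
    for partial in partials:
        for key, count in partial.items():
            # crc32 is deterministic across runs, unlike the built-in hash() for str.
            slot = crc32(key.encode("utf-8")) % partition_count
            partitions[slot][key] += count
    return partitions


# Two simulated data sources; every key is unique, as with RequestId,
# so PartialAgg cannot reduce the number of entries at all.
sources = [
    [f"req-{i}" for i in range(0, 5000)],
    [f"req-{i}" for i in range(5000, 10000)],
]
partials = [partial_agg(rows) for rows in sources]

for partition_count in (2, 8):
    partitions = final_agg(partials, partition_count)
    largest = max(len(p) for p in partitions)
    print(f"{partition_count} partitions -> largest partition holds {largest} keys")
```

With more partitions, each FinalAgg worker holds a smaller share of the keys, which is the effect that raising FinalAgg parallelism aims for.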

Implementation Details in SLS

SLS (Log Service) adopts the same distributed hash‑based GroupBy but adds two tunable session parameters:

hash_partition_count: controls the parallelism of the FinalAgg stage (default max 20, can be increased up to 200).

high_cardinality_agg: switches to a special aggregation path optimized for extremely high‑cardinality datasets.
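As a rough sketch of how the two parameters might be attached to a query, the helper below prepends session settings to an analytic statement. The parameter names come from the article; the "set session name=value;" prefix form, the helper name with_session_params, the lower bound of 1, and the table name log are assumptions that should be checked against the SLS documentation.

```python
def with_session_params(sql, hash_partition_count=None, high_cardinality_agg=False):
    """Prefix an analytic statement with session settings (assumed syntax)."""
    if hash_partition_count is not None and not 1 <= hash_partition_count <= 200:
        # 200 is the upper limit stated in the article; 1 is an assumed lower bound.
        raise ValueError("hash_partition_count must be between 1 and 200")
    settings = []
    if hash_partition_count is not None:
        settings.append(f"set session hash_partition_count={hash_partition_count};")
    if high_cardinality_agg:
        settings.append("set session high_cardinality_agg=true;")
    return " ".join(settings + [sql])


# Hypothetical statement; the table name "log" is an assumption.
query = with_session_params(
    "select RequestId, count(*) as cnt from log group by RequestId",
    hash_partition_count=128,
    high_cardinality_agg=True,
)
print(query)
```

Keeping the settings in one helper makes it easy to rerun the same statement with different values of hash_partition_count and compare timings.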

Test Setup

Three test cases were built using simulated Nginx access logs stored in an SLS Logstore with an allocation of 5000 CUs.

{
  RequestId: varchar, /* each request ID is globally unique */
  ClientIP: varchar,
  Method: varchar,
  Latency: int,
  Status: int,
  ...
}
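For readers who want to reproduce a similar data shape locally, the following sketch generates simulated records with the named fields only. It is not the generator used in the tests: the value ranges, the method and status lists, and the function name make_record are assumptions, and the fields elided in the schema are left out.

```python
import random
import uuid


def make_record(rng):
    """Build one simulated access-log record with the named schema fields."""
    return {
        # A random 128-bit UUID, so RequestId is effectively unique per request.
        "RequestId": str(uuid.UUID(int=rng.getrandbits(128))),
        "ClientIP": ".".join(str(rng.randint(1, 254)) for _ in range(4)),
        "Method": rng.choice(["GET", "POST", "PUT", "DELETE"]),  # assumed values
        "Latency": rng.randint(1, 5000),  # assumed range
        "Status": rng.choice([200, 301, 404, 500]),  # assumed values
    }


rng = random.Random(42)  # fixed seed so the sample is reproducible
records = [make_record(rng) for _ in range(1000)]
distinct_request_ids = len({r["RequestId"] for r in records})
distinct_statuses = len({r["Status"] for r in records})
print(distinct_request_ids, distinct_statuses)
```

The contrast between the two distinct counts mirrors the difference between a high‑cardinality column such as RequestId and a low‑cardinality one such as Status.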

The cases cover:

High‑cardinality single‑column aggregation: 2.8 billion rows grouped by RequestId (≈2.8 billion distinct values).

High‑cardinality multi‑column aggregation: 4.5 billion rows grouped by ClientIP, Status, Latency (≈1.5 billion distinct combinations).

Low‑cardinality value aggregation: 1.5 trillion rows grouped by Latency (≈7.35 million distinct values) to compute Top‑100 frequencies.

Performance Results

Case 1 – Single‑Column High Cardinality

Baseline (plain SQL) took ~17 s. Enabling enhanced SQL with hash_partition_count=40 reduced it to 10 s. Raising the count to 64/128/200 further cut the time to 7 s, 4.5 s, and 3.7 s respectively. Finally, turning on high_cardinality_agg=true brought the query down to ~2.1 s, an 8× speedup.

Case 2 – Multi‑Column High Cardinality

Baseline took ~24 s. With hash_partition_count=40 it fell to 11 s; increasing the count to 64/128/200 yielded 7.3 s, 5.9 s, and 5.8 s. Enabling high_cardinality_agg=true reduced the time to ~2.9 s, again achieving roughly an 8× improvement.
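The speedup figures for Case 1 and Case 2 can be checked with a few lines of arithmetic. The timings below are the approximate values reported in the text; the dictionary layout and the "hpc" abbreviation for hash_partition_count are only illustrative.

```python
# Reported query times in seconds ("hpc" abbreviates hash_partition_count).
timings = {
    "Case 1": {"baseline": 17.0, "hpc=40": 10.0, "hpc=64": 7.0,
               "hpc=128": 4.5, "hpc=200": 3.7, "high_cardinality_agg": 2.1},
    "Case 2": {"baseline": 24.0, "hpc=40": 11.0, "hpc=64": 7.3,
               "hpc=128": 5.9, "hpc=200": 5.8, "high_cardinality_agg": 2.9},
}


def speedups(case):
    """Return each configuration's speedup relative to the baseline."""
    base = case["baseline"]
    return {name: round(base / secs, 1) for name, secs in case.items()}


for label, case in timings.items():
    print(label, speedups(case))
```

Both cases land slightly above 8× with high_cardinality_agg enabled, and Case 2 shows almost no gain between a count of 128 and 200.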

Case 3 – Low Cardinality Aggregation

The baseline query finished in 4.3 s, but its result was truncated because of the data size. Enhanced SQL produced an accurate result in 23.4 s. Raising hash_partition_count had little effect (22‑23 s) because the bottleneck shifted to PartialAgg. Enabling high_cardinality_agg=true caused a timeout, confirming that the optimization is unsuitable for low‑cardinality workloads.
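One way to see why FinalAgg parallelism matters so little here is to estimate how many distinct keys each FinalAgg partition would hold in the three cases. This is a back-of-the-envelope sketch that assumes keys hash evenly across partitions; the distinct counts are the approximate figures from the test setup.

```python
# Approximate distinct key counts from the three test cases.
cases = {
    "Case 1 (RequestId)": 2_800_000_000,
    "Case 2 (ClientIP, Status, Latency)": 1_500_000_000,
    "Case 3 (Latency)": 7_350_000,
}


def keys_per_partition(distinct_keys, partition_count):
    """Distinct keys per FinalAgg partition, assuming a perfectly even hash."""
    return -(-distinct_keys // partition_count)  # ceiling division


for label, distinct in cases.items():
    at_20 = keys_per_partition(distinct, 20)
    at_200 = keys_per_partition(distinct, 200)
    print(f"{label}: {at_20:,} keys/partition at 20, {at_200:,} at 200")
```

Under this assumption, each partition in Case 3 holds a few hundred thousand keys even at the default of 20, compared with over a hundred million in Case 1, so FinalAgg has little work left to spread out.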

Conclusions & Recommendations

For massive high‑cardinality GroupBy workloads, users should:

Enable enhanced SQL mode.

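Raise hash_partition_count above the default of 20, starting in the 20‑64 range (values up to the maximum of 200 may help further, but the tests show diminishing returns: in Case 2, going from 128 to 200 saved only 0.1 s).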

Turn on high_cardinality_agg=true when the distinct‑value count is very large; avoid it for low‑cardinality scenarios.

When data volume is modest, the default mode is sufficient. When data is huge but sharding is limited, increasing parallelism via the session parameters can deliver multi‑second speedups without sacrificing accuracy.

