Databases 24 min read

How HiTSDB’s New Streaming Aggregation Engine Boosts Query Speed 10×

This article examines the architectural redesign of Alibaba's High‑Performance Time Series Database (HiTSDB), covering storage model changes, inverted‑index enhancements, a pipelined streaming aggregation engine, data‑migration strategies, and performance benchmarks that together deliver over tenfold query speed improvements.

Alibaba Cloud Developer

Apr 19, 2018

How HiTSDB’s New Streaming Aggregation Engine Boosts Query Speed 10×

Introduction

HiTSDB (High‑Performance Time Series Database) is a low‑cost, highly reliable online time‑series service used extensively within Alibaba for monitoring and IoT workloads, supporting massive traffic such as the 2016 and 2017 Double‑11 events.

Background

While HiTSDB’s original engine was heavily optimized for internal Alibaba services, many of these optimizations proved difficult to apply in a public‑cloud environment. Aggregation queries often caused stack overflows, OOM errors, or complete query stalls due to architectural flaws in the original aggregation engine.

Upgrade Overview

The development team decided to revamp the engine around five key areas: storage model redesign, index upgrades, a brand‑new streaming aggregation engine, data migration, and performance evaluation. The focus of this article is the streaming aggregation component.

1. Time‑Series Storage Model

Typical time‑series data consists of a timestamp dimension and a “timeline” dimension (metric + tags). Two common storage strategies exist:

Partition by time‑window, storing consecutive points of the same natural window adjacently (used by InfluxDB, Prometheus, etc.). This approach suffers from out‑of‑order handling and can cause write performance penalties for late data.

Partition by timeline, placing all points of the same metric‑tag combination together. HiTSDB adopts this method, storing rows in HBase with a row‑key composed of metric, tags, and a natural window. This yields very efficient low‑dimensional queries, while higher‑dimensional queries use a streaming scan.

Hot‑spot problems arise when high‑frequency metrics (e.g., cpu.usage collected every second) concentrate writes on a few storage shards, causing severe data skew. Bucket‑based sharding (splitting by metric, business name, and host) mitigates this issue.

2. Inverted Index

Multi‑dimensional timelines require an inverted index to support fast queries. Posting lists can become extremely long for high‑cardinality tags, leading to memory bloat. Compression (delta encoding) and string dictionary techniques are employed to reduce memory usage.

3. Streaming Aggregation Engine

The original engine used a materialization execution mode that read all relevant time‑series into memory before aggregation, causing heap explosions for large time ranges. It also tightly coupled query, filter, interpolation, and aggregation logic, making extensions difficult.

3.1 New Execution Model

A pipeline (Volcano/Iterator) execution model is introduced. Queries are broken into a DAG of operators, each implementing Open, Next, and Close. Operators include:

ScanOp – asynchronous HBase reads.

DsAggOp – down‑sampling and interpolation.

AggOp – group aggregation (PipeAggOp, MTAggOp).

RateOp – compute rate of change.

This design enables clear interfaces, independent optimization of operators, and easy addition of new operators (e.g., field‑value filters).

3.2 Operator Strategies

Two main aggregation operators are used:

PipeAggOp : for queries that do not require interpolation (down‑sampled with non‑null fill) and where aggregation functions support incremental computation (sum, count, avg, min, max). It processes batches of time‑series in a streaming fashion, keeping only per‑group aggregates in memory.

MTAggOp / GroupedAggOp : for queries that need interpolation. MT‑Agg falls back to materialization, while GroupedAggOp leverages pre‑sorted tag groups to limit materialized data to a single group at a time, reducing memory pressure.

3.3 Query Optimizer and Executor

The optimizer selects the appropriate operator chain based on query semantics, aiming to minimize memory consumption and maximize throughput.

4. Data Migration

The new engine stores data in a format incompatible with older versions. A migration tool converts existing points during a hot upgrade. The architecture uses concurrent “salt” partitions, each with a producer (HBase scanner) and consumer (writer). Flow control limits queue size to avoid memory pressure. A ZooKeeper‑based DataLock ensures only one node performs migration at a time. Migration is idempotent: a flag data_conversion_completed is written after successful conversion, and periodic runs exit early if the flag exists.

5. Performance Evaluation

Migration throughput reaches up to 200 k TPS (≈10 GB/hour). Query latency tests show the new engine reduces average response time from ~190 ms to ~13 ms for a 1 k‑line, 1‑hour window workload, and from ~1.8 s to ~0.18 s for a single‑line, 10‑hour window workload—an improvement of more than tenfold.

Conclusion

The redesign of HiTSDB’s aggregation engine, storage model, and indexing dramatically improves stability, write/read performance, and extensibility, positioning HiTSDB for broader commercial use.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Data Migration Performance Optimization time_series_database inverted index HiTSDB Streaming Aggregation

Written by

Alibaba Cloud Developer

Alibaba's official tech channel, featuring all of its technology innovations.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.