How MetricStore 2.0 Redefines Cloud‑Native Time‑Series Storage Performance
MetricStore 2.0 introduces a comprehensive overhaul of memory, file, compute, and transport layers for cloud‑native time‑series data, delivering higher compression, lower latency, multi‑tenant resource control, and support for dynamic schemas, while addressing the scalability limits of its 1.0 predecessor.
Background
Metrics are the most frequently used data type in observability, serving as the first step for monitoring system health across infrastructure, cloud‑native, middleware, IoT, and business scenarios.
Recent trends such as finer granularity, shorter intervals, more dynamic workloads, a shift from view‑only use to analysis, and automation have driven new requirements for time‑series engines:
Observation granularity becomes 10‑100× finer.
Intervals shrink from minutes to seconds, increasing data volume up to 12×.
Dynamic workloads (containers, serverless, training) shorten instance lifetimes.
Queries evolve from point lookups to multi‑dimensional aggregations.
Automation and AIOps increase query pressure.
These trends define the capabilities needed for the next‑generation engine.
MetricStore 1.0 Limitations
Deployed in 2020, MetricStore 1.0 handles tens of PB daily but faces bottlenecks:
Low recent‑data compression leads to frequent cache eviction.
Label encoding limits compression and query performance.
Columnar storage does not exploit time‑line ordering, causing sorting overhead and poor compression.
MetricStore 2.0 Technical Solution
Version 2.0 upgrades storage and compute to become Alibaba Cloud’s next‑gen observability engine.
Memory Storage Model Upgrade
Real‑time in‑memory compression balances performance and compression, leveraging the regularity of time‑lines and label repetition. A two‑level dictionary encodes label pairs, achieving up to 10× compression for container metrics. Gorilla‑based compression with SIMD acceleration handles value columns.
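The two‑level dictionary idea can be sketched as follows. This is a minimal illustration, not MetricStore's actual layout: it assumes the first level maps each distinct label pair to a small integer ID and the second level maps each distinct set of pair IDs to a series ID. The class name `TwoLevelLabelDict` and its methods are hypothetical.

```python
from typing import Dict, List, Tuple


class TwoLevelLabelDict:
    """Hypothetical two-level dictionary (an assumption, not the real format).

    Level 1: (label name, label value) pair -> pair ID.
    Level 2: sorted tuple of pair IDs       -> series ID.
    """

    def __init__(self) -> None:
        self.pair_ids: Dict[Tuple[str, str], int] = {}
        self.pairs: List[Tuple[str, str]] = []
        self.set_ids: Dict[Tuple[int, ...], int] = {}
        self.sets: List[Tuple[int, ...]] = []

    def _pair_id(self, name: str, value: str) -> int:
        # Repeated pairs such as namespace="prod" are stored only once.
        key = (name, value)
        pid = self.pair_ids.get(key)
        if pid is None:
            pid = len(self.pairs)
            self.pair_ids[key] = pid
            self.pairs.append(key)
        return pid

    def encode(self, labels: Dict[str, str]) -> int:
        # Sort by label name so the same label set always yields the same key.
        key = tuple(self._pair_id(n, v) for n, v in sorted(labels.items()))
        sid = self.set_ids.get(key)
        if sid is None:
            sid = len(self.sets)
            self.set_ids[key] = sid
            self.sets.append(key)
        return sid

    def decode(self, sid: int) -> Dict[str, str]:
        # Resolve series ID -> pair IDs -> original label pairs.
        return dict(self.pairs[p] for p in self.sets[sid])
```

With container metrics, most series share pairs such as the namespace or metric name, so each sample only needs to carry one series ID while the repeated strings live once in the first level.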
File Storage Model Upgrade
Metadata uses the same two‑level dictionary; small 16 KB dictionary segments are stored on disk, enabling selective loading. Time and value columns are stored as "Piece" units, preserving ordering while improving compression. Multiple algorithms (Bitmap, RLE, BitPacking, XOR, dictionary, Zstd) are chosen dynamically based on type and sparsity.
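Dynamic algorithm selection can be illustrated with a small cost model. The sketch below is an assumption about how such a choice could be made, not the engine's real heuristic: it estimates the encoded size of an integer column under RLE, bit‑packing, and raw 64‑bit storage, then picks the smallest. The function names and the 64‑bit cost constants are hypothetical.

```python
from typing import List, Tuple


def rle_encode(values: List[int]) -> List[Tuple[int, int]]:
    """Run-length encode a column into (value, run length) pairs."""
    runs: List[Tuple[int, int]] = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs


def rle_decode(runs: List[Tuple[int, int]]) -> List[int]:
    """Expand (value, run length) pairs back into the original column."""
    out: List[int] = []
    for v, n in runs:
        out.extend([v] * n)
    return out


def bits_needed(values: List[int]) -> int:
    """Bit width needed to store each value as an offset from the minimum."""
    span = max(values) - min(values)
    return max(1, span.bit_length())


def choose_codec(values: List[int]) -> str:
    """Pick the codec with the smallest estimated size (assumed cost model)."""
    if not values:
        return "raw"
    rle_bits = len(rle_encode(values)) * 2 * 64          # value + count per run
    pack_bits = 64 + len(values) * bits_needed(values)   # base + packed offsets
    raw_bits = len(values) * 64                          # plain 64-bit integers
    return min((rle_bits, "rle"), (pack_bits, "bitpacking"), (raw_bits, "raw"))[1]
```

A constant column favors RLE, a dense counter with a narrow range favors bit‑packing, and a short column with a very wide range falls back to raw storage. A production engine would also weigh decode speed and the other listed codecs.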
Compute Engine Upgrade
A new C++ Prometheus engine replaces the Go implementation, eliminating redundant allocations and enabling parallel, stream‑based execution. SIMD accelerates function evaluation, and the two sides of a binary operator are evaluated concurrently on separate threads.
100 * ( ( (count(aliyun_prometheus_agent_heartbeat{agentId="0"}) or vector(0)) > bool 0 ) * ( (count(increase(aliyun_prometheus_agent_write_succeed_batch_total{}[4m])) == bool (max(aliyun_prometheus_agent_replica_current_num) - 1)) == 1 or (max(aliyun_prometheus_agent_replica_current_num) == bool 1) or vector(0) ) ) * ... )
PromQL compatibility remains 100%.
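The concurrent evaluation of binary operators can be sketched in a few lines. This is only a structural illustration in Python, not the C++ engine's code: `eval_binary` is a hypothetical helper, the operands are passed as callables, and matching is simplified to identical label sets (real PromQL vector matching has more rules). Python threads also do not give true CPU parallelism, so the sketch shows the shape of the execution rather than its speed.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Tuple

LabelSet = Tuple[Tuple[str, str], ...]
Vector = Dict[LabelSet, float]


def eval_binary(
    lhs: Callable[[], Vector],
    rhs: Callable[[], Vector],
    op: Callable[[float, float], float],
) -> Vector:
    """Evaluate both operands concurrently, then join on identical label sets."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        left_future = pool.submit(lhs)    # left subtree starts immediately
        right_future = pool.submit(rhs)   # right subtree runs alongside it
        left, right = left_future.result(), right_future.result()
    # Simplified one-to-one matching: keep only label sets present on both sides.
    return {ls: op(v, right[ls]) for ls, v in left.items() if ls in right}
```

For example, `eval_binary(lambda: {(("job", "a"),): 2.0}, lambda: {(("job", "a"),): 3.0}, lambda x, y: x * y)` evaluates both sides in parallel tasks and multiplies the matching samples.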
Transport Protocol Upgrade
Queries are split into IO and compute, streamed in chunks, and only required label fields are transmitted. High‑compression blocks are sent directly to compute nodes, reducing serialization overhead.
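Chunked streaming with label projection can be sketched as a generator. The series layout (a dict with `labels` and `samples` keys) and the function name `stream_chunks` are assumptions made for this example; the actual wire format is not described in the article.

```python
from typing import Dict, Iterable, Iterator, List, Set

# Assumed in-memory shape of one series: {"labels": {...}, "samples": [...]}.
Series = Dict[str, object]


def stream_chunks(
    series: Iterable[Series],
    needed_labels: Set[str],
    chunk_size: int,
) -> Iterator[List[Series]]:
    """Yield series in fixed-size chunks, keeping only the labels a query needs."""
    chunk: List[Series] = []
    for s in series:
        # Projection: drop every label the compute side will not read.
        projected = {k: v for k, v in s["labels"].items() if k in needed_labels}
        chunk.append({"labels": projected, "samples": s["samples"]})
        if len(chunk) == chunk_size:
            yield chunk          # the compute side can start before IO finishes
            chunk = []
    if chunk:
        yield chunk              # flush the final partial chunk
```

Because the generator yields as soon as a chunk fills, the consumer can begin aggregating while later series are still being read, and unused labels never cross the wire.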
Performance Indicators
Write throughput reaches 130 M/s per shard (≈250 k rows/s), three times that of 1.0. Time column compresses at 430 M/s and decompresses at 870 M/s; Long/Double columns compress at 940 MB/s and decompress at 2.5 GB/s.
Query latency across typical scenarios is consistently lower than that of VictoriaMetrics, with multi‑second reductions in short‑cycle alerting and analysis workloads.
Resource usage drops to ~50% of VictoriaMetrics in container workloads, while QPS capacity increases threefold over 1.0.
Summary
Through efficient encoding, adaptive memory caching, and a fully C++ stack, MetricStore 2.0 delivers superior performance, lower cost, and higher resource utilization than open‑source alternatives, while supporting flexible schemas, multiple query languages (PromQL, SQL, SPL), and multi‑tenant isolation.
Future Plans
Dynamic multi‑value columns for highly variable workloads.
Built‑in down‑sampling during compaction.
Scale‑out read capability for massive QPS.
Safe data deletion mechanisms with strict controls.
MetricStore 2.0 is rolling out across regions; users are invited to join the community for feedback.
Alibaba Cloud Observability
Driving continuous progress in observability technology!