How VictoriaMetrics' Distributed Architecture Scales Massive Time‑Series Data
VictoriaMetrics employs a modular, horizontally scalable architecture built from vmagent, vminsert, vmstorage, vmselect, and vmalert, which handle data collection, ingestion, storage, querying, and alerting respectively. Consistent hashing, LSM‑tree storage, TSID indexing, and multi‑tenant isolation let it manage large‑scale time‑series workloads efficiently.
1. VictoriaMetrics Distributed Architecture
2. Component Overview
vmagent
vmagent is a high‑performance data collection component that gathers metrics from various sources (e.g., Prometheus exporters) and forwards them via the remote write protocol to VictoriaMetrics or other compatible storage systems.
vminsert
vminsert is a stateless ingestion entry point that hashes time‑series labels to route metrics to appropriate vmstorage nodes, enabling horizontal scaling and load‑balanced writes.
Distributed routing: Consistent hashing ensures the same time series always maps to the same vmstorage node, avoiding duplication.
Horizontal scaling: Adding vminsert instances increases write throughput for high‑concurrency ingestion.
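The routing idea can be sketched in Go. This is a simplified illustration, not VictoriaMetrics' actual code: the real vminsert hashes with xxhash and uses a consistent‑hashing scheme that minimizes remapping when nodes change, whereas this sketch uses the standard library's FNV hash and a plain modulo.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// nodeForSeries sketches vminsert-style routing: hash the canonical series
// name (metric name plus sorted labels) and map it onto one of the storage
// nodes. Because the hash is deterministic, the same series always lands on
// the same node, so no series is duplicated across nodes.
func nodeForSeries(canonicalName string, nodeCount int) int {
	h := fnv.New64a()
	h.Write([]byte(canonicalName))
	return int(h.Sum64() % uint64(nodeCount))
}

func main() {
	series := `http_requests_total{instance="10.0.0.1",job="api"}`
	// Two routing decisions for the same series agree.
	fmt.Println(nodeForSeries(series, 3) == nodeForSeries(series, 3)) // true
}
```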
vmstorage
vmstorage is the core storage component of a VictoriaMetrics cluster, responsible for persisting and quickly retrieving time‑series data.
Storage engine optimization: Uses an LSM‑tree with a columnar layout for efficient writes, strong compression, and low disk usage.
Horizontal scaling: Nodes can be added dynamically; data is sharded, and single‑node failures do not affect overall availability.
Query optimization: TSID (TimeSeries ID) lookups enable millisecond‑level responses to queries over massive datasets.
vmselect
vmselect is a distributed query engine that parses user queries (e.g., PromQL), aggregates data from multiple vmstorage nodes, and returns the result.
vmalert
vmalert is an alerting component compatible with Prometheus alert rules; it periodically reads data from VictoriaMetrics and triggers alerts.
3. vminsert Mechanics
vminsert hashes the raw metric name (metric name plus sorted labels) to choose the target vmstorage node, then buffers, batches, and compresses the data before sending it.
The component maintains the state of each vmstorage node (Ready, Overloaded, Broken, Readonly) to decide when to reroute data.
Ready: Node is healthy and has buffer space.
Overloaded: Buffer exceeds 30 KB of unsent data.
Broken: Temporary issues (network, concurrency limits, etc.).
Readonly: Disk space shortage; node accepts no new data.
Prolonged Overloaded, Broken, or Readonly states trigger rerouting, which can increase resource consumption.
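The state‑driven rerouting described above can be sketched as follows. The four state names come from the article, but the fall‑through node selection is a hypothetical simplification of vminsert's actual logic:

```go
package main

import "fmt"

// NodeState mirrors the four vmstorage health states that vminsert tracks.
type NodeState int

const (
	Ready      NodeState = iota // healthy, buffer has space
	Overloaded                  // too much unsent buffered data
	Broken                      // temporary failure (network, concurrency limits)
	Readonly                    // disk space shortage, accepts no new data
)

// pickNode sketches rerouting: start at the hash-preferred node and fall
// through to the next Ready node. Returns false if no node can accept data,
// in which case the rows stay buffered in vminsert.
func pickNode(states []NodeState, preferred int) (int, bool) {
	n := len(states)
	for i := 0; i < n; i++ {
		idx := (preferred + i) % n
		if states[idx] == Ready {
			return idx, true
		}
	}
	return -1, false
}

func main() {
	states := []NodeState{Overloaded, Ready, Broken}
	idx, ok := pickNode(states, 0) // preferred node 0 is Overloaded
	fmt.Println(idx, ok)           // prints "1 true": rerouted to node 1
}
```

Rerouting like this is exactly what disrupts the consistent‑hash mapping: node 1 now receives a series it has never seen and must register it as new.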
Known Distributed‑Routing Issue in the Community Version
Consistent hashing pins each series to a fixed vmstorage node, but rerouting breaks this mapping: the destination node must register the incoming series as new, which adds significant overhead.
4. vmstorage Mechanics
Key Concepts
Time series and samples.
What is TSID?
The raw metric name is the byte sequence of the metric name plus its sorted labels (the canonical name). The TSID (TimeSeries ID) uniquely identifies a time series and is derived from this canonical name.
<code>type TSID struct {
    AccountID     uint32
    ProjectID     uint32
    MetricGroupID uint64
    JobID         uint32
    InstanceID    uint32
    MetricID      uint64
}
</code>
MetricID Generation
MetricID starts from the nanosecond timestamp at service start and increments atomically.
<code>func generateTSID(dst *TSID, mn *MetricName) {
    dst.AccountID = mn.AccountID
    dst.ProjectID = mn.ProjectID
    dst.MetricGroupID = xxhash.Sum64(mn.MetricGroup)
    if len(mn.Tags) > 0 {
        dst.JobID = uint32(xxhash.Sum64(mn.Tags[0].Value))
    }
    if len(mn.Tags) > 1 {
        dst.InstanceID = uint32(xxhash.Sum64(mn.Tags[1].Value))
    }
    dst.MetricID = generateUniqueMetricID()
}

var nextUniqueMetricID = uint64(time.Now().UnixNano())

func generateUniqueMetricID() uint64 {
    return atomic.AddUint64(&nextUniqueMetricID, 1)
}
</code>
What is indexdb? What is an index?
indexdb stores metadata indexes for time series.
Per‑day index: serves queries spanning up to 40 days.
Global index: serves queries over longer ranges.
Global Index Mappings
Tag → metric IDs
Metric ID → TSID
Metric ID → metric name
Deleted metric IDs
Per‑day Index Mappings
Date → metric ID
Date + tag → metric ID
Date + metric name → TSID
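A toy model of the per‑day "date + tag → metric IDs" mapping may make the lookup concrete. indexdb actually stores these entries as sorted key‑value rows in a mergeset table with a compact binary encoding; the Go map and string key format here are illustrative only:

```go
package main

import "fmt"

// perDayKey builds an illustrative "date + tag" index key. The real indexdb
// encodes keys in a compact binary form, not as strings.
func perDayKey(date, tag, value string) string {
	return date + "|" + tag + "=" + value
}

func main() {
	// date + tag -> metric IDs, as in the per-day index.
	index := map[string][]uint64{}
	k := perDayKey("2024-01-01", "status", "500")
	index[k] = append(index[k], 101, 102)

	// A query for status="500" on that day resolves to the matching metric
	// IDs, which are then mapped to TSIDs to locate the data blocks.
	fmt.Println(index[perDayKey("2024-01-01", "status", "500")]) // [101 102]
}
```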
Write Path
Incoming rows are hashed by TSID and distributed across in‑memory shards, where they are grouped and merged before being flushed to on‑disk parts composed of blocks.
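The write path above can be sketched as sharding by TSID (here just the MetricID) into in‑memory buffers that are flushed when full. The shard count, buffer size, and flush trigger are invented for illustration; the real vmstorage merges rows into sorted in‑memory parts before persisting them:

```go
package main

import "fmt"

// sample is a single (timestamp, value) point for a series.
type sample struct {
	ts    int64
	value float64
}

// shardedBuffer distributes incoming samples across shards by MetricID and
// reports when a shard is full and should be flushed to an on-disk part.
type shardedBuffer struct {
	shards    [][]sample
	flushSize int
}

func newShardedBuffer(numShards, flushSize int) *shardedBuffer {
	return &shardedBuffer{shards: make([][]sample, numShards), flushSize: flushSize}
}

// add places the sample into its shard; it returns true when that shard
// should be flushed (grouped, merged, and written as a part of blocks).
func (b *shardedBuffer) add(metricID uint64, s sample) bool {
	i := int(metricID % uint64(len(b.shards)))
	b.shards[i] = append(b.shards[i], s)
	return len(b.shards[i]) >= b.flushSize
}

func main() {
	buf := newShardedBuffer(4, 2)
	buf.add(101, sample{ts: 1, value: 0.5})
	full := buf.add(101, sample{ts: 2, value: 0.7}) // same shard reaches flushSize
	fmt.Println(full) // true
}
```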
Read Path
Example: for the query http_request_total{status="500"} over 2024‑01‑01 and 2024‑01‑02, the per‑day index resolves each date + tag pair to metric IDs, those metric IDs are mapped to TSIDs, and the matching data blocks are then read for the requested range.
5. vmselect Mechanics
vmselect sends user queries to all vmstorage nodes in parallel (MPP‑style), merges and deduplicates the partial results, and returns the final answer.
Distributed query: Parallel queries to all vmstorage nodes with result merging.
Cache mechanism: A built‑in rollup result cache for common aggregations (sum, avg) reduces recomputation.
Query optimization: Filters are pushed down to the storage layer to minimize data transfer.
Multi‑tenant support: AccountID provides tenant‑level query isolation, ensuring data security in shared clusters.
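The fan‑out‑and‑merge step can be sketched with goroutines. The per‑node query function, its result type, and the dedup‑by‑timestamp merge below are assumptions for illustration, not vmselect's actual API:

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

type point struct {
	ts    int64
	value float64
}

// queryAll fans the same query out to every storage node in parallel, then
// merges the partial results and deduplicates points that share a timestamp
// (with replication, several nodes may return the same point).
func queryAll(nodes []func(query string) []point, query string) []point {
	var mu sync.Mutex
	var wg sync.WaitGroup
	var merged []point
	for _, node := range nodes {
		wg.Add(1)
		go func(q func(string) []point) {
			defer wg.Done()
			pts := q(query)
			mu.Lock()
			merged = append(merged, pts...)
			mu.Unlock()
		}(node)
	}
	wg.Wait()
	sort.Slice(merged, func(i, j int) bool { return merged[i].ts < merged[j].ts })
	// Deduplicate by timestamp, keeping the first point seen.
	out := merged[:0]
	var lastTS int64 = -1
	for _, p := range merged {
		if p.ts != lastTS {
			out = append(out, p)
			lastTS = p.ts
		}
	}
	return out
}

func main() {
	// Two hypothetical storage nodes holding overlapping data.
	nodeA := func(string) []point { return []point{{ts: 1, value: 10}, {ts: 2, value: 20}} }
	nodeB := func(string) []point { return []point{{ts: 2, value: 20}, {ts: 3, value: 30}} }
	res := queryAll([]func(string) []point{nodeA, nodeB}, `sum(http_requests_total)`)
	fmt.Println(len(res)) // 3 points after merging and deduplication
}
```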
360 Zhihui Cloud Developer
360 Zhihui Cloud is an enterprise open service platform that aims to "aggregate data value and empower an intelligent future," leveraging 360's extensive product and technology resources to deliver platform services to customers.