Dashboard Metrics Storage Upgrade: Replacing HBase with VictoriaMetrics and ClickHouse
This article details the redesign of Ctrip's internal Dashboard monitoring system, explaining why the original HBase‑based TSDB was replaced, the new architecture using VictoriaMetrics and ClickHouse, component upgrades, unified query layer, performance gains, and future roadmap for metrics handling.
The Dashboard product is a long‑standing Ctrip internal monitoring solution that provides enterprise‑level metrics collection, storage, and visualization. Originally it stored time‑series data in HBase, which caused query latency, hotspot issues, heavy operational overhead, and incompatibility with Prometheus.
To meet the All‑in‑One monitoring requirements, the team decided to replace the HBase storage and upgrade the core components while keeping the user‑facing gateway and agent unchanged, ensuring a transparent migration.
Overall Architecture
The system consists of six components: dashboard‑engine, dashboard‑gateway, dashboard‑writer, dashboard‑HBase, dashboard‑collector, and dashboard‑agent, handling up to 600 million rows per minute.
Problems with the HBase solution
Slow TSDB‑style queries compared to dedicated TSDBs.
HBase hotspot and write‑performance issues.
Heavy operational burden of the HBase stack.
Proprietary protocol not compatible with Prometheus.
Replacement Strategy
The migration focuses on dashboard‑writer and dashboard‑HBase. The new storage combines VictoriaMetrics (a Prometheus‑compatible TSDB) for high‑cardinality series and ClickHouse for metadata and low‑cardinality log‑type metrics.
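Because VictoriaMetrics accepts the Prometheus text exposition format on its import endpoint, a writer can serialize each sample as a plain text line. A minimal, illustrative sketch (the class and helper names are assumptions, not the production code):

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class PromLineFormatter {
    // Render one sample as a Prometheus exposition line:
    //   metric{k1="v1",k2="v2"} value timestampMillis
    static String format(String metric, Map<String, String> tags,
                         double value, long timestampMillis) {
        String labels = tags.entrySet().stream()
                .map(e -> e.getKey() + "=\"" + e.getValue() + "\"")
                .collect(Collectors.joining(","));
        return metric + "{" + labels + "} " + value + " " + timestampMillis;
    }

    public static void main(String[] args) {
        // TreeMap gives a deterministic label order for the example
        Map<String, String> tags = new TreeMap<>();
        tags.put("appid", "100001");
        tags.put("host", "svr-01");
        System.out.println(format("request_count", tags, 42.0, 1700000000000L));
        // request_count{appid="100001",host="svr-01"} 42.0 1700000000000
    }
}
```

Lines in this shape can be batched and POSTed to VictoriaMetrics' Prometheus-compatible import API; the exact batching and endpoint wiring in dashboard-vmwriter is not shown here.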
Dashboard‑HBase → dashboard‑vm
Storage is switched to a VictoriaMetrics + ClickHouse hybrid:
CREATE TABLE hickwall.downsample_mtv (
`timestamp` DateTime,
`metricName` String,
`tagKey` String,
`tagValue` String,
`datasourceId` UInt8 DEFAULT 40
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/hickwall_cluster-{shard}/downsample_mtv', '{replica}')
PARTITION BY toYYYYMMDD(timestamp)
ORDER BY (timestamp, metricName, tagKey)
TTL timestamp + toIntervalDay(7)
SETTINGS index_granularity = 8192
and a distributed table:
CREATE TABLE hickwall.downsample_mtv__dt (
`timestamp` DateTime,
`metricName` String,
`tagKey` String,
`tagValue` String,
`datasourceId` UInt8 DEFAULT 40
) ENGINE = Distributed(hickwall_cluster, hickwall, downsample_mtv, rand())
ClickHouse also stores a small amount of log‑type data that does not fit VictoriaMetrics.
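Given the schema above, a metadata lookup (e.g. listing the values of one tag for one metric) would query the distributed table. A hedged sketch of how such a query string might be assembled (class name and metric/tag values are illustrative; production code should use parameterized queries rather than string concatenation):

```java
public class MetaQueryBuilder {
    // Build a lookup for the distinct values of one metric/tagKey pair
    // against the distributed metadata table downsample_mtv__dt.
    static String tagValuesQuery(String metricName, String tagKey, int limit) {
        return "SELECT DISTINCT tagValue FROM hickwall.downsample_mtv__dt"
                + " WHERE metricName = '" + metricName + "'"
                + " AND tagKey = '" + tagKey + "'"
                + " LIMIT " + limit;
    }

    public static void main(String[] args) {
        System.out.println(tagValuesQuery("cpu_usage", "hostname", 100));
    }
}
```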
Dashboard‑writer → dashboard‑vmwriter
Data flow changes to Kafka → processing → storage. New features include:
Metadata extraction to ClickHouse (using Redis for real‑time writes).
Pre‑aggregation based on configurable dimensions (e.g., ClusterName, appid).
Data governance using HyperLogLog for cardinality checks and Redis caches for tag‑value limits.
High‑performance multi‑threaded ingestion with bucketed hashing.
private int computeMetricNameHash(byte[] metricName) {
    int hash = Arrays.hashCode(metricName);
    // Math.abs(Integer.MIN_VALUE) overflows, so map that one value to 0
    return hash == Integer.MIN_VALUE ? 0 : hash;
}

byte[] metricName = metricEvent.getName();
int hash = computeMetricNameHash(metricName);
buckets[Math.abs(hash) % bucketCount].add(metricEvent);
Ingestion latency is typically under one second.
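The pre-aggregation feature described above can be sketched as collapsing samples onto the configured dimensions and summing the rest away. This is an illustrative simplification (the `Sample` record and class names are assumptions, not the dashboard-vmwriter internals):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class PreAggregator {
    record Sample(Map<String, String> tags, double value) {}

    // Collapse samples onto the configured dimensions (e.g. ClusterName, appid),
    // summing values; all other tags are dropped.
    static Map<String, Double> aggregate(List<Sample> samples, List<String> dims) {
        Map<String, Double> out = new HashMap<>();
        for (Sample s : samples) {
            String key = dims.stream()
                    .map(d -> d + "=" + s.tags().getOrDefault(d, ""))
                    .collect(Collectors.joining(","));
            out.merge(key, s.value(), Double::sum);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Sample> samples = List.of(
                new Sample(Map.of("appid", "1", "host", "a"), 2.0),
                new Sample(Map.of("appid", "1", "host", "b"), 3.0),
                new Sample(Map.of("appid", "2", "host", "a"), 1.0));
        // Aggregating on appid alone merges the two appid=1 samples
        System.out.println(aggregate(samples, List.of("appid")));
    }
}
```

In the real pipeline this aggregation would run per time window before the result series are written to VictoriaMetrics, so downstream queries on coarse dimensions avoid scanning high-cardinality raw series.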
Unified Metrics Query Layer
The new query API is compatible with the original Dashboard protocol and the Prometheus protocol, providing four core endpoints: Data, Measurement, Measurement‑tagKey, and Measurement‑tagKey‑tagValue, backed by VictoriaMetrics for series data and ClickHouse for metadata.
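The routing rule implied above (series data to VictoriaMetrics, metadata to ClickHouse) can be sketched with a simple dispatch; the backend identifiers are illustrative placeholders, not the actual service names:

```java
public class QueryRouter {
    // Map each of the four core endpoints to its storage backend:
    // the Data endpoint reads series from VictoriaMetrics, while the
    // three metadata endpoints read from ClickHouse.
    static String backendFor(String endpoint) {
        switch (endpoint) {
            case "Data":
                return "victoriametrics";
            case "Measurement":
            case "Measurement-tagKey":
            case "Measurement-tagKey-tagValue":
                return "clickhouse";
            default:
                throw new IllegalArgumentException("unknown endpoint: " + endpoint);
        }
    }

    public static void main(String[] args) {
        System.out.println(backendFor("Data"));               // victoriametrics
        System.out.println(backendFor("Measurement-tagKey")); // clickhouse
    }
}
```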
Results
Query latency improved roughly 4×; most queries now complete in 10–50 ms.
Write stability resolved HBase hotspot issues.
Support for PromQL enables advanced calculations, comparisons, and fuzzy matching.
Future Plans
Extend the unified query layer to all internal metrics sources (e.g., HickWall, Cat).
Develop a unified ingestion layer for billions of metrics per second.
Apply the same unified approach to log storage.
Ctrip Technology
The official Ctrip Technology account, sharing technical practice and discussion.