
How HBase Boosted Tencent Monitoring Platform Performance 3‑5×

Facing the challenge of storing over 120 billion daily monitoring points from hundreds of thousands of servers, Tencent’s monitoring platform migrated from a custom solution and OpenTSDB to a finely tuned HBase architecture, achieving 3‑5× higher throughput, improved reliability, and significant storage savings.

21CTO

Introduction

The company operates hundreds of thousands of servers, and the Tencent Monitoring Platform (TMP) collects more than 120 billion monitoring data points per day. This article examines the problems of the existing storage architecture and describes how HBase is used to store TMP monitoring data.

Background

Open-source big-data processing systems have matured to the point where they are a default choice for many scenarios, much as MySQL is for relational data. TMP gathers minute-level data from a massive server fleet. The team first tried OpenTSDB for storage and then designed its own storage scheme directly on HBase.

Analysis of TMP Current Storage Architecture

The current architecture routes data from agents through collectors, holds it in memory caches, and periodically dumps it to the file system. It is simple, horizontally scalable, and fully self-developed. Its weaknesses are exposure to cache failures and to disk or machine failures, a fixed data format with no compression, and a dependency on external metadata services.

Advantages of HBase Storage Engine

HBase, a distributed column-oriented store based on the Bigtable model and an LSM-tree engine, is widely used for massive time-series data. Its advantages include high reliability and availability, high write performance, natural horizontal scalability, and compressed, sparse column storage in which empty columns take no space.

OpenTSDB Attempt and Bottleneck Analysis

When OpenTSDB was tested on HBase, the cluster became overloaded at about 700k writes per second. The bottlenecks were heavy UID translation, inefficient append and compaction mechanisms, and a single-table design that made time-based maintenance and region management difficult.

TMP Monitoring Storage Design Practice

Region Pre‑Splitting

To avoid hotspot regions, tables are pre-split into 100 regions using the 99 single-byte split keys 0x01 through 0x63, which distributes data evenly across RegionServers.
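The pre-split boundaries can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not TMP's actual tooling: the function names `make_split_keys` and `region_index` are hypothetical, and the region lookup simply mimics how HBase assigns a rowkey to the region whose start key is the greatest one not exceeding it.

```python
import bisect


def make_split_keys(num_regions: int = 100) -> list[bytes]:
    """Return the num_regions - 1 single-byte split keys (0x01 .. 0x63 for 100)."""
    if not 2 <= num_regions <= 256:
        raise ValueError("single-byte salts support 2 to 256 regions")
    return [bytes([i]) for i in range(1, num_regions)]


def region_index(rowkey: bytes, split_keys: list[bytes]) -> int:
    """Index of the region that would hold rowkey.

    Region 0 covers keys below the first split key; region i (i >= 1)
    starts at split_keys[i - 1], inclusive.
    """
    return bisect.bisect_right(split_keys, rowkey)


if __name__ == "__main__":
    keys = make_split_keys()
    print(len(keys), keys[0].hex(), keys[-1].hex())
    # A rowkey whose first (salt) byte is 0x2a lands in region 42.
    print(region_index(bytes([0x2A]) + b"rest-of-key", keys))
```

With a salt byte in the range 0 to 99 at the front of every rowkey, each salt value maps to exactly one of the 100 regions.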

Rowkey and Column Design

The rowkey consists of a 1-byte salt, a 4-byte server ID, a 4-byte timestamp, and a 4-byte metric ID (13 bytes in total), which gives uniform distribution across regions and efficient queries. A single column family is used to reduce MemStore overhead, and column qualifiers store time offsets.
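A minimal sketch of this layout follows. The field widths and order come from the description above; everything else is an assumption made for illustration: the salt is derived as `server_id % 100`, the rowkey timestamp is aligned to the start of the hour, the qualifier is a 2-byte offset in seconds from that hour, and all fields are big-endian. The name `build_rowkey` is hypothetical.

```python
import struct

NUM_SALTS = 100           # matches the 100 pre-split regions
TIME_BASE_SECONDS = 3600  # assumption: one row per server/metric/hour


def build_rowkey(server_id: int, metric_id: int, timestamp: int) -> tuple[bytes, bytes]:
    """Return (rowkey, qualifier) for one data point.

    rowkey    = salt (1 byte) + server ID (4) + time base (4) + metric ID (4)
    qualifier = offset of the point from the time base, in seconds (2 bytes)
    """
    salt = server_id % NUM_SALTS  # assumption: any stable hash of the server would do
    time_base = timestamp - timestamp % TIME_BASE_SECONDS
    offset = timestamp - time_base
    rowkey = struct.pack(">BIII", salt, server_id, time_base, metric_id)
    qualifier = struct.pack(">H", offset)
    return rowkey, qualifier


if __name__ == "__main__":
    rowkey, qualifier = build_rowkey(server_id=12345, metric_id=7, timestamp=1700000123)
    print(len(rowkey), rowkey.hex(), qualifier.hex())
```

Because the salt depends only on the server in this sketch, all rows for one server stay in one region, so a query for a server and time range becomes a single contiguous scan.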

Column‑Based Compaction

HBase stores every cell as a separate key-value entry, and each entry repeats the rowkey and column family alongside the qualifier and value, which bloats storage when a row holds many small columns. Inspired by OpenTSDB, TMP merges the columns that share a time base into a single column, reducing storage by about 90%.
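The idea can be illustrated with a small sketch. The packing format below (2-byte offset, 1-byte length, then the value) is a hypothetical encoding chosen for the example, not the format TMP or OpenTSDB uses. The size estimate follows the basic HBase KeyValue layout (4-byte key length, 4-byte value length, 2-byte row length, 1-byte family length, 8-byte timestamp, 1-byte type, plus the row, family, qualifier, and value bytes) and ignores tags, block encoding, and Snappy compression, so it gives only a rough ballpark.

```python
import struct

KEYVALUE_FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1  # bytes per cell, excluding tags


def merge_columns(cells: dict[int, bytes]) -> bytes:
    """Pack {offset_seconds: value} into one blob, ordered by offset."""
    parts = []
    for offset in sorted(cells):
        value = cells[offset]
        parts.append(struct.pack(">HB", offset, len(value)) + value)
    return b"".join(parts)


def split_columns(blob: bytes) -> dict[int, bytes]:
    """Inverse of merge_columns: recover the per-offset values."""
    cells = {}
    pos = 0
    while pos < len(blob):
        offset, length = struct.unpack_from(">HB", blob, pos)
        pos += 3
        cells[offset] = blob[pos:pos + length]
        pos += length
    return cells


def keyvalue_size(row_len: int, family_len: int, qualifier_len: int, value_len: int) -> int:
    """Approximate uncompressed size of one HBase cell."""
    return KEYVALUE_FIXED_OVERHEAD + row_len + family_len + qualifier_len + value_len


if __name__ == "__main__":
    # One hour of minute-level points with 4-byte values (illustrative data).
    cells = {minute * 60: struct.pack(">I", minute) for minute in range(60)}
    blob = merge_columns(cells)
    separate = sum(keyvalue_size(13, 1, 2, len(v)) for v in cells.values())
    merged = keyvalue_size(13, 1, 1, len(blob))
    print(separate, merged, f"{1 - merged / separate:.0%} smaller before compression")
```

The saving comes almost entirely from writing the 13-byte rowkey and the per-cell bookkeeping once per hour instead of once per data point; the exact ratio depends on value width and on the compression settings.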

HBase Performance Tuning

Key tuning points include increasing the RegionServer heap and MemStore sizes, enabling Snappy compression, raising the compaction thread counts (for example, setting hbase.regionserver.thread.compaction.small and hbase.regionserver.thread.compaction.large to 5), and tuning garbage collection to avoid long stop-the-world pauses.

Conclusion

After the redesign, TMP's monitoring storage achieves 3‑5× higher performance, with peak write rates of 4 million rows per second on eight RegionServers, far above the roughly 700k writes per second at which the OpenTSDB test cluster became overloaded. The system is now in production, and future work includes adding a buffering layer that pre-compacts data before it is written, to further boost performance.
