HBase Optimization Practice in Vivo's Unified Content Platform
vivo's unified content platform replaced its unwieldy 60 TB MongoDB store with HBase, then upgraded the cluster from 1.2 to 2.4.8, introduced table-specific connection pools, column-only reads, throttled compaction, and multi-version cells. Together these changes cut response times from seconds to under ten milliseconds, boosted read/write performance, and dramatically lowered operational costs.
This article introduces HBase optimization practices implemented in vivo's unified content platform, which handles core functions including content review, content understanding, content creation, and content distribution.
Business Background: As a content middleware platform, it stores massive amounts of text, image, and video content daily, alongside derived data such as classification labels and review information. Read and write traffic is heavy, serving the video business and pan-information flow services.
Problems with the Previous MongoDB Solution: Core data exceeded 20 TB, with total storage beyond 60 TB, and MongoDB's storage architecture could not meet scalability requirements. High query traffic from smart push, pan-information flow, and video recommendation systems demanded consistently high performance. Routine maintenance required switching MongoDB primary-replica nodes and rebuilding instances, driving up operational costs.
HBase Selection Reasons: HBase is a wide-column Key/Value store with millisecond-level read/write performance. Built on Hadoop's HDFS, it offers high scalability and fault tolerance through HDFS block replication. Each region is served by a single RegionServer, giving strongly consistent reads and writes, and the write-ahead log (WAL) guarantees durability. Additionally, HBase keeps multiple versions per cell, allowing flexible version control.
Optimization Practices:
4.1 Cluster Upgrade: Upgraded from HBase 1.2 to 2.4.8 to resolve issues such as frequent Region-In-Transition (RIT) problems, request latency spikes, slow table creation/deletion, meta table instability, and slow node failure recovery. After the upgrade, response times that had occasionally spiked past 10 s settled consistently below 10 ms.
4.2 Connection Pool and Pre-warming: Created connection pools for different tables using Apache Commons Pool's GenericObjectPool. This provides resource isolation between tables, connection reuse to reduce network overhead, and traffic smoothing to handle sudden spikes. Implemented pre-loading during application startup to avoid performance degradation from mass connection creation.
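The per-table pooling and pre-warming idea can be sketched with the JDK alone. This is a simplified stand-in for the article's GenericObjectPool setup, and `TableClient` is a hypothetical placeholder for a real HBase table connection, not an actual HBase class:

```java
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// One bounded pool per table: tables are isolated from each other,
// connections are reused, and the bound smooths sudden traffic spikes.
public class TablePools {
    private final Map<String, BlockingQueue<TableClient>> pools = new ConcurrentHashMap<>();

    // Pre-warm: create every connection for a table at application startup,
    // so the first burst of traffic does not pay connection-setup cost.
    public void preload(String table, int size, Supplier<TableClient> factory) {
        BlockingQueue<TableClient> q = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) q.offer(factory.get());
        pools.put(table, q);
    }

    // Returns null when the pool is exhausted instead of opening
    // unbounded new connections.
    public TableClient borrow(String table) {
        return pools.get(table).poll();
    }

    public void release(String table, TableClient c) {
        pools.get(table).offer(c);
    }

    public int available(String table) {
        return pools.get(table).size();
    }

    // Hypothetical stand-in for a real table connection object.
    public static class TableClient {}
}
```

In production, GenericObjectPool adds what this sketch omits: eviction of idle connections, validation on borrow, and configurable blocking behavior.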
4.3 Column-based Reading: Used HBase's Get class methods (addFamily, addColumn, setTimeRange, setMaxVersions) to fetch only the required columns instead of every field. For tables with hundreds of fields or large vector fields, this cut more than half of the unnecessary fields from each response and improved average response time.
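The payoff can be shown with a stdlib-only sketch. The Map-based "row" here is a hypothetical stand-in for an HBase Result; in the real client the projection is pushed down to the server via Get.addColumn, so the dropped fields never cross the network at all:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Project only the requested qualifiers out of a wide row,
// instead of returning every field (including large vectors).
public class ColumnProjection {
    public static Map<String, String> project(Map<String, String> row, String... qualifiers) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String q : qualifiers) {
            String v = row.get(q);
            if (v != null) out.put(q, v);
        }
        return out;
    }
}
```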
4.4 Compact Optimization: Configured compaction throughput parameters (hbase.hstore.compaction.throughput.higher.bound and lower.bound) to throttle compaction operations. Major compactions are only executed during off-peak hours. This reduced compaction duration by over 70% while maintaining read performance.
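A hypothetical hbase-site.xml fragment shows the shape of such a configuration. The byte-per-second values are illustrative, not the article's actual settings; one common way to keep major compactions to off-peak hours is to disable the periodic trigger and schedule major_compact externally:

```xml
<!-- Illustrative values; tune to the cluster's actual I/O capacity. -->
<property>
  <name>hbase.hstore.compaction.throughput.higher.bound</name>
  <value>104857600</value> <!-- cap: 100 MB/s under compaction pressure -->
</property>
<property>
  <name>hbase.hstore.compaction.throughput.lower.bound</name>
  <value>52428800</value>  <!-- floor: 50 MB/s when pressure is low -->
</property>
<property>
  <name>hbase.hregion.majorcompaction</name>
  <value>0</value>         <!-- disable periodic major compaction; trigger off-peak instead -->
</property>
```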
4.5 Field-level Version Management: Explored HBase's multi-version capability to store multiple versions of cell values. Can configure version retention by count or time dimension. Useful for scenarios requiring temporal data retrieval and can ensure consumption order in asynchronous update scenarios via message timestamps as version numbers.
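The version semantics can be sketched with a TreeMap keyed by timestamp. This simulates HBase's per-cell version map rather than using the client API; in the real client the message timestamp would be supplied explicitly when writing (e.g. via the timestamp argument of Put's addColumn), so a stale message can never overwrite a newer version:

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// One cell holding up to maxVersions values keyed by timestamp;
// a read returns the value with the highest timestamp, mirroring
// HBase's multi-version behavior with count-based retention.
public class VersionedCell {
    private final NavigableMap<Long, String> versions = new TreeMap<>();
    private final int maxVersions;

    public VersionedCell(int maxVersions) { this.maxVersions = maxVersions; }

    public void put(long timestamp, String value) {
        versions.put(timestamp, value);
        while (versions.size() > maxVersions) versions.pollFirstEntry(); // evict oldest
    }

    public String latest() {
        Map.Entry<Long, String> e = versions.lastEntry();
        return e == null ? null : e.getValue();
    }

    public int versionCount() { return versions.size(); }
}
```

Because the read always resolves to the highest timestamp, asynchronous consumers can apply update messages in any arrival order and still converge on the newest value.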
Results: After optimization, both read and write performance significantly improved, ensuring business stability while greatly reducing operational costs.
vivo Internet Technology
Sharing practical vivo Internet technology insights and salon events, plus the latest industry news and hot conferences.