
Alibaba’s HBase Scaling Secrets: High‑Availability, Replication, Performance

This article details how Alibaba has evolved HBase from an internal storage solution to a cloud service, covering its architecture, high‑availability design, asynchronous and synchronous replication, multi‑link data flows, cost‑effective redundancy, performance optimizations, and future development directions.

Alibaba Cloud Developer

Introduction

In 2011, with Hadoop already popular at Alibaba, HBase was introduced to store massive volumes of transaction data. Over the following five years it matured into a widely used internal storage product and was then offered as the Alibaba Cloud HBase service for PB‑scale, high‑throughput workloads.

Overview

HBase is an open‑source, distributed NoSQL database modeled on Google’s Bigtable. It provides high reliability, performance and scalability on commodity servers, became an Apache top‑level project in 2010, and is used by many large companies.

These capabilities make it suitable for online access to massive structured data, real‑time analytics and large‑object storage.

HBase at Alibaba

Ali‑HBase serves Taobao, Tmall, Ant Financial, Cainiao, Alibaba Cloud, Gaode, Youku and other businesses; during the 2016 Double‑11 shopping festival it handled hundreds of GB/s of read and write traffic.

High‑Availability Construction

Alibaba measures availability against an SLA (for example, 99.999% availability allows only about 5.26 minutes of downtime per year) and achieves high availability by deploying clusters across multiple data centers linked by asynchronous replication.
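The downtime figure follows from simple arithmetic: a year has 365 × 24 × 60 = 525,600 minutes, and an availability target of A% leaves a fraction (1 − A/100) of that as permitted downtime. The small Python helper below (the function name is illustrative, and leap years are ignored) computes the yearly budget for a few common targets.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def allowed_downtime_minutes(availability_pct: float) -> float:
    """Maximum downtime per year allowed by an availability target given in percent."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    return MINUTES_PER_YEAR * (1.0 - availability_pct / 100.0)


for target in (99.9, 99.99, 99.999):
    print(f"{target}% -> {allowed_downtime_minutes(target):.2f} min/year")
```

For 99.999% the result is about 5.26 minutes per year; each additional nine cuts the budget by a factor of ten.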

Cluster Asynchronous Replication

Since HBase 0.92, cluster replication has asynchronously shipped incremental edits from the source cluster's write‑ahead log (HLog) to a backup cluster. Alibaba extended it with multi‑threaded sending on the source side, sorted writes on the target (sink) side, hotspot handling, online configuration changes, multi‑link routing, loop detection and link isolation.
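Loop detection matters once clusters are linked in a ring or in both directions, because an edit could otherwise be replicated back to where it came from indefinitely. Open‑source HBase handles this by recording, with each WAL entry, the IDs of the clusters that have already consumed it; the toy Python model below mimics that idea and is not Alibaba's implementation. `Edit`, `Cluster`, `put` and `sink` are hypothetical names, shipping is synchronous for simplicity, and sorting each batch by row key is an assumed reading of "sorted writes".

```python
from dataclasses import dataclass, field


@dataclass
class Edit:
    row: bytes
    value: bytes
    # IDs of every cluster that has already applied this edit, in order.
    cluster_ids: list = field(default_factory=list)


class Cluster:
    """Toy cluster with outbound replication links (hypothetical model)."""

    def __init__(self, cluster_id: str):
        self.cluster_id = cluster_id
        self.peers = []          # clusters this one replicates to
        self.store = {}          # row key -> value
        self.applied_order = []  # row keys in the order they were applied

    def put(self, rows: dict) -> None:
        """Client write: tag with the origin cluster, apply, then replicate."""
        edits = [Edit(r, v, [self.cluster_id]) for r, v in rows.items()]
        self._apply_and_ship(edits)

    def sink(self, batch: list) -> None:
        """Target side: record that this cluster has seen each edit, then apply."""
        tagged = [Edit(e.row, e.value, e.cluster_ids + [self.cluster_id]) for e in batch]
        self._apply_and_ship(tagged)

    def _apply_and_ship(self, edits: list) -> None:
        # Apply the batch in row-key order (assumed meaning of "sorted writes").
        for e in sorted(edits, key=lambda x: x.row):
            self.store[e.row] = e.value
            self.applied_order.append(e.row)
        # Shipping is synchronous here; real replication is asynchronous.
        for peer in self.peers:
            # Loop detection: never send an edit to a cluster that already applied it.
            outgoing = [e for e in edits if peer.cluster_id not in e.cluster_ids]
            if outgoing:
                peer.sink(outgoing)


# Ring topology A -> B -> C -> A.
a, b, c = Cluster("A"), Cluster("B"), Cluster("C")
a.peers, b.peers, c.peers = [b], [c], [a]
a.put({b"row2": b"v2", b"row1": b"v1"})
```

In the ring, the edits come back to A carrying the IDs A, B and C, so they are dropped instead of being applied again, and the recursion terminates.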

Data Consistency

For scenarios that require strong consistency, Alibaba uses one of two approaches: (1) a planned switchover, in which writes on the primary are paused, the standby is allowed to catch up on all replicated data, and only then is traffic switched; (2) synchronous replication, in which each write is committed to both the primary and the standby logs before the client is acknowledged.
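Approach (1) can be sketched as a small control loop: stop writes, record the primary's last sequence number, wait until the standby has applied at least that much, then redirect traffic. The function and its four callbacks are hypothetical hooks, not an HBase API, and a single global sequence number is a simplification (HBase tracks sequence IDs per region).

```python
import time
from typing import Callable


def planned_switchover(
    pause_writes: Callable[[], None],
    primary_last_seq: Callable[[], int],
    standby_applied_seq: Callable[[], int],
    switch_traffic: Callable[[], None],
    timeout_s: float = 60.0,
    poll_s: float = 0.5,
) -> bool:
    """Pause writes, wait for the standby to catch up, then switch traffic.

    Returns False (with writes still paused) if the standby has not caught
    up within timeout_s, so switching can never lose an acknowledged write.
    """
    pause_writes()
    target = primary_last_seq()  # fixed from now on: no new writes are accepted
    deadline = time.monotonic() + timeout_s
    while standby_applied_seq() < target:
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
    switch_traffic()
    return True


# Usage with a fake standby that applies one more edit per poll.
state = {"paused": False, "primary": 42, "standby": 39, "active": "primary"}


def fake_standby_seq() -> int:
    state["standby"] = min(state["standby"] + 1, state["primary"])
    return state["standby"]


ok = planned_switchover(
    pause_writes=lambda: state.update(paused=True),
    primary_last_seq=lambda: state["primary"],
    standby_applied_seq=fake_standby_seq,
    switch_traffic=lambda: state.update(active="standby"),
    poll_s=0.01,
)
```

Returning False on timeout, rather than switching anyway, preserves the consistency guarantee; the caller can then resume writes or escalate.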

Synchronous Replication

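Each write is recorded in two logs, a local log on the primary and a remote log on the standby, and it succeeds only if both appends succeed. The remote log is read only after a failover, to recover edits that asynchronous replication had not yet delivered, and it can be cleared once those edits have been replicated. This provides strong consistency while adding less than 10% to write latency.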
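The write path of approach (2) and the role of the remote log can be modelled in a few lines of Python. This is a sketch under stated assumptions: `Log`, `SyncReplicatedWriter`, `purge_remote` and `entries_to_replay_on_failover` are hypothetical names, logs are in‑memory lists, and the two appends run one after the other, whereas a real system would issue them in parallel so the added latency is roughly the slower of the two rather than their sum.

```python
class Log:
    """In-memory stand-in for a write-ahead log; set fail=True to simulate an outage."""

    def __init__(self, name: str):
        self.name = name
        self.entries = []  # (sequence number, payload) pairs
        self.fail = False

    def append(self, seq: int, payload: bytes) -> None:
        if self.fail:
            raise OSError(f"{self.name} log unavailable")
        self.entries.append((seq, payload))


class SyncReplicatedWriter:
    """Acknowledge a write only after it is in both the local and the remote log."""

    def __init__(self, local: Log, remote: Log):
        self.local, self.remote = local, remote
        self.next_seq = 1

    def write(self, payload: bytes) -> bool:
        seq = self.next_seq
        self.next_seq += 1
        try:
            self.local.append(seq, payload)
            self.remote.append(seq, payload)
        except OSError:
            # Not acknowledged. A real system must also keep an entry that
            # reached only the local log from becoming visible; omitted here.
            return False
        return True

    def purge_remote(self, replicated_up_to: int) -> None:
        """Drop remote entries that asynchronous replication has already delivered."""
        self.remote.entries = [(s, p) for s, p in self.remote.entries if s > replicated_up_to]


def entries_to_replay_on_failover(remote: Log, standby_applied_seq: int) -> list:
    """On failover, the standby replays whatever replication had not yet applied."""
    return [p for s, p in remote.entries if s > standby_applied_seq]


local, remote = Log("local"), Log("remote")
writer = SyncReplicatedWriter(local, remote)
acked = [writer.write(p) for p in (b"a", b"b", b"c")]
remote.fail = True
rejected = writer.write(b"d")            # remote log down: write is not acknowledged
remote.fail = False
writer.purge_remote(replicated_up_to=1)  # async replication has delivered seq 1
replay = entries_to_replay_on_failover(remote, standby_applied_seq=2)
```

In this sketch an unreachable remote log causes writes to be rejected; the text does not say how Alibaba handles that case.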

Data Transfer Pipeline – HExporter

HExporter pushes data from HBase to downstream systems in real time. It is designed for data accuracy, high throughput, disaster recovery, time‑deterministic delivery, per‑table export that can be degraded independently, and built‑in monitoring.
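The text does not define "time‑deterministic delivery". One plausible reading, assumed here, is that downstream consumers receive a watermark: a timestamp before which all data is guaranteed to have been exported. The Python sketch below computes such a watermark as the minimum over per‑server exporters and adds a per‑table degrade switch; `ExportWorker`, `Exporter` and their fields are hypothetical, and WAL timestamps are assumed to increase within each worker.

```python
from dataclasses import dataclass, field


@dataclass
class ExportWorker:
    """Exporter for one RegionServer's WAL (hypothetical)."""
    name: str
    # Timestamps (ms) of edits read from the WAL but not yet delivered downstream.
    pending_ts: list = field(default_factory=list)
    # Timestamp (ms) of the newest WAL entry read so far.
    read_up_to_ms: int = 0

    def low_watermark(self) -> int:
        """Every edit with a timestamp below this value has been delivered."""
        if self.pending_ts:
            return min(self.pending_ts)
        return self.read_up_to_ms + 1


class Exporter:
    def __init__(self, workers):
        self.workers = list(workers)
        self.degraded_tables = set()

    def degrade(self, table: str) -> None:
        """Stop exporting one table without affecting others; its data leaves the watermark guarantee."""
        self.degraded_tables.add(table)

    def should_export(self, table: str) -> bool:
        return table not in self.degraded_tables

    def global_watermark(self) -> int:
        """Downstream may treat all data older than this timestamp as complete."""
        return min(w.low_watermark() for w in self.workers)


rs1 = ExportWorker("rs1", pending_ts=[1_000, 1_200], read_up_to_ms=1_500)
rs2 = ExportWorker("rs2", read_up_to_ms=900)
exporter = Exporter([rs1, rs2])
exporter.degrade("click_log")
watermark = exporter.global_watermark()  # 901: rs2 has delivered everything up to 900
```

Taking the minimum means a single lagging exporter holds the watermark back, which is what keeps the guarantee sound.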

Cost and Performance Optimizations

Alibaba cut the cost of redundancy by lowering the HDFS replica count and implementing cross‑cluster partition copy, which reaches up to 70 MB/s per node without throttling. Other performance work includes an asynchronous API, a prefix BloomFilter to speed up scans, HLog (write‑ahead log) compression, and coprocessor‑based computation that runs in place, next to the data.
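The text does not describe how the prefix BloomFilter is built. A common design, assumed here, is to add only a fixed‑length prefix of each row key to the filter, so a scan for rows starting with a given prefix can skip every store file whose filter reports that prefix as absent. `PrefixBloomFilter` is a hypothetical class; it derives its k bit positions by double hashing a single SHA‑256 digest.

```python
import hashlib


class PrefixBloomFilter:
    """Bloom filter over fixed-length row-key prefixes (illustrative)."""

    def __init__(self, prefix_len: int, num_bits: int = 1024, num_hashes: int = 3):
        self.prefix_len = prefix_len
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes) -> list:
        # Double hashing: position_i = h1 + i * h2 (mod m), from one digest.
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add_row(self, row_key: bytes) -> None:
        for p in self._positions(row_key[: self.prefix_len]):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain_prefix(self, prefix: bytes) -> bool:
        """False means no row in this file starts with `prefix`, so a scan may skip it."""
        if len(prefix) < self.prefix_len:
            return True  # too short to look up: the filter cannot rule the file out
        key = prefix[: self.prefix_len]
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


hfile_filter = PrefixBloomFilter(prefix_len=4)
for row in (b"u001#2016-11-11", b"u001#2016-11-12", b"u002#2016-11-11"):
    hfile_filter.add_row(row)

print(hfile_filter.might_contain_prefix(b"u001#"))  # True: rows with this prefix were added
print(hfile_filter.might_contain_prefix(b"u999"))   # False unless a rare false positive occurs
```

Like any Bloom filter it can return false positives but never false negatives, so skipping a file when the answer is False is always safe.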

Future Directions

Upcoming work focuses on reducing JVM garbage‑collection (GC) pauses, adding SQL access, and deploying HBase in containers.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
