Alibaba’s HBase Innovations: Powering Big Data at Scale – HBaseCon 2017 Asia Insights

At HBaseCon 2017 Asia, Alibaba presented a series of HBase enhancements (strong synchronous replication, SQL on HBase, cross‑cluster range data copy, and read/write path optimizations) aimed at improving performance, reliability, and usability for large‑scale big‑data storage.

Alibaba Cloud Developer

Aug 10, 2017

HBaseCon is the official Apache HBase conference, created in 2012 to share and discuss the use, development, and evolution of the open‑source distributed big‑data storage system. The 2017 edition was the first held in Asia, taking place in Shenzhen, China, highlighting HBase’s popularity and the strong contributions of Chinese developers.

Alibaba’s HBase Experience

Alibaba has been using Apache HBase since 2010. After nearly seven years, more than 1,000 services rely on HBase, with clusters spanning tens of thousands of nodes and storing petabytes of data. Alibaba continuously contributes back to the community, adding features such as Bucket Cache and Reverse Scan, and nurturing two PMC members and two Committers.

1. Strong Synchronous Replication

Traditional HBase master‑standby clusters replicate asynchronously, so the standby can briefly lag behind the master and users must give up strong consistency to get disaster recovery. Alibaba’s HBase expert Tianyin presented a strong synchronous replication solution that writes to the master and the standby concurrently, combined with RemoteLog technology. In intra‑city networks, this approach reduces throughput by only about 2% compared with asynchronous replication. The asynchronous path is kept for normal operation; when the master fails, only the RemoteLog entries that the standby is still missing need to be replayed, which keeps the data lag to a few seconds.
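The write path described above can be illustrated with a minimal sketch. It is not Alibaba's implementation: the class and function names (`Log`, `SyncReplicatedWriter`, `entries_to_replay`) are hypothetical, and in-memory lists stand in for the master's write-ahead log and the RemoteLog. It only shows the idea of acknowledging a write after both appends finish, and of replaying just the entries the asynchronous path had not yet delivered.

```python
from concurrent.futures import ThreadPoolExecutor


class Log:
    """Append-only in-memory log; a stand-in for a WAL or a RemoteLog."""

    def __init__(self):
        self.entries = []

    def append(self, seq, mutation):
        self.entries.append((seq, mutation))


class SyncReplicatedWriter:
    """Hypothetical writer that acknowledges only after both logs have the entry."""

    def __init__(self, local_wal, remote_log):
        self.local_wal = local_wal
        self.remote_log = remote_log
        self.next_seq = 0
        self.pool = ThreadPoolExecutor(max_workers=2)

    def write(self, mutation):
        # Single-caller sketch: sequence numbers are not protected by a lock.
        seq = self.next_seq
        self.next_seq += 1
        # Issue both appends concurrently instead of one after the other.
        local = self.pool.submit(self.local_wal.append, seq, mutation)
        remote = self.pool.submit(self.remote_log.append, seq, mutation)
        # The write is acknowledged only once both appends have completed.
        local.result()
        remote.result()
        return seq


def entries_to_replay(remote_log, last_async_seq):
    """Entries the standby lacks: everything after what async replication applied."""
    return [(seq, m) for seq, m in remote_log.entries if seq > last_async_seq]


wal, rlog = Log(), Log()
writer = SyncReplicatedWriter(wal, rlog)
for i in range(5):
    writer.write(f"put-{i}")

# Assume asynchronous replication had shipped sequence numbers 0..2 before the failure.
missing = entries_to_replay(rlog, last_async_seq=2)
```

Because the two appends overlap in time, the added latency per write is roughly the slower of the two appends rather than their sum, which is consistent with the small throughput cost reported in the talk.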

The solution received strong positive feedback from the audience, who expressed eagerness to adopt the feature once it is merged back into the open‑source project.

2. SQL on HBase

Many HBase users come from traditional SQL databases and find HBase’s row‑key design and API unfamiliar. To lower the entry barrier, Alibaba introduced a SQL layer on top of HBase. Senior HBase engineer Tianmu demonstrated that, after optimization, SQL query performance is comparable to native API access. The SQL engine also creatively supports HBase‑specific features such as multi‑version data and timestamps.

In addition, Alibaba added both global and local secondary indexes, enabling multi‑column indexing, simplifying data model design, improving request efficiency, and reducing usage costs.
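To make the idea of a global secondary index concrete, here is a minimal sketch under stated assumptions. It does not reflect Alibaba's SQL layer or its index format: `IndexedTable` and its methods are hypothetical, and a sorted Python list stands in for a separate index table whose keys are (column value, row key) pairs. The point is that a lookup by column value becomes a short range scan over the index instead of a full table scan.

```python
import bisect


class IndexedTable:
    """Hypothetical table with one global secondary index on a single column."""

    def __init__(self, indexed_column):
        self.indexed_column = indexed_column
        self.data = {}        # row key -> dict of column values
        self.index_keys = []  # sorted list of (column value, row key)

    def put(self, row_key, columns):
        # Simplification: a put replaces the whole row rather than merging columns.
        old = self.data.get(row_key)
        if old is not None and self.indexed_column in old:
            self._remove_index((old[self.indexed_column], row_key))
        self.data[row_key] = dict(columns)
        if self.indexed_column in columns:
            entry = (columns[self.indexed_column], row_key)
            bisect.insort(self.index_keys, entry)

    def _remove_index(self, entry):
        i = bisect.bisect_left(self.index_keys, entry)
        if i < len(self.index_keys) and self.index_keys[i] == entry:
            self.index_keys.pop(i)

    def find_by_column(self, value):
        # (value,) sorts before every (value, row_key), so this finds the range start.
        i = bisect.bisect_left(self.index_keys, (value,))
        row_keys = []
        while i < len(self.index_keys) and self.index_keys[i][0] == value:
            row_keys.append(self.index_keys[i][1])
            i += 1
        return row_keys


users = IndexedTable("city")
users.put("user1", {"city": "Hangzhou"})
users.put("user2", {"city": "Shenzhen"})
users.put("user3", {"city": "Hangzhou"})
in_hangzhou = users.find_by_column("Hangzhou")
```

The cost of this design is that every write to an indexed column also has to update the index, which is the usual trade-off between write amplification and query efficiency.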

3. Cross‑Cluster Range Data Copy

Large‑scale HBase deployments often need to migrate data due to business growth or data‑center relocation. Common scenarios include full‑cluster migration, incremental synchronization between data centers, and selective data recovery. Existing backup/restore tools lack efficient handling of these cases.

Alibaba developed the “Range Data Copy” feature built into HBase, providing a simple, high‑performance, fault‑tolerant data copy mechanism. Using this feature, a 200 TB table can be copied to another cluster in under five hours.
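As a rough check on what that figure implies, the short calculation below derives the minimum aggregate throughput needed to move 200 TB in five hours. It assumes decimal terabytes (the source does not say TB or TiB) and treats "under five hours" as exactly five hours, so the result is a lower bound.

```python
TB = 10**12  # decimal terabyte; an assumption, the source does not specify the unit
GB = 10**9


def min_aggregate_throughput(total_bytes, seconds):
    """Average bytes per second needed to move total_bytes within the given time."""
    return total_bytes / seconds


rate = min_aggregate_throughput(200 * TB, 5 * 3600)
print(f"{rate / GB:.1f} GB/s")  # prints "11.1 GB/s"
```

Sustaining more than 11 GB/s across the cluster suggests the copy is spread over many nodes working in parallel rather than funneled through a single client.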

4. Read/Write Path Optimizations

Alibaba has made extensive read/write performance improvements to HBase. PMC member Jueding and committer Tiantian presented the following key techniques:

Replacing the native RPC server with Netty, greatly increasing RPC throughput and reducing latency.

Introducing a new HFileBlock encoding format that turns sequential search into binary search, boosting random read performance.

Splitting the write path to free blocked handler resources, enhancing write throughput.

These optimizations have already been contributed back to the open‑source community, allowing all HBase users to benefit from the performance gains.
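The second technique, replacing a sequential search inside a block with a binary search, can be sketched as follows. This is an illustration only and not the actual HFileBlock encoding: the cell layout (a 2-byte key length, a 4-byte value length, then key and value) and the function names are assumptions. Because cells have variable length, a reader normally has to walk them one by one; storing the start offset of every cell makes it possible to jump to the middle of the block.

```python
import struct

HEADER = struct.Struct(">HI")  # hypothetical cell header: key length, value length


def encode_block(cells):
    """Serialize sorted (key, value) byte pairs; return the block and cell offsets."""
    data = bytearray()
    offsets = []
    for key, value in cells:
        offsets.append(len(data))
        data += HEADER.pack(len(key), len(value)) + key + value
    return bytes(data), offsets


def read_cell(data, offset):
    """Decode one cell and return (key, value, offset of the next cell)."""
    key_len, value_len = HEADER.unpack_from(data, offset)
    start = offset + HEADER.size
    key = data[start:start + key_len]
    value = data[start + key_len:start + key_len + value_len]
    return key, value, start + key_len + value_len


def sequential_get(data, target):
    """Walk the cells from the start of the block: O(n) cell decodes."""
    offset = 0
    while offset < len(data):
        key, value, offset = read_cell(data, offset)
        if key == target:
            return value
        if key > target:  # keys are sorted, so the target cannot appear later
            return None
    return None


def indexed_get(data, offsets, target):
    """Binary search over the offset index: O(log n) cell decodes."""
    lo, hi = 0, len(offsets) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        key, value, _ = read_cell(data, offsets[mid])
        if key == target:
            return value
        if key < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return None


cells = [(f"row{i:04d}".encode(), f"v{i}".encode()) for i in range(1000)]
block, cell_offsets = encode_block(cells)
```

For a block of 1,000 cells, the sequential lookup decodes about 500 cells on average for a hit, while the indexed lookup decodes at most 10, at the price of storing one extra offset per cell.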

Conclusion

Beyond Alibaba’s contributions, other companies also shared valuable experiences: Xiaomi implemented an AsyncClient to make up for the lack of a native asynchronous API; Zhihu used Kubernetes to scale HBase clusters automatically; and FiberHome isolated read and write resources to stabilize near‑line queries. The conference also featured a round‑table with long‑time Apache HBase contributor Michael Stack on HBase’s current state and future direction.

The enthusiastic response to HBaseCon Asia demonstrates the strong demand for advanced, reliable, and high‑performance big‑data storage solutions in China and worldwide.
