
How JD.com Scales HBase to 90PB: Architecture, Optimizations, and Lessons

This article examines JD.com's HBase deployment: its growth from early adoption to 90 PB across 7,000 nodes, the platform architecture, multi-active disaster recovery, multi-tenant isolation, and the Phoenix integration that provides SQL access. It closes with practical lessons for operating large-scale distributed storage.

dbaplus Community

1. Introduction to HBase

As digital transformation accelerates, traditional relational databases struggle to meet the demands of massive, distributed data storage. HBase, an open‑source, column‑oriented database inspired by Google’s Bigtable, provides high reliability and scalability for large‑scale workloads.

2. HBase at JD.com

JD.com leverages HBase across more than 700 business systems, supporting real‑time queries for e‑commerce, AI, finance, logistics, and monitoring. Typical use cases include product reviews, personalized recommendations, order processing, fraud detection, and sales forecasting, handling millions of queries per second.

Commerce: product reviews, member services, personalized recommendations, user profiling, POP orders, merchant marketing, instant messaging.

AI: intelligent customer service, image recognition, facial access control.

Finance: risk control, credit services, asset management.

Logistics: order tracking, warehouse management, sales forecasting.

Monitoring: unified, server, container, big‑data, and dashboard monitoring.

By 2018, JD.com's HBase deployment had grown to more than 7,000 nodes with 90 PB of storage capacity.

3. Core Application Scenarios and Characteristics

JD.com classifies HBase usage into three primary scenarios:

3.1 Ultra‑large‑scale millisecond‑level read

Batch queries and aggregation reports for millions of merchants require direct reads from HBase, bypassing caches to deliver real‑time analytics.

3.2 T+1 reporting and data storage

Large nightly data processing pipelines write their results to HBase, with the write peak occurring between 01:00 and 05:00.

3.3 Real‑time ingestion and updates

Producer‑consumer pipelines ingest data into HBase with millisecond‑level latency, and downstream services consume the fresh data immediately.

Ensuring stability, low latency, and tenant isolation under these conditions is a central challenge for the JD HBase team.

4. Platform Evolution and Architecture

Early HBase versions suffered from poor usability, limited documentation, and operational complexity. With the release of HBase 2.x, JD.com introduced a layered architecture:

Storage layer: Separate deployment of HDFS and HBase, container‑based scaling.

Kernel layer: Modified RegionServer to auto‑tune performance based on hardware.

Middleware layer: Services for disaster recovery, data governance, cluster grouping, quota & rate‑limiting, and multi‑language support.

User layer: Various data loading methods and query engines.

Images illustrate the logical stack and physical deployment.

JD HBase platform architecture

5. Multi‑Active Disaster Recovery

JD.com implements asynchronous master‑slave replication to guarantee data safety. A policy‑driven switch mechanism provides transparent failover, supporting manual, automatic, and forced strategies at cluster, namespace, and table levels. The PolicyServer stores policies in MySQL and can be horizontally scaled.
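The policy lookup described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not JD.com's PolicyServer: the article names the three strategies and the three levels, but the precedence rule (table overrides namespace, namespace overrides cluster) and the meaning given to each strategy are assumptions, and the namespace and table names are hypothetical. Policies are held in memory here rather than in MySQL.

```java
import java.util.HashMap;
import java.util.Map;

public class FailoverPolicySketch {
    // Strategy names come from the article; their exact semantics below are assumed.
    public enum Strategy { MANUAL, AUTOMATIC, FORCED }

    private final Map<String, Strategy> tablePolicies = new HashMap<>();
    private final Map<String, Strategy> namespacePolicies = new HashMap<>();
    private final Strategy clusterPolicy;

    public FailoverPolicySketch(Strategy clusterPolicy) {
        this.clusterPolicy = clusterPolicy;
    }

    public void setNamespacePolicy(String namespace, Strategy strategy) {
        namespacePolicies.put(namespace, strategy);
    }

    public void setTablePolicy(String namespace, String table, Strategy strategy) {
        tablePolicies.put(namespace + ":" + table, strategy);
    }

    // Assumed precedence: the most specific policy wins (table, then namespace, then cluster).
    public Strategy resolve(String namespace, String table) {
        Strategy strategy = tablePolicies.get(namespace + ":" + table);
        if (strategy != null) {
            return strategy;
        }
        strategy = namespacePolicies.get(namespace);
        return strategy != null ? strategy : clusterPolicy;
    }

    // Assumed semantics: FORCED always routes to the standby cluster, AUTOMATIC
    // switches only when the primary is unhealthy, and MANUAL stays on the
    // primary until an operator changes the policy.
    public boolean routeToStandby(String namespace, String table, boolean primaryHealthy) {
        switch (resolve(namespace, table)) {
            case FORCED:
                return true;
            case AUTOMATIC:
                return !primaryHealthy;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        FailoverPolicySketch policies = new FailoverPolicySketch(Strategy.MANUAL);
        policies.setNamespacePolicy("orders", Strategy.AUTOMATIC);
        policies.setTablePolicy("orders", "audit", Strategy.FORCED);

        boolean primaryHealthy = false;
        System.out.println(policies.routeToStandby("orders", "detail", primaryHealthy));  // true
        System.out.println(policies.routeToStandby("orders", "audit", true));             // true
        System.out.println(policies.routeToStandby("reviews", "items", primaryHealthy));  // false
    }
}
```

Because the decision depends only on the resolved policy and a health flag, a client library could make the switch without the application noticing, which is one plausible reading of "transparent failover".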

Disaster recovery architecture

6. Multi‑Tenant Resource Isolation

To avoid resource contention, JD.com adopted the RegionServer grouping feature of HBase 2.0, which partitions a cluster's RegionServers into separate physical groups that can be resized at runtime. This enables:

Physical isolation of tenants.

Dynamic scaling during peak events (e.g., 618, 11.11).

Placement of high‑performance workloads on high‑spec groups.
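The grouping idea can be modeled with a small in-memory sketch. It does not call any HBase API; it only shows the bookkeeping that makes the three benefits above possible: each table is pinned to one group, and a server can be moved between groups ahead of a peak. All group, server, and table names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class RegionServerGroupSketch {
    private final Map<String, Set<String>> serversByGroup = new HashMap<>();
    private final Map<String, String> groupByTable = new HashMap<>();

    // Create a group if needed and add servers to it.
    public void addGroup(String group, String... servers) {
        Set<String> members = serversByGroup.computeIfAbsent(group, g -> new TreeSet<>());
        for (String server : servers) {
            members.add(server);
        }
    }

    // Pin a table to a group so that only that group's servers host it.
    public void assignTable(String table, String group) {
        if (!serversByGroup.containsKey(group)) {
            throw new IllegalArgumentException("unknown group: " + group);
        }
        groupByTable.put(table, group);
    }

    // Move one server between groups, for example to add capacity before a sales peak.
    public void moveServer(String server, String from, String to) {
        Set<String> source = serversByGroup.get(from);
        Set<String> target = serversByGroup.get(to);
        if (source == null || target == null || !source.remove(server)) {
            throw new IllegalArgumentException("cannot move " + server + " from " + from + " to " + to);
        }
        target.add(server);
    }

    // Servers allowed to host the given table (a defensive copy).
    public Set<String> serversFor(String table) {
        String group = groupByTable.get(table);
        if (group == null) {
            throw new IllegalArgumentException("table has no group: " + table);
        }
        return new TreeSet<>(serversByGroup.get(group));
    }

    public static void main(String[] args) {
        RegionServerGroupSketch cluster = new RegionServerGroupSketch();
        cluster.addGroup("default", "rs1", "rs2", "rs3");
        cluster.addGroup("trade", "rs4", "rs5");
        cluster.assignTable("orders", "trade");

        // Borrow a server from the default group ahead of a peak event.
        cluster.moveServer("rs3", "default", "trade");
        System.out.println(cluster.serversFor("orders")); // [rs3, rs4, rs5]
    }
}
```

Since a table only ever sees the servers of its own group, a heavy tenant in one group cannot consume RegionServer resources that belong to another.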

Rate‑limiting and quota features prevent noisy‑neighbor problems, with alerts at user, table, and namespace levels.
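The article does not say which throttling algorithm JD.com uses. As an illustration only, the sketch below uses a token bucket, a common choice for this kind of limit, and checks a request against every scope it belongs to (user, table, namespace). The scope keys, limits, and class names are hypothetical, alerting is left out, and time is passed in explicitly so the behavior is deterministic.

```java
import java.util.HashMap;
import java.util.Map;

public class QuotaSketch {
    // One token bucket: holds at most `capacity` tokens and refills at `ratePerSecond`.
    static final class Bucket {
        private final double capacity;
        private final double ratePerSecond;
        private double tokens;
        private long lastMillis;

        Bucket(double capacity, double ratePerSecond, long nowMillis) {
            this.capacity = capacity;
            this.ratePerSecond = ratePerSecond;
            this.tokens = capacity;
            this.lastMillis = nowMillis;
        }

        // Refill for the elapsed time, then report whether one request fits.
        boolean hasToken(long nowMillis) {
            double refill = (nowMillis - lastMillis) / 1000.0 * ratePerSecond;
            tokens = Math.min(capacity, tokens + refill);
            lastMillis = nowMillis;
            return tokens >= 1.0;
        }

        void take() {
            tokens -= 1.0;
        }
    }

    private final Map<String, Bucket> buckets = new HashMap<>();

    // scopeKey is a hypothetical naming scheme such as "user:alice" or "table:orders".
    public void setLimit(String scopeKey, double capacity, double ratePerSecond, long nowMillis) {
        buckets.put(scopeKey, new Bucket(capacity, ratePerSecond, nowMillis));
    }

    // A request passes only if every limited scope has a token; scopes without a
    // configured limit are treated as unlimited. Tokens are taken only after all
    // checks pass, so a rejected request does not drain any bucket.
    public boolean allow(long nowMillis, String... scopeKeys) {
        for (String key : scopeKeys) {
            Bucket bucket = buckets.get(key);
            if (bucket != null && !bucket.hasToken(nowMillis)) {
                return false;
            }
        }
        for (String key : scopeKeys) {
            Bucket bucket = buckets.get(key);
            if (bucket != null) {
                bucket.take();
            }
        }
        return true;
    }

    public static void main(String[] args) {
        QuotaSketch quota = new QuotaSketch();
        quota.setLimit("user:alice", 2, 1.0, 0L); // burst of 2, refills 1 request per second

        System.out.println(quota.allow(0L, "user:alice", "table:orders"));    // true
        System.out.println(quota.allow(0L, "user:alice", "table:orders"));    // true
        System.out.println(quota.allow(0L, "user:alice", "table:orders"));    // false
        System.out.println(quota.allow(1000L, "user:alice", "table:orders")); // true
    }
}
```

Checking all scopes before consuming any token keeps one tenant's rejected requests from eating into a shared table or namespace budget.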

Resource isolation diagram

7. Phoenix SQL & OLTP Integration

Native HBase supports only key-based lookups and range scans; it has no SQL interface. JD.com adopted the open-source Phoenix layer to provide SQL access, and added security, multi-tenant support, and performance optimizations on top of it. Phoenix QueryServers are load-balanced via Nginx.

Example Java code to query Phoenix:

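import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class Demo {
    public static void main(String[] args) throws Exception {
        // Thin-client JDBC URL of the load-balanced Phoenix QueryServer endpoint.
        String url = "jdbc:phoenix:thin:url=http://q.sql.jd.com:2001;serialization=PROTOBUF";
        // try-with-resources closes the result set, statement, and connection in reverse order.
        try (Connection conn = DriverManager.getConnection(url, "wuyiran", "jdpassword");
             PreparedStatement stmt = conn.prepareStatement("SELECT COUNT(*) FROM zsc.proxy");
             ResultSet rs = stmt.executeQuery()) {
            while (rs.next()) {
                System.out.println(rs.getLong(1)); // COUNT(*) yields a single numeric column
            }
        }
    }
}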

8. Conclusion

Over several years, JD.com's HBase platform has evolved from a bare-metal deployment into a multi-tenant service with built-in disaster recovery, supporting diverse data types and workloads. The improvements described here, including RegionServer grouping, quota management, and Phoenix integration, offer a practical roadmap for operating very large HBase clusters.
