How JD Baitiao Scaled to Billions with Apache ShardingSphere
This article chronicles JD Baitiao's data‑architecture evolution from Solr + HBase to MongoDB and finally to Apache ShardingSphere, highlighting the challenges of massive data growth, the need for decoupling, and the performance, scalability, and operational benefits achieved by adopting ShardingSphere.
JD Baitiao used Apache ShardingSphere to solve the problem of storing and scaling hundreds of billions of records, laying the foundation for large‑scale promotional activities. Since early 2014, JD Baitiao’s data volume has exploded; each major promotion tests the technical team, and strategic shifts keep pushing the data architecture to grow. -- Zhang Dongfang, JD Baitiao R&D Lead
JD Baitiao Data Architecture Evolution
Since its launch in February 2014, JD Baitiao’s data architecture has undergone several upgrades to handle explosive growth and massive data volumes.
2014‑2015: Solr + HBase
Solr served as an index for searchable fields while HBase stored the full data, relieving pressure on the core database but introducing integration complexity.
2015‑2016: MongoDB Sharding
Data was partitioned by month in a MongoDB cluster, which improved hotspot query efficiency and allowed flexible schema changes, but the cluster suffered from limited scalability and high memory consumption.
2016‑2017: DBRep → ES & HBase
With data exceeding hundreds of billions, a DBRep pipeline captured MySQL changes and replicated them to Elasticsearch and HBase, providing real‑time data flow and better scalability, though code coupling remained high.
Need for Decoupling
Application‑level sharding increased code complexity and upgrade effort, prompting a shift to a dedicated sharding component.
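To make the coupling concrete, here is a minimal sketch of what hand-written, application-level sharding tends to look like. It is not JD Baitiao's actual code: the class name, the `ds_` and `t_order_` naming scheme, and the modulo rule are all hypothetical. The point is that every query path has to call the router, so changing the shard count means touching business code.

```java
// Hypothetical illustration of application-level sharding; not JD Baitiao's code.
final class ManualShardRouter {
    private final int dbCount;      // number of physical databases (assumed)
    private final int tablesPerDb;  // number of tables per database (assumed)

    ManualShardRouter(int dbCount, int tablesPerDb) {
        if (dbCount <= 0 || tablesPerDb <= 0) {
            throw new IllegalArgumentException("shard counts must be positive");
        }
        this.dbCount = dbCount;
        this.tablesPerDb = tablesPerDb;
    }

    // Picks the database by user ID modulo the database count.
    String dataSourceFor(long userId) {
        return "ds_" + Math.floorMod(userId, dbCount);
    }

    // Picks the table inside that database from the remaining quotient.
    String tableFor(long userId) {
        return "t_order_" + Math.floorMod(userId / dbCount, tablesPerDb);
    }

    // Business code must build SQL against the physical table name itself.
    String selectByUser(long userId) {
        return "SELECT * FROM " + tableFor(userId) + " WHERE user_id = ?";
    }
}
```

With a dedicated sharding component, the application would instead issue SQL against a single logical table and leave the routing to the middleware.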
Comparison of self‑developed sharding vs. ShardingSphere:
Performance: high for both.
Code coupling: high vs. low.
Business intrusion: high vs. low.
Upgrade difficulty: high vs. low.
Scalability: average vs. good.
Apache ShardingSphere Solution
ShardingSphere‑JDBC is a lightweight Java framework that acts as an enhanced JDBC driver, requiring no extra deployment.
Key features that meet JD Baitiao’s requirements:
Mature product with active community.
Excellent performance due to micro‑kernel design.
Minimal code changes thanks to native JDBC interface support.
Flexible extension via migration and synchronization components.
After extensive internal validation, ShardingSphere became the preferred sharding middleware for JD Baitiao at the end of 2018.
Product Adaptations
SQL engine upgrades improved compatibility with complex business logic, supporting full SQL routing, distributed primary keys (UUID, Snowflake), and zero‑intrusion hint‑based sharding.
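As an illustration of the Snowflake-style distributed primary key mentioned above, the sketch below follows the commonly described layout of 41 timestamp bits, 10 worker bits, and 12 sequence bits. It is a simplified stand-in, not ShardingSphere's built-in generator, whose details may differ. The class name, the epoch constant, and the worker ID handling are assumptions made for this example.

```java
// A simplified Snowflake-style ID generator; a sketch, not ShardingSphere's implementation.
final class SnowflakeSketch {
    private static final long EPOCH = 1288834974657L; // assumed custom epoch in milliseconds
    private static final int WORKER_BITS = 10;
    private static final int SEQUENCE_BITS = 12;
    private static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1;

    private final long workerId;
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    SnowflakeSketch(long workerId) {
        if (workerId < 0 || workerId >= (1L << WORKER_BITS)) {
            throw new IllegalArgumentException("workerId out of range");
        }
        this.workerId = workerId;
    }

    synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now < lastTimestamp) {
            // This sketch simply refuses to issue IDs if the clock moves backwards.
            throw new IllegalStateException("clock moved backwards");
        }
        if (now == lastTimestamp) {
            sequence = (sequence + 1) & MAX_SEQUENCE;
            if (sequence == 0) {
                // Sequence exhausted for this millisecond: wait for the next one.
                while ((now = System.currentTimeMillis()) <= lastTimestamp) {
                    // busy-wait
                }
            }
        } else {
            sequence = 0L;
        }
        lastTimestamp = now;
        return ((now - EPOCH) << (WORKER_BITS + SEQUENCE_BITS))
                | (workerId << SEQUENCE_BITS)
                | sequence;
    }

    // Extracts the worker ID bits from a generated ID.
    static long workerIdOf(long id) {
        return (id >> SEQUENCE_BITS) & ((1L << WORKER_BITS) - 1);
    }
}
```

Because the timestamp occupies the high bits, IDs from one generator increase over time, which keeps inserts roughly ordered, while the worker bits keep IDs from different nodes distinct without any coordination.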
Performance optimizations include SQL parse result caching, JDBC metadata caching, binding tables and broadcast tables, an automated execution engine, and stream merging of query results.
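The idea behind stream merging can be sketched as a lazy k-way merge: each shard returns rows already sorted, and the merger only holds one row per shard in memory at a time. The code below is a generic illustration using a priority queue, not ShardingSphere's merge engine; the class and method names are hypothetical.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

// Generic k-way merge over sorted per-shard iterators; a sketch of the stream-merging idea.
final class StreamMergeSketch {
    // Pairs the current head value of a shard with the iterator it came from.
    private static final class Head<T> {
        final T value;
        final Iterator<T> source;

        Head(T value, Iterator<T> source) {
            this.value = value;
            this.source = source;
        }
    }

    // Each input iterator must already be sorted in ascending order.
    static <T extends Comparable<T>> Iterator<T> merge(List<Iterator<T>> shards) {
        PriorityQueue<Head<T>> heap =
                new PriorityQueue<>((a, b) -> a.value.compareTo(b.value));
        for (Iterator<T> shard : shards) {
            if (shard.hasNext()) {
                heap.add(new Head<>(shard.next(), shard));
            }
        }
        return new Iterator<T>() {
            @Override
            public boolean hasNext() {
                return !heap.isEmpty();
            }

            @Override
            public T next() {
                Head<T> head = heap.poll();
                if (head == null) {
                    throw new NoSuchElementException();
                }
                // Refill the heap from the shard that just yielded a row.
                if (head.source.hasNext()) {
                    heap.add(new Head<>(head.source.next(), head.source));
                }
                return head.value;
            }
        };
    }
}
```

The design choice worth noting is that nothing is buffered beyond one head element per shard, so memory use depends on the number of shards rather than on the size of the result set.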
Migration Process
Data was migrated using DBRep and ShardingSphere‑Scaling over four weeks, synchronizing to target clusters while running parallel environments for verification.
Benefits
Simplified upgrade path, allowing developers to focus on business logic.
Reduced development effort by avoiding custom sharding components.
Flexible scaling to handle large promotional events.
ShardingSphere now has over 14K GitHub stars and has been adopted by more than 170 enterprises across finance, e‑commerce, cloud services, and other sectors.
Programmer DD
A tinkering programmer and author of "Spring Cloud Microservices in Action"
