Databases 15 min read

Mastering Sharding: When and How to Use Partitioning, Sharding, and NoSQL

This article explains why modern applications dealing with massive user, order, and transaction data must move beyond single‑table designs, compares partitioning, sharding (分库分表), and NoSQL/NewSQL solutions, and provides practical guidance on choosing middleware, sharding columns, and hybrid architectures such as es + HBase.

Programmer DD

Nov 8, 2018

Mastering Sharding: When and How to Use Partitioning, Sharding, and NoSQL

Why Sharding Matters

In the mobile‑Internet era, services generate billions of rows daily—user tables, order tables, transaction logs—far exceeding the optimal size of a single MySQL table (≈1 KB of BTREE index height). When a single table can no longer hold the data efficiently, three common strategies appear:

Partitioning

Sharding (分库分表)

NoSQL/NewSQL

All three are variations of data distribution; sharding can be viewed as a combination of database and table splitting. NoSQL examples include MongoDB and Elasticsearch; NewSQL examples include TiDB.

Why Not NoSQL/NewSQL?

Relational DBMSs still dominate because they offer a mature ecosystem, absolute stability, and strong transactional guarantees. Most companies keep RDBMS as the primary store and treat NoSQL/NewSQL as complementary, not a replacement.

Why Not Simple Partitioning?

Partitioned tables hide sharding details and work without a sharding key, but they inherit single‑instance limits on connections, network throughput, and overall concurrency, making them unsuitable for high‑traffic internet services.

Why Sharding (分库分表) Is Preferred

Sharding distributes data across multiple databases and tables, allowing horizontal scaling. Numerous middleware solutions exist, such as Alibaba's TDDL/DRDS, Sharding‑Sphere (formerly Sharding‑JDBC), MyCAT, 360 Atlas, and Meituan Zebra. These fall into two deployment models:

Client mode (e.g., TDDL, Sharding‑Sphere client)

Proxy mode (e.g., Cobar, MyCAT)

Both models share the same core steps: SQL parsing, rewriting, routing, execution, and result merging. The author prefers client mode for its simpler architecture, lower performance overhead, and easier operations.

Practical Sharding Cases

The first step is selecting the sharding column , which should align with the most frequent API traffic (e.g., user_id for user‑centric services). Common patterns include:

Single sharding column

Multiple sharding columns (each with its own set of shards)

Sharding column plus Elasticsearch for full‑text search

Examples:

Order Table

Alibaba’s order system uses three sharding columns: order_id, user_id, and merchant_code. The table can be implemented as either a fully redundant set (all columns stored in every shard) or a primary‑key‑only full table with secondary relationship tables. The trade‑off is speed versus storage cost.

Fully redundant tables offer faster queries but consume several times more storage and increase maintenance effort.

User Table

User data often requires sharding on user_id, mobile_no, email, or username. Because the total user count (≈7‑10 billion) is still manageable, a single sharding column plus Elasticsearch is usually sufficient.

Account Table

Account APIs commonly filter by account_no, making it a natural sharding key.

Handling Queries Without a Sharding Key

When a query lacks a sharding column, the middleware must route the request to every shard and merge results, which can degrade performance dramatically as the number of shards grows.

For complex or fuzzy searches, the typical solution is to replicate all data to Elasticsearch and let it handle the query, optionally storing the full data in HBase for massive storage.

Hybrid es + HBase Architecture

Elasticsearch provides fast multi‑condition search, while HBase offers massive, low‑latency row‑key lookups. The workflow is: query Elasticsearch for matching rowkeys, then fetch the full records from HBase. This separation leverages the strengths of both systems.

Summary

Sharding is not a one‑size‑fits‑all solution; it requires careful analysis of business traffic, sharding column selection, and complementary technologies such as Elasticsearch and HBase. The choice between client and proxy middleware, full‑redundant tables versus relationship tables, and the addition of search/index layers determines scalability, storage cost, and operational complexity.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Scalability sharding middleware database partitioning

Written by

Programmer DD

A tinkering programmer and author of "Spring Cloud Microservices in Action"

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.

Why Sharding Matters

Why Not NoSQL/NewSQL?

Why Not Simple Partitioning?

Why Sharding (分库分表) Is Preferred

Practical Sharding Cases

Order Table

User Table

Account Table

Handling Queries Without a Sharding Key

Hybrid es + HBase Architecture

Summary

Programmer DD

How this landed with the community

Was this worth your time?

0 Comments

Hybrid es + HBase Architecture