Mastering Sharding: When and How to Use Partitioning, Sharding, and NoSQL

This article explains why modern applications dealing with massive user, order, and transaction data must move beyond single‑table designs, compares partitioning, sharding (分库分表), and NoSQL/NewSQL solutions, and provides practical guidance on choosing middleware, sharding columns, and hybrid architectures such as Elasticsearch + HBase.

Programmer DD

Why Sharding Matters

In the mobile‑Internet era, services can generate billions of rows a day across user tables, order tables, and transaction logs, far more than a single MySQL table handles well. A commonly cited rule of thumb is to keep a single table to roughly 10 million rows, so that its B+tree index stays only a few levels deep. When a single table can no longer hold the data efficiently, three common strategies appear:

Partitioning

Sharding (分库分表)

NoSQL/NewSQL

All three are variations of data distribution; sharding can be viewed as a combination of database and table splitting. NoSQL examples include MongoDB and Elasticsearch; NewSQL examples include TiDB.

Why Not NoSQL/NewSQL?

Relational DBMSs still dominate because they offer a mature ecosystem, proven stability, and strong transactional guarantees. Most companies keep an RDBMS as the primary store and treat NoSQL/NewSQL as a complement, not a replacement.

Why Not Simple Partitioning?

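Partitioned tables are transparent to the application: the database splits the data internally and the SQL stays unchanged. However, every partition still lives on the same instance, so a partitioned table inherits the single‑instance limits on connections, network throughput, and overall concurrency, and a query that does not filter on the partition key has to scan every partition. This makes partitioning unsuitable for high‑traffic internet services.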

Why Sharding (分库分表) Is Preferred

Sharding distributes data across multiple databases and tables, allowing horizontal scaling. Numerous middleware solutions exist, such as Alibaba's TDDL/DRDS, ShardingSphere (formerly Sharding‑JDBC), MyCAT, 360 Atlas, and Meituan Zebra. These fall into two deployment models:

Client mode (e.g., TDDL, ShardingSphere client)

Proxy mode (e.g., Cobar, MyCAT)

Both models share the same core steps: SQL parsing, rewriting, routing, execution, and result merging. The author prefers client mode for its simpler architecture, lower performance overhead, and easier operations.
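The routing and rewriting steps can be sketched in a few lines. This is a minimal illustration, not how any of the middleware above is implemented: the shard counts, the `db_N.table_M` naming scheme, and the function names `route`, `rewrite`, and `plan` are all assumptions made for the example, and the SQL is "parsed" with a regular expression rather than a real parser.

```python
import re

DB_COUNT = 4        # hypothetical: 4 physical databases
TABLES_PER_DB = 8   # hypothetical: 8 physical tables per database


def route(sharding_value: int) -> tuple[int, int]:
    """Map a sharding-column value to (database index, table index)."""
    slot = sharding_value % (DB_COUNT * TABLES_PER_DB)
    return slot // TABLES_PER_DB, slot % TABLES_PER_DB


def rewrite(sql: str, logical_table: str, db: int, table: int) -> str:
    """Replace the logical table name with the physical one (assumed naming)."""
    physical = f"db_{db}.{logical_table}_{table}"
    return re.sub(rf"\b{re.escape(logical_table)}\b", physical, sql)


def plan(sql: str, logical_table: str, sharding_value: int) -> str:
    """Route first, then rewrite; execution and merging are left out."""
    db, table = route(sharding_value)
    return rewrite(sql, logical_table, db, table)


print(plan("SELECT * FROM t_order WHERE user_id = 42", "t_order", 42))
```

With a sharding key in the statement, exactly one physical table is touched, so no result merging is needed. In client mode this logic runs inside the application process; in proxy mode the same steps run in a separate service between the application and the databases.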

Practical Sharding Cases

The first step is selecting the sharding column, which should match the dimension that carries the most API traffic (e.g., user_id for user‑centric services). Common patterns include:

Single sharding column

Multiple sharding columns (each with its own set of shards)

Sharding column plus Elasticsearch for full‑text search

Examples:

Order Table

Alibaba’s order system uses three sharding columns: order_id, user_id, and merchant_code. Serving all three can be implemented in one of two ways: fully redundant tables, where a complete copy of the order data is stored once per sharding column, or a single full table sharded by the primary sharding column plus relationship tables that map the other columns to it. The trade‑off is speed versus storage cost.

Figure: fully redundant tables

Fully redundant tables answer a query in a single lookup but consume several times more storage and increase maintenance effort; relationship tables store only the mapping, at the cost of a second lookup per query.
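The two layouts can be compared with a small in-memory sketch. Everything here is an assumption for illustration: shards are plain dictionaries, `shard_of` uses CRC32 as an arbitrary stable hash, and only the user_id dimension is shown (merchant_code would follow the same pattern).

```python
import zlib

SHARDS = 4  # hypothetical shard count


def shard_of(key: str) -> int:
    """Stable hash routing; crc32 is an arbitrary choice for this sketch."""
    return zlib.crc32(key.encode("utf-8")) % SHARDS


class RelationshipTables:
    """One full table sharded by order_id, plus a user_id -> order_id mapping."""

    def __init__(self) -> None:
        self.orders = [{} for _ in range(SHARDS)]          # full rows
        self.user_to_orders = [{} for _ in range(SHARDS)]  # ids only

    def insert(self, order: dict) -> None:
        oid, uid = order["order_id"], order["user_id"]
        self.orders[shard_of(oid)][oid] = order
        self.user_to_orders[shard_of(uid)].setdefault(uid, []).append(oid)

    def by_user(self, user_id: str) -> list:
        # Two steps: look up the ids, then fetch each row by order_id.
        oids = self.user_to_orders[shard_of(user_id)].get(user_id, [])
        return [self.orders[shard_of(oid)][oid] for oid in oids]

    def stored_rows(self) -> int:
        return sum(len(shard) for shard in self.orders)


class FullyRedundantTables:
    """A complete copy of every row per sharding column."""

    def __init__(self) -> None:
        self.by_order_id = [{} for _ in range(SHARDS)]
        self.by_user_id = [{} for _ in range(SHARDS)]

    def insert(self, order: dict) -> None:
        oid, uid = order["order_id"], order["user_id"]
        self.by_order_id[shard_of(oid)][oid] = order
        self.by_user_id[shard_of(uid)].setdefault(uid, []).append(dict(order))

    def by_user(self, user_id: str) -> list:
        # One step: the full rows already live in the user_id shard.
        return self.by_user_id[shard_of(user_id)].get(user_id, [])

    def stored_rows(self) -> int:
        copies = sum(len(rows) for s in self.by_user_id for rows in s.values())
        return sum(len(s) for s in self.by_order_id) + copies
```

Both classes return the same rows for `by_user`, but the redundant layout stores each order twice here (three times once merchant_code is added), while the relationship layout pays with an extra hop on every secondary-column query.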

User Table

User data is often queried by user_id, mobile_no, email, or username. Because the total user count (≈7‑10 billion at most) is still manageable, a single sharding column plus Elasticsearch for the remaining lookups is usually sufficient.

Figure: user table

Account Table

Account APIs commonly filter by account_no, making it a natural sharding key.

Figure: account table

Handling Queries Without a Sharding Key

When a query lacks a sharding column, the middleware must route the request to every shard and merge results, which can degrade performance dramatically as the number of shards grows.
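The cost of such a broadcast query is easy to see in a sketch. The function below is a hypothetical stand-in for what a middleware layer does, with each shard modeled as an in-memory list of rows; the name `query_all_shards` and its parameters are assumptions for this example.

```python
import heapq
from itertools import islice


def query_all_shards(shards, predicate, order_key, limit):
    """Scatter a query to every shard, then merge the sorted partial results."""
    partials = []
    for rows in shards:  # one round trip per shard in a real deployment
        matched = sorted((r for r in rows if predicate(r)), key=order_key)
        partials.append(matched[:limit])  # each shard returns at most `limit` rows
    # Merge the already-sorted partial lists and keep the global top `limit`.
    return list(islice(heapq.merge(*partials, key=order_key), limit))


shards = [
    [{"order_id": 1, "amount": 50}, {"order_id": 4, "amount": 10}],
    [{"order_id": 2, "amount": 30}],
    [{"order_id": 3, "amount": 5}],
]
top = query_all_shards(
    shards,
    predicate=lambda r: r["amount"] >= 10,
    order_key=lambda r: r["amount"],
    limit=2,
)
print([r["order_id"] for r in top])
```

Every shard does work and returns up to `limit` rows even if it contributes nothing to the final answer, so both the number of round trips and the merge input grow linearly with the shard count.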

For complex or fuzzy searches, the typical solution is to replicate all data to Elasticsearch and let it handle the query, optionally storing the full data in HBase for massive storage.

Figure: conditional filtering

Hybrid Elasticsearch + HBase Architecture

Elasticsearch provides fast multi‑condition search, while HBase offers massive storage with low‑latency rowkey lookups. The workflow is: index only the searchable fields in Elasticsearch, query it for the matching rowkeys, then fetch the full records from HBase by rowkey. This separation leverages the strengths of both systems.
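The two-step read path can be sketched with in-memory stand-ins. `SearchIndex` and `RowStore` below are hypothetical classes that only mimic the roles of Elasticsearch and HBase; they are not the client APIs of either system, and the field names are invented for the example.

```python
class SearchIndex:
    """Stand-in for Elasticsearch: holds searchable fields keyed by rowkey."""

    def __init__(self) -> None:
        self.docs = {}

    def index(self, rowkey: str, fields: dict) -> None:
        self.docs[rowkey] = fields

    def search(self, **conditions) -> list:
        """Return the rowkeys whose fields match every condition."""
        return sorted(
            rowkey
            for rowkey, fields in self.docs.items()
            if all(fields.get(k) == v for k, v in conditions.items())
        )


class RowStore:
    """Stand-in for HBase: holds full records, fetched by rowkey."""

    def __init__(self) -> None:
        self.rows = {}

    def put(self, rowkey: str, row: dict) -> None:
        self.rows[rowkey] = row

    def multi_get(self, rowkeys: list) -> list:
        return [self.rows[rk] for rk in rowkeys if rk in self.rows]


SEARCHABLE = ("status", "merchant_code")  # assumed set of indexed fields


def write_order(index: SearchIndex, store: RowStore, rowkey: str, row: dict) -> None:
    """Full record goes to the row store; only searchable fields are indexed."""
    store.put(rowkey, row)
    index.index(rowkey, {k: row[k] for k in SEARCHABLE})


def search_orders(index: SearchIndex, store: RowStore, **conditions) -> list:
    """Step 1: find rowkeys in the index. Step 2: fetch full rows by rowkey."""
    return store.multi_get(index.search(**conditions))
```

Keeping only the searchable fields in the index keeps it small, while the wide, rarely filtered columns stay in the row store.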

Figure: Elasticsearch + HBase architecture

Summary

Sharding is not a one‑size‑fits‑all solution; it requires careful analysis of business traffic, sharding column selection, and complementary technologies such as Elasticsearch and HBase. The choice between client and proxy middleware, fully redundant tables versus relationship tables, and the addition of search/index layers determines scalability, storage cost, and operational complexity.

Written by

Programmer DD

A tinkering programmer and author of "Spring Cloud Microservices in Action"
