Mastering Database Sharding: When and How to Use Partitioning Middleware
This article explains the concepts of database sharding and partitioning, compares common middleware solutions, and offers practical guidance on choosing and implementing horizontal or vertical splits to handle rapid growth and high concurrency in modern applications.
What Is Sharding and Partitioning?
Sharding (horizontal partitioning) distributes a table's rows across multiple databases or tables, keeping the schema identical while each instance holds a distinct subset of data. Partitioning reduces per‑table size, improves SQL performance, and allows the system to handle higher concurrency and storage demands.
Vertical partitioning splits a wide table into several tables (or databases) with different column sets, placing frequently accessed columns in a small table to improve cache efficiency.
When to Split?
Imagine a startup that begins with 20 000 registered users and 1 000 daily active users, handling 10 QPS. As the business scales to millions of users and thousands of QPS, a single database quickly becomes a bottleneck—disk fills up, query latency rises, and the server cannot sustain the load.
Sharding (Horizontal Split)
When a single table reaches several million rows, performance degrades. Splitting the table into multiple tables (e.g., each limited to ~2 million rows) keeps SQL fast.
Partitioning (Horizontal Split of Databases)
A single MySQL instance typically handles up to ~2 000 concurrent requests; keeping per‑instance QPS around 1 000 is advisable. Distributing data across several databases increases overall concurrency capacity.
Common Sharding Middleware
Cobar – Alibaba’s proxy‑layer solution, now largely abandoned and lacking features such as read/write splitting and cross‑database joins.
TDDL – Taobao’s client‑layer tool, supports basic CRUD and read/write splitting but depends on Alibaba’s Diamond configuration service.
Atlas – 360’s proxy solution, no longer actively maintained.
Sharding‑jdbc (now ShardingSphere) – Dangdang’s client‑layer library, actively maintained, supports sharding, read/write splitting, distributed ID generation, and flexible transactions (up to version 4.0.0‑RC1).
Mycat – Proxy‑layer middleware derived from Cobar, widely used, feature‑rich, but requires deployment and operational effort.
Pros and Cons of Client vs. Proxy Solutions
Client‑layer (Sharding‑jdbc/ShardingSphere) – No separate deployment, low operational cost, high performance, but introduces tight coupling; every application must upgrade the library when a new version is released.
Proxy‑layer (Mycat, Cobar, etc.) – Transparent to applications, easier to upgrade centrally, but requires dedicated servers and higher maintenance effort.
Choosing a Solution
For small‑to‑medium companies, the lightweight client‑layer approach (Sharding‑jdbc) is usually sufficient. Large enterprises with many services and teams may prefer a proxy‑layer solution (Mycat) to keep sharding logic centralized.
How to Split Databases
Horizontal (range or hash) – Distribute rows across multiple databases based on a range (e.g., time) or a hash of a key such as user_id. Range is simple to expand but can cause hotspot traffic; hash balances load but makes scaling more complex.
Vertical – Separate frequently accessed columns into one table and less‑used columns into another, reducing row size for hot data.
Table‑level splitting – Create multiple tables for a single logical entity, keeping each table’s row count within a manageable limit (often 1‑2 million rows).
Original source: https://jianshu.com/p/e06dc82ff673
Java High-Performance Architecture
Sharing Java development articles and resources, including SSM architecture and the Spring ecosystem (Spring Boot, Spring Cloud, MyBatis, Dubbo, Docker), Zookeeper, Redis, architecture design, microservices, message queues, Git, etc.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
