
Mastering Database Sharding: When and How to Split Tables Effectively

This article explains why large tables hurt performance, defines database sharding and its vertical and horizontal strategies, compares partitioning by key, size, or time, and outlines practical implementation options and migration steps for robust, scalable backend systems.

NiuNiu MaTe

1. Why Partition Tables

As business data grows, query latency increases dramatically; a simple COUNT on 10 million rows may take several seconds, and heavy queries can throttle the entire service. Splitting a massive table—like cutting a watermelon—relieves database pressure and improves processing efficiency.

2. What Is Sharding

Sharding (分库分表) is the practice of dividing an oversized dataset into smaller, more manageable pieces. It consists of two parts:

Database sharding (分库): multiple physical databases each hold a subset of the tables.

Table sharding (分表): a single logical table is split into many physical tables with an identical schema.

Both aim to reduce data volume per storage unit. The common division methods are vertical partitioning (different columns go to different tables) and horizontal partitioning (same schema across many tables).

3. Vertical Partitioning

Separate rarely‑used or large columns into an auxiliary table. For example, the answer field of a user_score table (which can be very large) can be moved to an answer table, keeping the main table lightweight for ranking queries.
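To make this concrete, here is a minimal sketch of such a split using an in-memory SQLite database; the column layout and the save_result helper are illustrative assumptions, not the article's actual schema.

```python
# Minimal vertical-partitioning sketch: the large "answer" column lives in
# its own auxiliary table so ranking queries only touch the narrow main table.
# Schema and helper names are illustrative assumptions.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_score (          -- lean main table used for rankings
        user_id INTEGER PRIMARY KEY,
        score   INTEGER NOT NULL
    );
    CREATE TABLE answer (              -- auxiliary table for the heavy column
        user_id INTEGER PRIMARY KEY,
        answer  TEXT
    );
""")

def save_result(user_id: int, score: int, answer: str) -> None:
    """Write the lightweight row and the large column in one transaction."""
    with conn:
        conn.execute("INSERT INTO user_score (user_id, score) VALUES (?, ?)",
                     (user_id, score))
        conn.execute("INSERT INTO answer (user_id, answer) VALUES (?, ?)",
                     (user_id, answer))

save_result(42, 95, "a very long free-text answer ...")
# Ranking queries now scan only the narrow user_score table.
print(conn.execute(
    "SELECT user_id, score FROM user_score ORDER BY score DESC").fetchall())
```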

4. Horizontal Partitioning

Split a table into multiple identical sub‑tables based on a chosen dimension:

By key: e.g., user_id % 10 creates tables user_score_0 … user_score_9. This works well when queries frequently filter by that key.

By size: create a new table every 2 million rows (e.g., user_score_1, user_score_2, …). Suitable for pure log‑type data.

By time: partition by month or year (e.g., user_score_202101, user_score_202102). Ideal for time‑series or billing data.

Each method has trade‑offs: key‑based sharding can lead to uneven distribution; size‑based sharding may cause cross‑table queries; time‑based sharding simplifies data aging but requires accurate forecasting.
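As a rough illustration of how such routing might look in application code, the sketch below maps the logical user_score table to physical sub-tables by key and by time; the shard count and function names are assumptions made for the example.

```python
# Illustrative routing helpers for key-based and time-based table sharding.
# Table names, shard count, and function names are assumptions for the example.
from datetime import date

NUM_KEY_SHARDS = 10

def table_by_key(user_id: int) -> str:
    """Key-based: user_id % 10 -> user_score_0 ... user_score_9."""
    return f"user_score_{user_id % NUM_KEY_SHARDS}"

def table_by_time(day: date) -> str:
    """Time-based: one physical table per month, e.g. user_score_202101."""
    return f"user_score_{day.year}{day.month:02d}"

print(table_by_key(12345))               # user_score_5
print(table_by_time(date(2021, 1, 15)))  # user_score_202101
```

The key-based variant keeps all of a user's rows in one place, which is why it suits queries that filter by that key, while the time-based variant makes it easy to archive or drop whole months of time-series or billing data.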

5. Implementation Approaches

Sharding logic should reside in a reusable component:

Common library: embed sharding code in a shared package that each service imports. Simple, no extra infrastructure, but requires coordinated updates across services.

Middleware service: a dedicated proxy (e.g., Mycat) handles routing and sharding centrally. Easier to evolve, lower coupling, but adds operational overhead.
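As a rough sketch of the common-library approach, the class below could live in a shared package that every service imports; the ShardRouter name, shard counts, and in-memory SQLite connections are assumptions for the example, and a middleware such as Mycat would perform the same routing centrally instead.

```python
# Illustrative "common library" sharding component shared across services.
# Class name, shard layout, and in-memory databases are example assumptions.
import sqlite3
from typing import Dict, Tuple

class ShardRouter:
    """Maps a sharding key to a physical database and table."""

    def __init__(self, num_dbs: int = 2, tables_per_db: int = 5):
        self.num_dbs = num_dbs
        self.tables_per_db = tables_per_db
        # In production these would be connection pools to real database servers.
        self._conns: Dict[int, sqlite3.Connection] = {
            i: sqlite3.connect(":memory:") for i in range(num_dbs)
        }

    def route(self, user_id: int) -> Tuple[sqlite3.Connection, str]:
        """Return (connection, table_name) for the given key."""
        table_index = user_id % (self.num_dbs * self.tables_per_db)
        db_index = table_index // self.tables_per_db
        return self._conns[db_index], f"user_score_{table_index}"

router = ShardRouter()
conn, table = router.route(user_id=12345)
print(table)  # user_score_5, located on database 1
```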

6. Migrating Existing Tables

When a legacy table must be sharded without downtime, follow a four‑phase migration:

Dual‑write, read old: writes go to both the old and new tables while historical data is back‑filled into the new tables; reads still hit the old table.

Dual‑write, dual‑read: gradually shift read traffic to the new tables, optionally verifying data consistency between old and new.

Dual‑write, read new: after confidence builds, read exclusively from the new tables while still writing to both.

Write new, read new: retire the old tables, keeping them only as a cold backup.

Each phase may span weeks and can introduce temporary performance overhead; careful monitoring is essential.
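To make the four phases concrete, here is a minimal sketch in which a single phase switch decides where reads and writes go; the Phase enum, ScoreRepository class, and dictionary-backed stores are illustrative assumptions rather than the article's implementation.

```python
# Illustrative dual-write migration wrapper; all names are example assumptions.
from enum import Enum

class Phase(Enum):
    DUAL_WRITE_READ_OLD = 1
    DUAL_WRITE_DUAL_READ = 2
    DUAL_WRITE_READ_NEW = 3
    WRITE_NEW_READ_NEW = 4

class ScoreRepository:
    def __init__(self, old_store: dict, new_store: dict, phase: Phase):
        self.old, self.new, self.phase = old_store, new_store, phase

    def write(self, user_id: int, score: int) -> None:
        if self.phase is not Phase.WRITE_NEW_READ_NEW:
            self.old[user_id] = score          # keep the legacy table in sync
        self.new[user_id] = score              # new sharded tables are always written

    def read(self, user_id: int) -> int:
        if self.phase is Phase.DUAL_WRITE_READ_OLD:
            return self.old[user_id]
        if self.phase is Phase.DUAL_WRITE_DUAL_READ:
            old_val, new_val = self.old[user_id], self.new[user_id]
            assert old_val == new_val, "old/new consistency check failed"
            return old_val
        return self.new[user_id]               # phases 3 and 4 read the new tables

repo = ScoreRepository({}, {}, Phase.DUAL_WRITE_DUAL_READ)
repo.write(1, 88)
print(repo.read(1))  # 88, verified against both stores
```

In practice the phase value would typically come from a configuration center so it can be flipped without redeploying, and the consistency check in the dual-read phase is what builds the confidence needed to move on to reading only the new tables.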

7. Final Thoughts

Database sharding sits at the intersection of performance tuning and system architecture. Mastering the various partitioning strategies and migration techniques equips backend engineers to build scalable, high‑performance services.

Tags: backend development, database sharding, table partitioning
Written by

NiuNiu MaTe

Joined Tencent (nicknamed "Goose Factory") through campus recruitment at a second‑tier university. Career path: Tencent → foreign firm → ByteDance → Tencent. Started as an interviewer at the foreign firm and hopes to help others.
