Databases 4 min read

Mastering Data Sharding: Vertical & Horizontal Splitting Strategies for Scalable Systems

This article explains essential data sharding techniques for large-scale architectures, emphasizing the priority of vertical splitting by business modules, followed by horizontal partitioning of hot tables, and outlines stable sharding rules, hotspot avoidance, and minimizing cross‑shard transactions to ensure optimal performance.

Mike Chen's Internet Architecture
Mike Chen's Internet Architecture
Mike Chen's Internet Architecture
Mastering Data Sharding: Vertical & Horizontal Splitting Strategies for Scalable Systems

1. Prioritize Vertical Splitting

Typically start with vertical splitting by business modules to define database boundaries, then apply horizontal splitting on hot tables. Vertical splitting reduces coupling and enables independent optimization, while direct horizontal splitting can cause complex cross‑shard queries.

Diagram
Diagram

2. Consider Horizontal Splitting

When a table (e.g., orders) grows large, apply horizontal sharding. Example: a social platform hashes user_id into 16 shards, each with its own database instance.

Note: a globally unique ID generation strategy (e.g., Snowflake) and routing layer are required.

Diagram
Diagram

3. Business Association Principle

During horizontal splitting, keep strongly related data in the same shard. For an order system, order_id, order_item, and order_logistics are tightly coupled; placing them together avoids cross‑database JOINs and improves query efficiency.

4. Splitting Rules Must Be Stable

Once sharding keys and rules are set, avoid changing them, as modifications require costly data migration and risk errors.

Example: if user_id is hashed modulo 16 databases, changing to 32 would require redistributing all data.

Recommendation: reserve a larger logical shard count (e.g., 64) from the start.

Diagram
Diagram

5. Avoid Data Hotspots

Distribute data evenly across shards to prevent a single database or table from becoming a performance bottleneck.

Diagram
Diagram

6. Minimize Cross‑Shard Transactions

Design to avoid operations that need to access multiple shards, because distributed transactions (2PC, XA) incur high overhead and cross‑shard JOINs cause full scans and severe performance degradation.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

data shardingDatabase designhorizontal partitioningvertical splitting
Mike Chen's Internet Architecture
Written by

Mike Chen's Internet Architecture

Over ten years of BAT architecture experience, shared generously!

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.