Databases 15 min read

When and How to Shard Your Database: Vertical & Horizontal Splitting Strategies

This article explains why and when to split databases, compares vertical and horizontal sharding methods, outlines practical steps for determining split mode, fields, rules and quantity, and discusses scaling, read/write separation, and common challenges with solutions.

dbaplus Community

Feb 24, 2020

When and How to Shard Your Database: Vertical & Horizontal Splitting Strategies

Why Split? When Is Splitting Needed?

Single‑node databases eventually hit CPU, memory, storage, or I/O limits, causing sharp performance degradation. Vertical scaling (scale‑up) becomes costly and reaches a ceiling, so horizontal scaling (scale‑out) requires distributing a table’s rows across multiple physical databases (horizontal sharding). Micro‑service architectures also naturally lead to vertical splitting because each service owns its own database.

Splitting Methods

1. Vertical Splitting

Divides tables or columns by business domain, hot/cold data, or large BLOB/TEXT fields. Typically each split resides in an independent database instance and a routing layer selects the correct datasource.

Advantages : clear business boundaries, easy integration/extension, low architectural complexity.

Disadvantages : limited join capability, possible single‑database bottlenecks for large tables, introduces distributed transactions.

2. Horizontal Splitting

Distributes rows of a single logical table across multiple databases and/or tables. Common patterns:

Database‑only sharding (e.g., ds1.order, ds2.order)

Table‑only sharding (e.g., ds.order_0, ds.order_1)

Database‑and‑table sharding (e.g., ds1.order_0, ds2.order_1)

Advantages : supports complex SQL (join, sub‑query) when data resides in the same shard, eliminates single‑node bottlenecks, and requires minimal application changes if a distributed data‑access layer exists.

Disadvantages : requires careful rule design, may introduce distributed transactions, incurs data‑migration effort during scaling, and cross‑shard joins must be handled in the application.

Specific Steps for Splitting

Determine Split Method – Choose vertical, horizontal, or a combination based on business characteristics.

Identify Split Fields – For vertical splitting select tables/columns; for horizontal splitting ensure each table has a shard key (usually primary key or unique index). If the key is absent, a global routing table is needed.

Define Split Rules – Common rules:

Range: e.g., IDs 0‑9999 → db1, 10000‑19999 → db2.

Hash: mod(N) based distribution; may require re‑hashing on expansion.

Custom: application‑defined routing via config, routing table, or custom algorithm.

Calculate Split Quantity – Estimate target data volume T. Choose a per‑table size P (e.g., 5 million rows for MySQL) → number of tables = T / P. Determine per‑database capacity L based on CPU, I/O, TPS thresholds → number of databases = T / L. Adjust for hardware budget and archiving strategy.

Scaling After Splitting

Vertical Scaling

Increase resources (CPU, memory, PCIe) of a heavily loaded database instance.

Horizontal Scaling

Add more database servers. With consistent hashing only a subset of data moves; with “doubling” expansion pre‑partition tables (e.g., 128 tables across 4 servers) and then migrate half of each to 4 new servers, updating the routing logic.

Challenges After Splitting

Distributed transaction complexity

Cross‑shard joins

Multi‑shard sorting, pagination, and aggregation

SQL routing and rewriting

Multi‑datasource management

Data‑summary queries across dimensions

Typical mitigations:

Avoid distributed transactions; use idempotent operations, reliable messaging, or TCC patterns.

Handle joins at the application layer.

Provide a robust distributed data‑access layer.

Use summary databases, secondary‑index databases, or small‑table broadcasting for aggregation.

Read‑Write Separation

Write traffic is directed to a primary instance; read traffic is distributed across replicas based on weight or location.

Replica lag is usually milliseconds but can reach seconds under heavy write load.

Never read from a replica within the same transaction to avoid dirty reads.

Provide an explicit “strong consistency” flag for queries that cannot tolerate lag.

In practice, avoid combining sharding with read‑write separation unless necessary, because sharding already reduces read pressure.

Principles and Experience

Prefer simple solutions; split only when required.

Choose split rules that keep >90% of transactions within a single shard.

Design a powerful yet well‑scoped data‑access layer; avoid overusing XA transactions, DDL, or multi‑shard XA commits.

Enforce SQL standards: no triggers, foreign keys, or cross‑shard joins; require indexes and shard keys; use auto‑increment primary keys.

Prefer flexible (eventual) consistency mechanisms (MQ, idempotent ops) over heavyweight distributed transactions.

Illustrations

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

scalability distributed architecture database sharding Read‑Write Separation Horizontal Partitioning Vertical Partitioning

Written by

dbaplus Community

Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.