
Database Sharding: Strategies, Bottlenecks, and Best Practices

This article explains the performance bottlenecks of relational databases, introduces sharding concepts such as horizontal, vertical, and table partitioning, discusses associated challenges like distributed transactions and global primary keys, and provides practical solutions and tool recommendations for implementing effective database partitioning.

Top Architect

Relational databases often become system bottlenecks because a single instance has limited storage capacity, connection count, and processing power. When a single table reaches around 10 million rows or 100 GB, even adding read replicas or optimizing indexes may not prevent severe performance degradation, prompting the need for data sharding.

Core of Distributed Databases

Sharding (data partitioning) is the fundamental technique of distributed databases, together with the ability to locate and aggregate data after it has been partitioned. Sharding distributes data across multiple databases, reducing the amount of data held by any single node and thereby improving overall performance.

Sharding Types

Sharding can be performed in two main ways: vertical (column‑level) sharding and horizontal (row‑level) sharding.

Database Bottlenecks

Both I/O and CPU bottlenecks increase active connection counts, eventually reaching the maximum connections a database can handle, which leads to service degradation, reduced throughput, and possible crashes.

I/O Bottleneck

Disk read I/O: hot data exceeds cache capacity, causing massive I/O and slowing queries → use database‑level sharding or vertical table splitting.

Network I/O: excessive request volume exceeds bandwidth → use database sharding.

CPU Bottleneck

Complex SQL (JOIN, GROUP BY, ORDER BY, non‑indexed conditions) increases CPU load → optimize SQL, add appropriate indexes, move calculations to the service layer.

Large tables cause full scans, increasing CPU usage → consider horizontal table sharding.

Horizontal Sharding (Database Level)

Concept: Based on a chosen field and a strategy (hash, range, etc.), data from one logical database is split into multiple physical databases, reducing the data volume per instance.

All databases share the same schema.

Data in each database is distinct with no overlap.

The union of all databases represents the full dataset.

Typical scenario: overall concurrent load is high, but table‑level sharding alone cannot solve the problem and there is no clear business module to justify vertical sharding.
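A minimal sketch of the hash strategy mentioned above, in application code. The sharding key (`user_id`), the modulus, and the database names (`orders_db_N`) are illustrative assumptions, not from the article:

```python
NUM_DBS = 4  # assumed number of physical databases

def route_db(user_id: int) -> str:
    """Map a sharding key to a physical database by modulo hashing."""
    return f"orders_db_{user_id % NUM_DBS}"
```

Every physical database keeps the same schema; only the rows differ. Note that a plain modulus makes later resharding (changing `NUM_DBS`) a data-migration event, which is why consistent hashing or range strategies are sometimes preferred.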

Horizontal Table Sharding

Concept: Based on a field, data from a single logical table is split into multiple physical tables using a strategy such as hash or range.

All tables share the same structure.

Each table holds a distinct subset of rows; the union equals the full dataset.

Scenario: a single table’s size degrades SQL efficiency and increases CPU load, even though overall concurrency is not high.
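The range strategy for table-level splitting can be sketched the same way; the 10-million-row capacity per physical table and the `orders_N` naming are illustrative assumptions:

```python
ROWS_PER_TABLE = 10_000_000  # assumed capacity of each physical table

def route_table(order_id: int) -> str:
    """Map a numeric key to the physical table holding its ID range."""
    return f"orders_{order_id // ROWS_PER_TABLE}"
```

Range splitting keeps adjacent IDs together (good for range scans) but can concentrate write load on the newest table, whereas hash splitting spreads writes evenly.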

Vertical Sharding (Database Level)

Concept: Based on business domains, different tables are placed into different databases.

Each database may have a different schema.

Data across databases is distinct with no overlap.

The union of all databases forms the complete dataset.

Scenario: overall concurrent load is high and distinct business modules can be isolated into separate databases.

Vertical Table Sharding

Concept: Based on column activity, a table is split into a “main” table (hot columns) and one or more “extension” tables (cold columns).

Table structures differ.

Each table stores different columns; they share a common key (usually the primary key) for joining.

The union of all tables represents the full logical table.

Scenario: large rows contain many rarely accessed columns, causing cache pressure and random‑read I/O; moving cold columns to extension tables reduces cache usage and I/O.

Sharding Tools

sharding‑jdbc (Dangdang)

TSharding (Mogujie)

Atlas (Qihoo 360)

Cobar (Alibaba)

MyCAT (based on Cobar)

Oceanus (58.com)

Vitess (Google)

Evaluate the pros and cons of each tool before adopting one.

Problems Introduced by Sharding

Transactional Consistency

Distributed Transactions

When updates span multiple databases, cross‑shard transactions arise. Typical solutions involve the XA protocol or two‑phase commit, which increase transaction latency and the risk of deadlocks as the number of nodes grows.

Eventual Consistency

For systems tolerant of slight delays, eventual consistency can be achieved via compensation mechanisms such as data reconciliation, log‑based comparison, or periodic synchronization with a source of truth.

Cross‑Node Join Issues

Before sharding, many list/detail relationships can be resolved with a single JOIN. After sharding, related rows may reside on different nodes, making cross‑node JOINs expensive. Common mitigation strategies:

Global tables: replicate low‑change reference data (e.g., dictionaries) to every shard.

Field redundancy: denormalize frequently needed fields to avoid joins.

Data assembly: perform two‑step queries in the service layer and merge results.

ER‑sharding: keep strongly related tables in the same shard to allow intra‑shard joins.
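The data-assembly strategy can be sketched as two queries plus an in-memory merge. `fetch_orders` and `fetch_users_by_ids` stand in for calls to two different shards; the field names are illustrative assumptions:

```python
def assemble_order_views(fetch_orders, fetch_users_by_ids):
    """Replace a cross-node JOIN with two queries merged in the service layer."""
    orders = fetch_orders()                       # first query: orders shard
    user_ids = {o["user_id"] for o in orders}     # collect the join keys
    users = fetch_users_by_ids(user_ids)          # second, batched query: users shard
    by_id = {u["id"]: u for u in users}
    # merge in application memory, replacing the SQL JOIN
    return [{**o, "user_name": by_id[o["user_id"]]["name"]} for o in orders]
```

Batching the second query by collected keys (rather than one lookup per row) keeps the round-trip count constant.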

Pagination, Sorting, and Aggregate Functions

Cross‑shard queries must first sort and paginate within each shard, then merge and re‑sort the combined result set, which becomes CPU‑ and memory‑intensive at large page numbers. Aggregate functions (MAX, MIN, SUM, COUNT) must be executed per shard and the partial results then combined.
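A sketch of the merge step, assuming each shard has already returned its own first `page * size` rows in sorted order. The cost growth is visible in the code: page 100 of size 20 forces every shard to produce 2,000 rows that are then merged:

```python
import heapq
import itertools

def cross_shard_page(shard_results, page, size):
    """shard_results: per-shard row lists, each sorted ascending and each
    containing that shard's first page*size rows (or fewer if exhausted)."""
    merged = heapq.merge(*shard_results)           # streaming k-way merge
    top = itertools.islice(merged, page * size)    # global first page*size rows
    return list(top)[(page - 1) * size:]           # keep only the requested page
```

`heapq.merge` streams instead of materializing all shard results at once, but the deep-page problem remains; keyset (cursor-based) pagination avoids it where the product allows.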

Global Primary‑Key Duplication

Auto‑increment IDs are unsuitable for a sharded environment. Common strategies:

UUID: 128‑bit identifier generated locally; simple but large and can degrade index performance.

Sequence table: a dedicated table that generates incremental IDs via MySQL's REPLACE INTO + LAST_INSERT_ID() trick:

```sql
CREATE TABLE `sequence` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `stub` char(1) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  UNIQUE KEY `stub` (`stub`)
) ENGINE=MyISAM;

REPLACE INTO sequence (stub) VALUES ('a');
SELECT LAST_INSERT_ID();
```

This approach is simple but creates a single point of failure.

Distributed sequence servers: multiple servers each host a sequence table; IDs are generated with step size equal to the number of shards, ensuring uniqueness while distributing load.
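The step-size scheme can be sketched in a few lines; in MySQL it is usually realized with the `auto_increment_offset` and `auto_increment_increment` server variables rather than application code:

```python
class StepSequence:
    """Server i of N hands out i, i+N, i+2N, ... - disjoint ID streams."""

    def __init__(self, server_index: int, num_servers: int):
        assert 0 <= server_index < num_servers
        self._next = server_index   # this server's offset
        self._step = num_servers    # stride equal to the server count

    def next_id(self) -> int:
        value = self._next
        self._next += self._step
        return value
```

The streams never collide, but adding a server later changes the stride, so the server count is effectively fixed at deployment time.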

Snowflake algorithm: 64‑bit IDs composed of a timestamp, datacenter ID, worker ID, and a sequence counter, supporting up to 1024 nodes and 4096 IDs per node per millisecond:

1 bit unused (sign bit, kept at 0).

41 bits for the millisecond timestamp (≈69 years of range).

5 bits datacenter ID + 5 bits worker ID (1024 nodes total).

12 bits for the per‑millisecond sequence (4096 IDs).
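A minimal generator following that bit layout. The custom epoch below is an arbitrary assumption, and the sketch assumes the clock never moves backwards (production implementations must handle that case):

```python
import threading
import time

EPOCH_MS = 1_577_836_800_000  # 2020-01-01 UTC; an assumed custom epoch

class Snowflake:
    """41-bit timestamp | 5-bit datacenter | 5-bit worker | 12-bit sequence."""

    def __init__(self, datacenter_id: int, worker_id: int):
        assert 0 <= datacenter_id < 32 and 0 <= worker_id < 32
        self._node_bits = (datacenter_id << 17) | (worker_id << 12)
        self._sequence = 0
        self._last_ts = -1
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            ts = int(time.time() * 1000)
            if ts == self._last_ts:
                self._sequence = (self._sequence + 1) & 0xFFF  # 12-bit wrap
                if self._sequence == 0:
                    while ts <= self._last_ts:          # sequence exhausted:
                        ts = int(time.time() * 1000)    # spin to the next millisecond
            else:
                self._sequence = 0
            self._last_ts = ts
            return ((ts - EPOCH_MS) << 22) | self._node_bits | self._sequence
```

IDs from one node are strictly increasing, and IDs across nodes are globally unique as long as each node has a distinct (datacenter, worker) pair.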

Data Migration and Scaling

When traffic and storage approach limits, sharding is considered. Historical data must be extracted, re‑partitioned according to the new sharding rule, and written to the appropriate shards. Capacity planning should aim for no single table exceeding roughly 10 million rows.

When to Consider Sharding

Prefer Not to Shard

Only shard when the data volume truly hits a bottleneck. Prior to sharding, exhaust other optimizations: hardware upgrades, network improvements, read/write splitting, and index tuning.

Operational Impact of Large Tables

Backup of huge tables consumes massive I/O and bandwidth.

DDL on large tables can lock the table for extended periods, blocking access.

Frequent updates on large tables increase lock contention.

Vertical Field Splitting

When certain columns are rarely accessed or updated, move them to separate tables to reduce row size and improve cache efficiency.

Rapid Data Growth

Fast‑growing business workloads eventually push single‑table performance to its limits, making horizontal sharding the appropriate solution.


Tags: Performance Optimization, database sharding, distributed transactions, horizontal partitioning, vertical partitioning, Global ID Generation
Written by Top Architect

Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.
