
When and How to Implement Database Sharding and Partitioning for High‑Scale Systems

This article explains why database sharding and partitioning become necessary, outlines common performance bottlenecks, presents software and hardware optimization techniques, details horizontal and vertical splitting strategies, and discusses the complexity sharding adds, such as cross-database joins, distributed transactions, and ID generation.

macrozheng

Why Sharding and Partitioning?

When a database hits performance bottlenecks, such as blocked requests, slow SQL queries, or storage pressure, simply upgrading hardware quickly becomes expensive, so software-level solutions such as sharding are generally preferred.

Database Optimization Strategies

Optimizations fall into two layers:

Software: SQL tuning, table structure optimization, read/write separation, database clustering, and sharding (database and table partitioning).

Hardware: hardware upgrades.

SQL Tuning

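Enable the slow query log in MySQL, for example in the [mysqld] section of my.cnf:

```ini
[mysqld]
# Log every statement that runs longer than long_query_time seconds
slow_query_log = ON
long_query_time = 1
slow_query_log_file = /path/to/log
```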

Use EXPLAIN to check whether a query hits an index; in the type column of its output, aim for range or better.
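To make the "range or better" rule concrete, the sketch below ranks the values MySQL can report in the type column of EXPLAIN, ordered from best to worst as listed in the MySQL reference manual. The helper name meets_target is hypothetical; in practice you would pass it the type value read from a real EXPLAIN result.

```python
# Access types reported in the `type` column of MySQL's EXPLAIN output,
# ordered from best to worst (per the MySQL reference manual).
ACCESS_TYPES = [
    "system", "const", "eq_ref", "ref", "fulltext", "ref_or_null",
    "index_merge", "unique_subquery", "index_subquery", "range",
    "index", "ALL",
]


def meets_target(access_type: str, target: str = "range") -> bool:
    """Return True if access_type is at least as good as target.

    Hypothetical helper: it only compares positions in the list above.
    """
    return ACCESS_TYPES.index(access_type) <= ACCESS_TYPES.index(target)


# A primary-key lookup (const) passes; a full table scan (ALL) does not.
print(meets_target("const"))  # True
print(meets_target("ALL"))    # False
```

Note that index (a full scan of an index) ranks below range, so a query that reads every index entry still fails the target.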

Table Structure Optimization

Redundant fields (e.g., storing the user's nickname in the order table) can remove join overhead, but they should be reserved for data that rarely changes, because every update must also be applied to each redundant copy.
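A minimal sketch of this trade-off, using Python's built-in sqlite3 module as a stand-in for MySQL; the table and column names (users, orders, user_nickname) and the rename_user helper are hypothetical. Reading an order no longer needs a join, but renaming a user has to update both the source row and every copied nickname, here inside one transaction.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, nickname TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER,
                         user_nickname TEXT);  -- redundant copy
""")
conn.execute("INSERT INTO users VALUES (1, 'alice')")
conn.execute("INSERT INTO orders VALUES (100, 1, 'alice')")

# Reads need no join: the nickname is already on the order row.
print(conn.execute("SELECT user_nickname FROM orders WHERE id = 100").fetchone())


def rename_user(conn, user_id, nickname):
    # Every write must update the source row AND every redundant copy.
    with conn:  # commits both updates together, or rolls back on error
        conn.execute("UPDATE users SET nickname = ? WHERE id = ?",
                     (nickname, user_id))
        conn.execute("UPDATE orders SET user_nickname = ? WHERE user_id = ?",
                     (nickname, user_id))


rename_user(conn, 1, "alice_w")
```

The cost grows with the number of rows holding the copy, which is why this pattern suits fields like nicknames that change rarely.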

Architecture Optimization

Scale reads by adding read replicas (read/write separation) and introducing caching layers (e.g., Redis), and move to sharding only when a single primary still cannot keep up with write volume or data size.
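Read/write separation usually comes down to routing each statement by type. A minimal sketch, assuming plain strings stand in for connection pools and that only statements starting with SELECT count as reads; the ReadWriteRouter class is hypothetical and ignores replication lag.

```python
import itertools


class ReadWriteRouter:
    """Route writes to the primary and spread reads across replicas.

    Hypothetical sketch: data sources are plain names, not real pools.
    """

    def __init__(self, primary, replicas):
        self.primary = primary
        # Round-robin over the read replicas.
        self._replicas = itertools.cycle(replicas)

    def route(self, sql: str) -> str:
        # Naive classification: only SELECT statements go to replicas.
        # A locking read such as SELECT ... FOR UPDATE would also need
        # the primary; this sketch does not handle that case.
        if sql.lstrip().upper().startswith("SELECT"):
            return next(self._replicas)
        return self.primary


router = ReadWriteRouter("primary", ["replica-1", "replica-2"])
print(router.route("SELECT * FROM orders WHERE id = 1"))          # replica-1
print(router.route("SELECT * FROM orders WHERE id = 2"))          # replica-2
print(router.route("UPDATE orders SET status = 2 WHERE id = 1"))  # primary
```

Because replicas apply changes asynchronously, a read issued right after a write may see stale data; that is one reason caches and replicas relieve read pressure but not write pressure.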

Hardware Optimization

Hardware upgrades deliver quick gains early on, but the returns diminish as the system grows while the costs keep rising.

Detailed Sharding Process

Single Application, Single Database

Early-stage monolithic apps use one database for all modules.

Multiple Applications, Single Database

Modules are split into separate services but still share one database, which limits the impact of the migration.

Multiple Applications, Multiple Databases

Each service gets its own database, which removes contention between services and lays the groundwork for sharding.

Table Partitioning

When a single table grows rapidly (for example, past roughly 5 million rows), it should be split horizontally or vertically.

Horizontal vs. Vertical Splitting

Vertical splitting separates columns into different tables; horizontal splitting distributes rows across multiple tables or databases. A common horizontal scheme splits rows by time:

Daily tables store one day's data.

Monthly tables store one month's data.

Historical tables archive data older than a set threshold.
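Time-based splitting comes down to deriving a table name from each row's timestamp. A minimal sketch, assuming a hypothetical order base table, order_YYYYMMDD / order_YYYYMM naming, and an arbitrary 90-day threshold for the historical table; none of these names or values come from the article.

```python
from datetime import date, datetime


def daily_table(base: str, ts: datetime) -> str:
    # One table per day, e.g. order_20240115.
    return f"{base}_{ts:%Y%m%d}"


def monthly_table(base: str, ts: datetime) -> str:
    # One table per month, e.g. order_202401.
    return f"{base}_{ts:%Y%m}"


def is_historical(ts: datetime, today: date, keep_days: int = 90) -> bool:
    # Rows older than the (hypothetical) threshold belong in the history table.
    return (today - ts.date()).days > keep_days


ts = datetime(2024, 1, 15, 10, 30)
print(daily_table("order", ts))                   # order_20240115
print(monthly_table("order", ts))                 # order_202401
print(is_historical(ts, today=date(2024, 6, 1)))  # True
```

Queries that span a time range then have to read several tables, so the granularity should match how the data is usually queried.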

Single‑Database vs. Multi‑Database Splitting

Horizontal splits can stay in the same database (only the tables are split) or be spread across multiple databases to overcome the storage and I/O limits of a single server.
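When rows are spread across several databases, each row is typically routed by a sharding key. A minimal sketch of modulo routing, assuming user_id is the sharding key, 2 databases with 4 tables each, and hypothetical ds_N / t_order_N names; this is one common scheme, not the only one.

```python
DB_COUNT = 2        # hypothetical: number of physical databases
TABLES_PER_DB = 4   # hypothetical: tables per database


def route(user_id: int) -> tuple[str, str]:
    """Map a sharding key to (database, table) with modulo hashing."""
    slot = user_id % (DB_COUNT * TABLES_PER_DB)  # global table slot, 0..7
    db_index = slot // TABLES_PER_DB             # slots 0-3 -> ds_0, 4-7 -> ds_1
    return f"ds_{db_index}", f"t_order_{slot}"


for uid in (1, 6, 13):
    print(uid, route(uid))
# 1 ('ds_0', 't_order_1')
# 6 ('ds_1', 't_order_6')
# 13 ('ds_1', 't_order_5')
```

Modulo routing is simple and spreads keys evenly, but changing the shard count remaps many keys, so the counts are usually chosen with future growth in mind.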

Complexities Introduced by Sharding

Cross‑Database Joins

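Solutions include field redundancy, data aggregation (ETL into a single store), global tables (small, rarely changing tables replicated to every database), and application-level assembly (querying each database separately and joining the results in code).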
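Application-level assembly replaces one SQL join with two queries and an in-memory merge. A minimal sketch, assuming hypothetical in-memory lists and dicts stand in for the order and user databases; with real databases, the second step would be a single batched IN (...) query rather than one query per order.

```python
# Hypothetical data: orders live in one database, users in another.
ORDERS_DB = [
    {"id": 100, "user_id": 1, "amount": 30},
    {"id": 101, "user_id": 2, "amount": 45},
    {"id": 102, "user_id": 1, "amount": 12},
]
USERS_DB = {1: {"id": 1, "nickname": "alice"}, 2: {"id": 2, "nickname": "bob"}}


def fetch_users(user_ids):
    # Stands in for: SELECT id, nickname FROM users WHERE id IN (...)
    return {uid: USERS_DB[uid] for uid in user_ids if uid in USERS_DB}


def orders_with_nicknames():
    orders = list(ORDERS_DB)                              # step 1: query orders
    users = fetch_users({o["user_id"] for o in orders})   # step 2: batch-load users
    # Step 3: join in memory instead of across databases.
    return [
        {**o, "nickname": users.get(o["user_id"], {}).get("nickname")}
        for o in orders
    ]


for row in orders_with_nicknames():
    print(row)
```

Collecting the distinct user IDs first keeps the number of round trips constant regardless of how many orders are returned.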

Distributed Transactions

Options include reliable messaging (eventual consistency), two-phase commit (2PC), and flexible transaction patterns such as TCC or Saga.
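Two-phase commit separates voting from committing: a coordinator asks every participant to prepare, and commits only if all of them vote yes. A minimal in-memory sketch with a hypothetical Participant class; it omits the durable logging, timeouts, and crash recovery that real 2PC (for example MySQL's XA transactions) depends on.

```python
class Participant:
    """Hypothetical resource manager, e.g. one shard's database."""

    def __init__(self, name: str, can_commit: bool = True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self) -> bool:
        # Phase 1: do the work, hold locks, and vote.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled_back"


def two_phase_commit(participants) -> bool:
    # Phase 1: collect votes; all() stops at the first "no".
    if all(p.prepare() for p in participants):
        # Phase 2a: every participant voted yes, so commit everywhere.
        for p in participants:
            p.commit()
        return True
    # Phase 2b: at least one "no", so roll back everywhere.
    for p in participants:
        p.rollback()
    return False


shards = [Participant("order_db"), Participant("stock_db", can_commit=False)]
print(two_phase_commit(shards))   # False
print([p.state for p in shards])  # ['rolled_back', 'rolled_back']
```

Between the two phases every prepared participant keeps its locks until the coordinator decides, which is why 2PC is strongly consistent but slow, and why flexible patterns are often preferred at high scale.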

Sorting, Pagination, and Function Computation

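Run the sort, limit, or aggregate function on each shard, then merge the partial results. For pagination, each shard must return its first offset + limit rows in the same order before the merged result is sliced; MAX, MIN, SUM, and COUNT merge directly, while AVG must be rebuilt from per-shard SUM and COUNT.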
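A minimal sketch of both merges, assuming hypothetical in-memory shard results that are already sorted by created_at (as if each shard ran ORDER BY created_at LIMIT offset + size). heapq.merge combines the sorted streams lazily, and the average is computed from summed totals rather than by averaging the per-shard averages.

```python
import heapq
from itertools import islice

# Hypothetical shard results, each already sorted by created_at.
SHARDS = [
    [{"id": 1, "created_at": 1}, {"id": 4, "created_at": 4}, {"id": 6, "created_at": 6}],
    [{"id": 2, "created_at": 2}, {"id": 3, "created_at": 3}, {"id": 5, "created_at": 5}],
]


def global_page(shards, offset: int, size: int):
    # Each shard contributes at most offset + size rows.
    per_shard = [rows[: offset + size] for rows in shards]
    merged = heapq.merge(*per_shard, key=lambda r: r["created_at"])
    return list(islice(merged, offset, offset + size))


def global_avg(partials):
    # partials: one (sum, count) pair per shard.
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


print([r["id"] for r in global_page(SHARDS, offset=2, size=2)])  # [3, 4]
print(global_avg([(10, 2), (30, 3)]))                             # 8.0
```

Averaging the per-shard averages (5.0 and 10.0) would give 7.5 instead of 8.0. The offset + size requirement also explains why deep pages get expensive: every shard returns more rows as the offset grows.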

Distributed ID Generation

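Auto-increment keys collide once rows of one table live in several databases, so IDs must be generated globally. Common approaches: UUID, dedicated ID tables, segment allocation, Redis, Snowflake, Baidu uid-generator, Meituan Leaf, Didi TinyID.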
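Snowflake is compact enough to sketch: a 64-bit ID built from a 41-bit millisecond timestamp (relative to a custom epoch), a 10-bit worker ID, and a 12-bit per-millisecond sequence, so IDs are roughly time-ordered without a central service. A minimal single-process sketch; the epoch is an arbitrary assumption, and the class simply raises an error if the clock moves backwards instead of handling it.

```python
import threading
import time


class Snowflake:
    EPOCH_MS = 1_704_067_200_000  # hypothetical custom epoch: 2024-01-01 UTC
    WORKER_BITS = 10
    SEQUENCE_BITS = 12
    MAX_WORKER = (1 << WORKER_BITS) - 1      # 1023
    MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1  # 4095

    def __init__(self, worker_id: int):
        if not 0 <= worker_id <= self.MAX_WORKER:
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def _now_ms(self) -> int:
        return time.time_ns() // 1_000_000

    def next_id(self) -> int:
        with self.lock:
            now = self._now_ms()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & self.MAX_SEQUENCE
                if self.sequence == 0:
                    # Sequence exhausted for this millisecond: wait for the next one.
                    while now <= self.last_ms:
                        now = self._now_ms()
            else:
                self.sequence = 0
            self.last_ms = now
            # Layout: timestamp | worker ID | sequence.
            return (
                ((now - self.EPOCH_MS) << (self.WORKER_BITS + self.SEQUENCE_BITS))
                | (self.worker_id << self.SEQUENCE_BITS)
                | self.sequence
            )


gen = Snowflake(worker_id=1)
first, second = gen.next_id(), gen.next_id()
print(first < second)         # True
print((first >> 12) & 0x3FF)  # 1 (the worker ID)
```

Each worker can issue up to 4096 IDs per millisecond; uniqueness across machines relies entirely on every process getting a distinct worker ID.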

Multiple Data Sources

Middleware such as ShardingSphere (formerly Sharding‑JDBC) or Mycat can abstract multiple databases.

Conclusion

Before resorting to sharding, exhaust conventional optimizations; sharding adds significant complexity and should be applied only when truly necessary.

Written by macrozheng

Dedicated to Java tech sharing and dissecting top open-source projects. Topics include Spring Boot, Spring Cloud, Docker, Kubernetes and more. Author’s GitHub project “mall” has 50K+ stars.
