Mastering Database Sharding: Design Strategies and Practical Cases
This article introduces the fundamentals of database sharding, outlines architectural evolution, explains vertical and horizontal splitting dimensions and strategies, discusses middleware choices, and presents real‑world case studies while addressing common challenges such as ID generation, join queries, pagination, and distributed transactions.
Introduction
Sharding (splitting databases and tables) is a core technology in distributed system architecture, providing solutions for massive data storage and high‑performance queries. Designing an effective sharding strategy is a key test of an architect's expertise.
Outline
1. Fundamentals
Evolution of the data layer and data access layer
Sharding dimensions and strategies
Problems caused by sharding and their solutions
2. Middleware Selection
Mycat
TDDL
Sharding‑JDBC
Custom solutions
3. Sharding in Practice
User system splitting case
Order system splitting case
Payment system splitting case
Other examples
Data Layer Evolution
Single Database : Suitable for early stages or low traffic, using a single database is the simplest access pattern.
Read/Write Separation : When read traffic grows, a read‑write split distributes queries to read replicas, with automatic switching in the data access layer.
Hotspot Cache : To handle uneven data access, combine local (JVM or disk) and distributed caches; a read‑write ratio of about 4:1 often justifies adding cache.
Sharding (Database/Table Splitting) : When write pressure exceeds a single database’s capacity, split databases and, based on table access patterns, split tables accordingly.
Sharding Dimensions and Strategies
2.1 Splitting Dimensions
Sharding can be vertical or horizontal, and can target either databases or tables, resulting in four combinations: vertical database, vertical table, horizontal database, and horizontal table.
Vertical Database Splitting : Addresses high concurrency by separating business domains (e.g., user domain, product domain) into different databases, and further splitting within a domain based on business attributes.
Vertical Table Splitting : Solves single‑table high‑concurrency or hot‑cold data issues by separating a large table into multiple tables (e.g., basic user info vs. extended user info).
Horizontal Database Splitting : Distributes the same table’s data across multiple databases to relieve pressure on a single instance.
Horizontal Table Splitting : Divides a large table into multiple tables based on a sharding key, improving write and storage capacity.
2.2 Splitting Strategies
Choose sharding keys with business meaning (e.g., userId, orderId).
Estimate the number of databases/tables based on projected data volume and traffic.
Alibaba’s guideline: split when a single table exceeds 5 million rows or 2 GB, but adjust according to field types, length, and access patterns.
Plan scaling (expansion and contraction) strategies.
Design migration plans for moving data.
Select appropriate access‑layer code or middleware.
Problems and Solutions
3.1 ID Generation
After sharding, use distributed ID generators such as UUID, Snowflake, or custom solutions; Snowflake is commonly preferred for its uniqueness, monotonicity, high availability, and performance.
3.2 Join Queries
Approaches include querying each shard then merging results, using global tables, field redundancy, E‑R tables (grouping related tables in one database), or wide tables.
3.3 Pagination and Sorting
Can be handled by querying each shard then merging, or by leveraging Elasticsearch or wide tables; ES or wide‑table solutions are recommended.
3.4 Distributed Transactions
Common solutions are 2PC, 3PC, TCC, SAGA, and eventual consistency (via local/transactional messages, notifications, reconciliation, etc.). Prefer avoiding distributed transactions; if needed, favor eventual consistency.
References
Sharding concepts: https://blog.csdn.net/dgfdhgghd/article/details/128426013
When to consider sharding: https://zhuanlan.zhihu.com/p/1894001842972767471
Recommended MySQL single‑table size: https://www.zhihu.com/question/60438093/answer/51559087283
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
ITFLY8 Architecture Home
ITFLY8 Architecture Home - focused on architecture knowledge sharing and exchange, covering project management and product design. Includes large-scale distributed website architecture (high performance, high availability, caching, message queues...), design patterns, architecture patterns, big data, project management (SCRUM, PMP, Prince2), product design, and more.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
