How to Design and Implement Horizontal Database Sharding: A Real‑World Case Study
This article presents a comprehensive analysis of horizontal database sharding, detailing the design decisions, partitioning dimensions, routing, pagination handling, lookup mapping, and deployment steps based on a real‑world implementation at a large e‑commerce platform, offering practical guidance for scaling order databases.
Horizontal Sharding Overview
With the growth of large‑scale internet applications, massive data storage and access become bottlenecks, making distributed processing essential. Horizontal sharding (splitting a large table across multiple databases) is a high‑difficulty but powerful technique.
Sharding Dimensions
The key is to choose a sharding field that minimizes impact on application code and SQL performance. By analyzing 500 SQL statements, the most frequent filter fields (userId, orderId, merchantId) are counted. The analysis shows userId appears as a single‑value filter in 120 statements and as a multi‑value filter in 40, making it the optimal sharding key.
Sharding Strategy
Two common strategies are range‑based sharding and modulo‑based sharding. Range sharding assigns contiguous ID ranges to each shard, while modulo sharding distributes records based on id % n. Modulo sharding is often preferred for its simplicity and even data distribution, though adjusting shard count later can be more complex.
Number of Shards
Shard count depends on the maximum records a single MySQL instance can handle (≈50 million) or Oracle instance (≈100 million). Too few shards fail to relieve pressure; too many increase cross‑shard query cost and hardware investment. An initial range of 4–8 shards is typical.
Transparent Routing
Sharding changes the DB schema, so routing logic should be placed in the data‑access layer (DAL) to keep application code transparent. The DAL automatically directs single‑shard queries, aggregates results for multi‑shard reads, and can handle aggregation operations after gathering per‑shard results.
Pagination Handling
Cross‑shard pagination requires fetching extra rows from each shard and merging them in the application, which becomes increasingly expensive for later pages. Mitigation strategies include limiting visible pages, increasing page size for batch jobs, or routing pagination queries to a big‑data platform.
Lookup Mapping
A lookup table maps non‑sharding keys (e.g., orderId) to the sharding key (userId), enabling single‑shard access for queries that only know the orderId. The lookup can be cached for fast retrieval.
Overall Architecture
The system consists of an order service/proxy, a distributed DAL, MySQL shards, a lookup table with cache, and a synchronization component that replicates data from the original Oracle database to the MySQL shards.
Deployment Steps
Implementation proceeded in two phases: first, run Oracle and MySQL in parallel and sync data incrementally; second, gradually shift non‑real‑time reads to MySQL shards, then switch all real‑time reads and retire Oracle. This staged approach isolated technical risk from business risk and resulted in a smooth migration.
Project Summary
The horizontal sharding of the order database, combined with migration from Oracle to MySQL, enabled massive scalability while reducing costs. Performance testing with realistic traffic showed that six MySQL shards matched the single Oracle instance’s query latency. The final design used the last three digits of userId modulo 6, supporting up to 768 shards, and incorporated orderId generation that embeds the sharding key, allowing the lookup layer to be phased out over time.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
ITFLY8 Architecture Home
ITFLY8 Architecture Home - focused on architecture knowledge sharing and exchange, covering project management and product design. Includes large-scale distributed website architecture (high performance, high availability, caching, message queues...), design patterns, architecture patterns, big data, project management (SCRUM, PMP, Prince2), product design, and more.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
