How to Design and Implement Horizontal Database Sharding: A Real‑World Case Study

This article analyzes horizontal database sharding through a real‑world implementation at a large e‑commerce platform. It covers the design decisions, sharding dimension, routing, pagination handling, lookup mapping, and deployment steps, and offers practical guidance for scaling order databases.

ITFLY8 Architecture Home

Jul 15, 2017

Horizontal Sharding Overview

As large‑scale internet applications grow, storing and accessing massive amounts of data becomes a bottleneck, which makes distributed processing essential. Horizontal sharding (splitting a large table across multiple databases) is a difficult but powerful technique.

Sharding Dimensions

The key is to choose a sharding field that minimizes the impact on application code and SQL performance. The team analyzed 500 SQL statements and counted how often the most frequent filter fields (userId, orderId, merchantId) appear. userId appears as a single‑value filter in 120 statements and as a multi‑value filter in 40, which made it the sharding key of choice.
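A tally like this can be approximated with a few lines of Python. The sketch below is only an illustration, not the tooling used in the project: the regular expressions are a rough heuristic (they do not parse SQL), and the function name tally_filters is hypothetical.

```python
import re
from collections import Counter

# Candidate sharding fields named in the article.
CANDIDATE_FIELDS = ("userId", "orderId", "merchantId")


def tally_filters(statements):
    """Count how often each field is used as a single-value (=) or
    multi-value (IN) filter. Heuristic only: it scans the raw text."""
    counts = Counter()
    for sql in statements:
        for field in CANDIDATE_FIELDS:
            if re.search(rf"\b{field}\s*=", sql, re.IGNORECASE):
                counts[(field, "single")] += 1
            if re.search(rf"\b{field}\s+IN\s*\(", sql, re.IGNORECASE):
                counts[(field, "multi")] += 1
    return counts
```

Fed with the statements collected from the application, the resulting counter shows which field would let the largest share of queries stay on a single shard.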

Sharding Strategy

Two common strategies are range‑based sharding and modulo‑based sharding. Range sharding assigns contiguous ID ranges to each shard, while modulo sharding distributes records based on id % n. Modulo sharding is often preferred for its simplicity and even data distribution, though changing the shard count later is more complex, because a new value of n remaps most existing records to a different shard.
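The two strategies can be compared in a minimal Python sketch. The function names and the range size are assumptions made for illustration; moved_fraction is a hypothetical helper that shows why resizing a modulo scheme is costly.

```python
def range_shard(record_id: int, range_size: int) -> int:
    """Range-based: each contiguous block of range_size IDs maps to one shard."""
    return record_id // range_size


def modulo_shard(record_id: int, shard_count: int) -> int:
    """Modulo-based: records are spread across shards by id % n."""
    return record_id % shard_count


def moved_fraction(ids, old_count: int, new_count: int) -> float:
    """Fraction of IDs whose shard changes when the shard count changes."""
    ids = list(ids)
    moved = sum(1 for i in ids if i % old_count != i % new_count)
    return moved / len(ids)
```

For example, moved_fraction(range(1200), 4, 5) shows that most IDs land on a different shard when going from 4 to 5 shards, which is the resizing cost mentioned above.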

Number of Shards

The shard count follows from the maximum number of records a single instance can handle: roughly 50 million for a MySQL instance, compared with roughly 100 million for an Oracle instance. Too few shards fail to relieve the pressure; too many increase cross‑shard query cost and hardware investment. An initial range of 4–8 shards is typical.
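The lower bound on the shard count is a simple ceiling division. In the sketch below, the 50 million per-instance figure comes from the text, while the 300 million expected-row figure is a made-up input used only to show the arithmetic.

```python
MYSQL_ROWS_PER_INSTANCE = 50_000_000  # approximate figure quoted in the text


def min_shards(expected_rows: int,
               rows_per_instance: int = MYSQL_ROWS_PER_INSTANCE) -> int:
    """Smallest shard count that keeps every shard at or under capacity,
    assuming rows are spread evenly (ceiling division)."""
    return -(-expected_rows // rows_per_instance)


# Hypothetical input: 300 million expected order rows.
print(min_shards(300_000_000))
```

This gives only a floor; headroom for growth and the cross-shard query cost discussed above still have to be weighed separately.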

Transparent Routing

Sharding changes how the data is laid out across databases, so the routing logic belongs in the data‑access layer (DAL), where it stays transparent to application code. The DAL sends a query that includes the sharding key to a single shard. For a query that spans shards, it queries each shard, merges the results, and performs any aggregation on the gathered per‑shard results.
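The routing behaviour can be sketched with in-memory lists standing in for the shards. This is an illustration under stated assumptions, not the project's DAL: the class ShardedOrderDAL, its method names, the field names in the order dictionaries, and the shard count of 4 are all hypothetical.

```python
from typing import List


class ShardedOrderDAL:
    """Toy data-access layer: callers never see which shard is used."""

    def __init__(self, shard_count: int = 4) -> None:
        # Each list stands in for one database shard.
        self.shards: List[List[dict]] = [[] for _ in range(shard_count)]

    def _shard_for(self, user_id: int) -> int:
        return user_id % len(self.shards)

    def insert(self, order: dict) -> None:
        self.shards[self._shard_for(order["userId"])].append(order)

    def find_by_user(self, user_id: int) -> List[dict]:
        """Sharding key is known: query exactly one shard."""
        shard = self.shards[self._shard_for(user_id)]
        return [o for o in shard if o["userId"] == user_id]

    def find_by_merchant(self, merchant_id: int) -> List[dict]:
        """No sharding key: query every shard and merge the results."""
        results: List[dict] = []
        for shard in self.shards:
            results.extend(o for o in shard if o["merchantId"] == merchant_id)
        return results

    def total_amount(self, merchant_id: int) -> int:
        """Aggregate per shard first, then combine the partial sums."""
        partials = [
            sum(o["amount"] for o in shard if o["merchantId"] == merchant_id)
            for shard in self.shards
        ]
        return sum(partials)
```

Application code calls find_by_user or find_by_merchant the same way in both cases; only the DAL knows whether one shard or all shards were touched.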

Pagination Handling

Cross‑shard pagination requires fetching extra rows from each shard and merging them in the application: every shard must return enough rows to cover everything up to and including the requested page, not just one page's worth, so the cost grows with the page number. Mitigation strategies include limiting visible pages, increasing page size for batch jobs, or routing pagination queries to a big‑data platform.
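The fetch-and-merge step might look like the following sketch. It assumes each shard can return its rows sorted by a createdAt field (a hypothetical column name); fetch_from_shard is a stand-in for a per-shard query with ORDER BY and LIMIT.

```python
import heapq
from itertools import islice


def fetch_from_shard(rows, limit):
    """Stand-in for: SELECT ... ORDER BY createdAt DESC LIMIT :limit."""
    return sorted(rows, key=lambda r: r["createdAt"], reverse=True)[:limit]


def paginate(shards, page: int, page_size: int):
    """Return one globally ordered page (1-based page number)."""
    offset = (page - 1) * page_size
    needed = offset + page_size  # each shard must return this many rows
    per_shard = [fetch_from_shard(rows, needed) for rows in shards]
    # Merge the already-sorted per-shard results, newest first.
    merged = heapq.merge(*per_shard, key=lambda r: r["createdAt"], reverse=True)
    return list(islice(merged, offset, needed))
```

Because needed grows with the page number, page 100 at 20 rows per page pulls up to 2,000 rows from every shard to display 20, which is why the mitigations above focus on capping depth or moving such queries elsewhere.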

Lookup Mapping

A lookup table maps non‑sharding keys (e.g., orderId) to the sharding key (userId), enabling single‑shard access for queries that only know the orderId. The lookup can be cached for fast retrieval.
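A minimal sketch of the lookup path, assuming a dictionary as a stand-in for the persistent lookup table and another dictionary as the cache. The class OrderLookup, the table_reads counter, and the helper shard_for_order are hypothetical names introduced for illustration.

```python
from typing import Dict, Optional


class OrderLookup:
    """Maps orderId -> userId, checking a cache before the lookup table."""

    def __init__(self, lookup_table: Dict[int, int]) -> None:
        self._table = lookup_table       # stand-in for the lookup table
        self._cache: Dict[int, int] = {}  # stand-in for the cache layer
        self.table_reads = 0              # counts cache misses, for illustration

    def user_id_for(self, order_id: int) -> Optional[int]:
        if order_id in self._cache:
            return self._cache[order_id]
        self.table_reads += 1
        user_id = self._table.get(order_id)
        if user_id is not None:
            self._cache[order_id] = user_id
        return user_id


def shard_for_order(lookup: OrderLookup, order_id: int,
                    shard_count: int) -> Optional[int]:
    """Resolve the shard for a query that only knows the orderId."""
    user_id = lookup.user_id_for(order_id)
    if user_id is None:
        return None
    return user_id % shard_count
```

A repeated lookup for the same orderId is served from the cache, so the lookup table is read only on the first request.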

Overall Architecture

The system consists of an order service/proxy, a distributed DAL, MySQL shards, a lookup table with cache, and a synchronization component that replicates data from the original Oracle database to the MySQL shards.

Deployment Steps

Implementation proceeded in two phases: first, run Oracle and MySQL in parallel and sync data incrementally; second, gradually shift non‑real‑time reads to MySQL shards, then switch all real‑time reads and retire Oracle. This staged approach isolated technical risk from business risk and resulted in a smooth migration.
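One way to picture the staged cutover is a switch that decides where each read goes. The sketch below is an assumption about how such a switch could be modelled, not the project's mechanism: it splits the second phase into two states, and the names Phase and read_target are hypothetical.

```python
from enum import Enum


class Phase(Enum):
    PARALLEL_SYNC = 1          # Oracle serves reads; MySQL shards receive synced data
    NON_REALTIME_ON_MYSQL = 2  # non-real-time reads have moved to the MySQL shards
    ALL_ON_MYSQL = 3           # all reads on MySQL; Oracle can be retired


def read_target(phase: Phase, realtime: bool) -> str:
    """Pick the database that should serve a read in the given phase."""
    if phase is Phase.ALL_ON_MYSQL:
        return "mysql"
    if phase is Phase.NON_REALTIME_ON_MYSQL and not realtime:
        return "mysql"
    return "oracle"
```

Keeping the decision in one place means each step of the migration is a configuration change that can be rolled back without touching application code.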

Project Summary

The horizontal sharding of the order database, combined with migration from Oracle to MySQL, enabled massive scalability while reducing costs. Performance testing with realistic traffic showed that six MySQL shards matched the single Oracle instance’s query latency. The final design used the last three digits of userId modulo 6, supporting up to 768 shards, and incorporated orderId generation that embeds the sharding key, allowing the lookup layer to be phased out over time.
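The idea of an orderId that carries its own sharding key can be sketched as follows. The article does not say how the key is embedded, so the layout here (a sequence number followed by the last three digits of userId) is purely an assumption, as are the function names.

```python
SHARD_COUNT = 6     # shard count from the final design
KEY_MODULUS = 1000  # last three decimal digits of userId


def shard_for_user(user_id: int) -> int:
    """Last three digits of userId, modulo the shard count."""
    return (user_id % KEY_MODULUS) % SHARD_COUNT


def make_order_id(sequence: int, user_id: int) -> int:
    """Assumed layout: <sequence><last three digits of userId>."""
    return sequence * KEY_MODULUS + user_id % KEY_MODULUS


def shard_for_order(order_id: int) -> int:
    """Recover the shard from the orderId alone, with no lookup table."""
    return (order_id % KEY_MODULUS) % SHARD_COUNT
```

With such IDs, shard_for_order(make_order_id(seq, uid)) always equals shard_for_user(uid), so queries that only know the orderId can be routed directly, and the lookup layer is needed only for orders created before the new ID format.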
