Sharding Mastery: Strategies, Key Generation & Seamless Scaling
This article presents a comprehensive guide to database sharding, covering implementation strategies, vertical and horizontal partitioning, global primary key generation techniques, framework versus custom solutions, multi‑data‑source transaction handling, and a novel, migration‑free scaling approach that combines incremental range and hash‑based routing.
Sharding Implementation Strategy and Demo
Part 1: Implementation Strategy
Before sharding a database, developers must fully understand the business logic and schema. Drawing an ER diagram or a domain model diagram helps define shards intuitively. Choose the diagram type based on whether the project follows a data‑driven or domain‑driven approach.
The preparation phase consists of analyzing the system and creating a visual model of its tables and their relationships.
The analysis phase consists of vertical splitting followed by horizontal splitting.
Vertical splitting groups tightly related tables (e.g., tables belonging to the same module) into the same shard, using the diagram as a guide. Each vertical shard can be visualized as a swim‑lane.
Horizontal splitting is applied after vertical splitting when a shard’s data volume or growth rate warrants further division. If the grouped tables grow slowly, a single shard suffices; otherwise, the shard is divided into smaller shards, each typically containing one primary table and its associated secondary tables. After horizontal splitting, SQL can no longer perform joins, GROUP BY, ORDER BY, and similar operations across shards; each shard is queried separately and the results are combined at the application layer.
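As a rough illustration of handling ORDER BY and GROUP BY at the application layer, the sketch below merges results that each shard has already sorted or aggregated locally. The row data, the table and column names in the comments, and the helper names `merge_ordered` and `merge_counts` are hypothetical and not taken from the demo.

```python
import heapq
from collections import Counter
from itertools import islice

# Hypothetical per-shard results: each shard has already run something like
# "SELECT id, price FROM item ORDER BY price LIMIT 3" on its own data.
shard_a = [(11, 5.0), (14, 9.5), (17, 20.0)]
shard_b = [(22, 3.0), (25, 9.0), (28, 30.0)]


def merge_ordered(shard_results, key, limit):
    """Merge per-shard sorted rows into one globally sorted page."""
    merged = heapq.merge(*shard_results, key=key)
    return list(islice(merged, limit))


def merge_counts(shard_counts):
    """Combine per-shard GROUP BY ... COUNT(*) results by summing per key."""
    total = Counter()
    for counts in shard_counts:
        total.update(counts)
    return dict(total)


top3 = merge_ordered([shard_a, shard_b], key=lambda row: row[1], limit=3)
by_category = merge_counts([{"DOGS": 2, "CATS": 1}, {"DOGS": 1, "FISH": 4}])
```

Each shard must return at least `limit` rows in sorted order for the merged page to be correct. Sums and counts combine this simply; an average would need the per-shard sum and count rather than the per-shard average.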
Implementation phase: if sharding is decided at project start, follow the design; if introduced later, build the sharding infrastructure and modify affected SQL statements.
Part 2: Demo
The demo uses the well‑known JPetStore application (original iBATIS demo). Its domain model includes three modules: User, Product, and Order.
Vertical splitting places User‑related tables in one shard, Order‑related tables in another, and Product‑related tables (Product, Category, Item, Inventory, Supplier) in a third shard. For demonstration, the Product module is also horizontally split into two shards: (Product, Category) and (Item, Inventory, Supplier). Both shards share the same hash modulus, so IDs are distributed evenly.
Global Primary Key Generation Strategies
Part 1: Common Strategies
UUID – simple but large and index‑heavy.
Sequence table – a dedicated table storing the next ID per logical table.
CREATE TABLE `SEQUENCE` (
`tablename` varchar(30) NOT NULL,
`nextid` bigint(20) NOT NULL,
PRIMARY KEY (`tablename`)
) ENGINE=InnoDB;
The sequence table approach suffers from a single‑point bottleneck and a single point of failure. Using master‑slave replication mitigates the latter but not the performance issue.
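One way the sequence table might be used is sketched below. SQLite stands in for MySQL so that the example runs without a server, and the `next_id` helper and the `orders` row are assumptions for illustration. The read and the update happen in one transaction so that two callers cannot receive the same value.

```python
import sqlite3

# SQLite stands in for MySQL here; isolation_level=None lets us issue
# BEGIN/COMMIT explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE sequence (tablename TEXT PRIMARY KEY, nextid INTEGER NOT NULL)"
)
conn.execute("INSERT INTO sequence VALUES ('orders', 1)")


def next_id(conn, tablename):
    """Return the next ID for a logical table and advance the counter."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = conn.execute(
            "SELECT nextid FROM sequence WHERE tablename = ?", (tablename,)
        ).fetchone()
        if row is None:
            raise KeyError(tablename)
        value = row[0]
        conn.execute(
            "UPDATE sequence SET nextid = ? WHERE tablename = ?",
            (value + 1, tablename),
        )
        conn.execute("COMMIT")
        return value
    except Exception:
        conn.execute("ROLLBACK")
        raise


first = next_id(conn, "orders")
second = next_id(conn, "orders")
```

Every ID request passes through this one row, which is the bottleneck described above.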
Part 2: An Excellent Strategy (Flickr)
Flickr uses multiple ID‑generation servers, each with its own Sequence table. The step size of each Sequence equals the number of servers, and the start values are offset, so the servers hand out interleaved, non‑overlapping IDs (with two servers, odd IDs come from server 1 and even IDs from server 2). This spreads the load, and the remaining servers can keep issuing IDs if one goes down.
Key details:
Each ID‑generation server runs a single‑database MySQL instance with a Sequence table.
The Sequence table usually contains a single row, identified by a stub column; that one row can serve many logical tables.
MySQL's LAST_INSERT_ID() must be called in the same connection as the REPLACE INTO that inserts the row.
The Sequence table uses the MyISAM engine for table‑level locking, which avoids ID collisions under concurrency.
Pure JDBC access is faster than Spring JDBC for this pattern.
Application responsibilities include load‑balancing requests to the ID servers and handling failover when a server becomes unavailable.
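The interleaving and the application-side failover can be sketched in memory, without MySQL. The `TicketServer` and `TicketClient` classes below are hypothetical stand-ins: the first mimics one ID server with its own offset and step, the second mimics the application logic that rotates across servers and skips one that is down.

```python
class TicketServer:
    """In-memory stand-in for one ID server (hypothetical; no MySQL involved)."""

    def __init__(self, offset, step):
        self.next_value = offset  # first ID this server hands out
        self.step = step          # equals the number of servers
        self.alive = True

    def next_id(self):
        if not self.alive:
            raise ConnectionError("ticket server unavailable")
        value = self.next_value
        self.next_value += self.step
        return value


class TicketClient:
    """Round-robins across servers and skips the ones that fail."""

    def __init__(self, servers):
        self.servers = servers
        self.cursor = 0

    def next_id(self):
        for _ in range(len(self.servers)):
            server = self.servers[self.cursor]
            self.cursor = (self.cursor + 1) % len(self.servers)
            try:
                return server.next_id()
            except ConnectionError:
                continue  # fail over to the next server
        raise RuntimeError("no ticket server available")


servers = [TicketServer(offset=1, step=2), TicketServer(offset=2, step=2)]
client = TicketClient(servers)
healthy = [client.next_id() for _ in range(4)]   # both servers up
servers[1].alive = False
degraded = [client.next_id() for _ in range(3)]  # only server 1 answers
```

IDs produced this way are unique but not strictly increasing over time: if one server handles more requests than the other, its values run ahead.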
Framework vs. Custom Development and Sharding Implementation Considerations
Sharding can be implemented at five layers:
DAO layer – embeds sharding logic directly in DAO methods; offers flexibility and no ORM constraints.
ORM framework layer – e.g., Hibernate Shards, Guzz; limited by ORM capabilities.
JDBC API layer – a custom JDBC driver that makes sharding transparent; high technical barrier.
Spring template layer – custom Spring templates (e.g., Cobar Client) provide a balance of transparency and ease of implementation.
Proxy layer between application server and database – MySQL Proxy, Amoeba; route SQL after parsing, fully transparent but may lack transaction support.
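To make the DAO-layer option concrete, here is a minimal sketch in which the DAO itself picks the shard for each call. Two in-memory SQLite databases stand in for physical shards, and the `AccountDao` class, the `account` table, and the modulo routing rule are all assumptions chosen for illustration.

```python
import sqlite3

# Two in-memory SQLite databases stand in for two physical shards.
shards = [sqlite3.connect(":memory:") for _ in range(2)]
for db in shards:
    db.execute("CREATE TABLE account (user_id INTEGER PRIMARY KEY, name TEXT)")


class AccountDao:
    """DAO whose methods choose the target shard themselves."""

    def __init__(self, shards):
        self.shards = shards

    def _shard_for(self, user_id):
        # Routing rule lives inside the DAO: simple modulo on the key.
        return self.shards[user_id % len(self.shards)]

    def insert(self, user_id, name):
        db = self._shard_for(user_id)
        with db:  # commits on success, rolls back on error
            db.execute("INSERT INTO account VALUES (?, ?)", (user_id, name))

    def find(self, user_id):
        row = self._shard_for(user_id).execute(
            "SELECT name FROM account WHERE user_id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else None


dao = AccountDao(shards)
dao.insert(1, "alice")
dao.insert(2, "bob")
```

Callers use `dao.find(1)` without knowing which database holds the row. The cost is that every DAO method has to carry routing logic, which is what the lower layers try to hide.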
Choosing a framework depends on project size, team expertise, and required features; many architects prefer a cautious, case‑by‑case evaluation.
Multi‑Data‑Source Transaction Handling
Distributed Transactions
Two‑phase commit guarantees atomicity across databases but adds latency, increases lock contention, and hinders horizontal scaling.
Best‑Efforts 1PC
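Best‑Efforts one‑phase commit performs all the work on every data source first and then commits the data sources one after another, as late as possible. If a commit fails after an earlier one has succeeded, the earlier commit cannot be rolled back, so strict consistency is sacrificed in that failure scenario in exchange for better performance and scalability. The approach is widely adopted by sharding frameworks.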
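A minimal sketch of the Best‑Efforts 1PC idea follows, assuming two in-memory SQLite databases as the data sources. The `orders` and `stock` tables and the `place_order` function are hypothetical. Errors raised while the work is being done roll back both sources; the unprotected window is the commit loop at the end.

```python
import sqlite3


def make_db(ddl):
    # isolation_level=None: transactions are controlled with explicit SQL.
    db = sqlite3.connect(":memory:", isolation_level=None)
    db.execute(ddl)
    return db


orders = make_db("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
stock = make_db(
    "CREATE TABLE stock (item TEXT PRIMARY KEY, qty INTEGER CHECK (qty >= 0))"
)
stock.execute("INSERT INTO stock VALUES ('fish', 1)")


def place_order(order_id, item):
    """Do all work on both sources first, then commit each one in turn."""
    resources = [orders, stock]
    for db in resources:
        db.execute("BEGIN")
    try:
        orders.execute("INSERT INTO orders VALUES (?, ?)", (order_id, item))
        stock.execute("UPDATE stock SET qty = qty - 1 WHERE item = ?", (item,))
    except sqlite3.Error:
        # Nothing has been committed yet, so both can still roll back.
        for db in resources:
            db.execute("ROLLBACK")
        return False
    # Commit phase: if the second COMMIT failed, the first could not be undone.
    for db in resources:
        db.execute("COMMIT")
    return True


first = place_order(1, "fish")   # stock goes from 1 to 0
second = place_order(2, "fish")  # CHECK constraint fails, both roll back
```

Committing the source most likely to fail first keeps the inconsistency window as small as this pattern allows.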
Transaction Compensation
For systems tolerant of eventual consistency, compensation mechanisms (reconciliation, log‑based fixes, periodic sync) can replace immediate rollback.
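A reconciliation pass of the kind mentioned above might look like the following sketch. The record shapes, the `PLACED` and `CANCELLED` statuses, and the `cancel_order` action name are invented for illustration; a real job would read from the two data sources instead of from lists.

```python
def reconcile(orders, payments):
    """Return compensating actions for placed orders with no matching payment."""
    paid = {p["order_id"] for p in payments}
    actions = []
    for order in orders:
        if order["status"] == "PLACED" and order["id"] not in paid:
            actions.append(("cancel_order", order["id"]))
    return actions


def apply_actions(orders, actions):
    """Apply compensating actions; cancelling twice has no further effect."""
    by_id = {o["id"]: o for o in orders}
    for name, order_id in actions:
        if name == "cancel_order":
            by_id[order_id]["status"] = "CANCELLED"


orders = [{"id": 1, "status": "PLACED"}, {"id": 2, "status": "PLACED"}]
payments = [{"order_id": 1, "amount": 10}]
actions = reconcile(orders, payments)
apply_actions(orders, actions)
```

Keeping the compensating action idempotent means the job can be rerun safely after a partial failure.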
Summary
Distributed transactions are strict but slow; Best‑Efforts 1PC balances reliability and scalability; compensation is suitable when strong consistency is not required.
Sharding Expansion Scheme Without Data Migration
The proposed scheme combines incremental range routing (to avoid data migration) with hash‑based distribution inside each range (to prevent hotspots). A ShardGroup represents a range of IDs and contains multiple Shards that receive data via hashing. When expanding, a new ShardGroup is added, marked writable, while the previous group becomes read‑only, eliminating the need to move existing data.
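The routing rule can be sketched as follows. The `ShardGroup` name comes from the scheme; the `Router` class, the field names, the shard names, and the use of a plain modulo as the hash are assumptions made to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class ShardGroup:
    start_id: int          # inclusive lower bound of the ID range
    end_id: int            # exclusive upper bound of the ID range
    shards: list           # names of the shards inside this group
    writable: bool = True  # only the newest group accepts new IDs

    def contains(self, record_id):
        return self.start_id <= record_id < self.end_id

    def shard_for(self, record_id):
        # Hash within the group; plain modulo stands in for the hash here.
        return self.shards[record_id % len(self.shards)]


class Router:
    def __init__(self):
        self.groups = []

    def add_group(self, group):
        # Expansion: older groups become read-only, the new one takes writes.
        for existing in self.groups:
            existing.writable = False
        self.groups.append(group)

    def route(self, record_id):
        for group in self.groups:
            if group.contains(record_id):
                return group.shard_for(record_id)
        raise LookupError(f"no shard group covers id {record_id}")


router = Router()
router.add_group(ShardGroup(0, 1000, ["s0", "s1"]))
before = router.route(501)
router.add_group(ShardGroup(1000, 3000, ["s2", "s3", "s4"]))
after = router.route(501)        # unchanged by the expansion
new_record = router.route(1501)  # lands in the new group
```

Because an ID's range never changes and the number of shards inside an existing group stays fixed, adding a group does not alter where any existing ID routes, which is why no data has to move.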
Extensions include supporting fragment tables (or partitioned tables) when a single table reaches its size limit, and reusing storage space that has been freed up in old shards: the old ShardGroup is marked writable again and assigned an additional, new ID interval.
Overall, the scheme offers migration‑free scaling, balanced read/write distribution, and a stable routing algorithm that does not require code changes during expansion.
ITFLY8 Architecture Home
