Seamless Sharding Migration: Dual‑Write, Mapping Keys, and Diff Validation
This article explains how to smoothly migrate a legacy single‑database system to a sharded architecture by using dual‑write, mapping‑key routing, custom transaction handling, and offline/real‑time diff checks, while detailing the underlying MyBatis integration and component‑based design.
Background
Sharding (splitting databases and tables) is a common optimization for large‑scale internet applications. Middleware such as Sharding‑JDBC or MyCAT can handle the routing, but the real challenge lies in upgrading an existing monolithic system to a sharded one without downtime.
Original Problems
Rollback during migration can cause data inconsistency when new data is written to the new shard before the rollback.
SQL statements in a mature system often lack the required sharding key, making full‑scale rewrite impractical.
Ensuring business‑level data equivalence between the pre‑sharding and post‑sharding systems.
First Smooth Migration Practice
Implement dual‑write : write to both the original single‑table database and the new sharded database, using a mapping‑key concept to route reads/writes.
Use a special transaction to keep the two systems consistent.
Apply an iff check to verify that data in both databases are equivalent.
Prerequisite Knowledge – MyBatis Basics
MyBatis provides three layers:
Interface layer : the SqlSession API used by developers; Spring injects dynamic proxy implementations of mapper interfaces.
Data‑processing layer : handles parameter processing, SQL parsing, execution, and result construction.
Foundation layer : manages connections, transactions, configuration loading, and caching.
MyBatis‑Spring Execution Flow
SQL execution starts from SqlSession, which delegates to an Executor. Executor builds a JDBC Statement via StatementHandler, handling caching and parameter conversion. StatementHandler wraps the JDBC Statement and uses ParameterHandler, ResultSetHandler, and TypeHandler for parameter binding, result conversion, and type mapping.
Dual‑Write Implementation
During migration the original single‑table database remains active while a new sharded database is created. Writes are performed on both databases; reads can be directed to either side once equivalence is confirmed. The dual‑write is achieved via a MyBatis plugin that intercepts SQL, swaps the SqlSession, and rewrites the statement for the secondary data source.
Key technical challenges:
Switching the target data source inside MyBatis while preserving Spring transaction semantics (setting autoCommit to false in the secondary connection and managing reference counts).
Duplicating the JDBC Statement and its parameters for the secondary execution, ensuring isolation between parent and child contexts to avoid ThreadLocal pollution.
Mapping‑Key Routing
When a query lacks the sharding key, a mapping key is used. The mapping key is stored in a separate mapping table (also sharded). At runtime the plugin looks up the mapping key, derives the actual sharding key, and computes the physical table location, avoiding full‑table broadcasts.
Example: a query by couponId (mapping key) is mapped to orderId (sharding key) via a couponId → orderId entry, allowing the query to be routed correctly.
Diff and Transaction Validation
To prove equivalence, the system performs both offline and real‑time diffs:
Offline diff compares full data sets from both databases for a given day.
Real‑time diff checks the result of each write operation; if write diffs match, read diffs are skipped for performance.
When diffs converge to zero within 24 hours, the databases are considered equivalent. Transaction handling uses a custom “distributed” transaction: the primary Spring transaction manages the main data source, while a secondary connection is wrapped in a Spring TransactionSynchronization to commit or roll back together.
New Issues and Componentization
After the first migration, several pain points emerged: testing difficulty, high maintenance cost due to extensive annotations/configuration, and limited reusability. To address these, the solution was refactored into a reusable component with three layers:
Access layer : stable, backward‑compatible APIs (Spring starter, annotations).
Core layer : routing logic, lifecycle management, and plugin points.
Storage layer : actual SQL execution via data sources and MyBatis.
Core Implementation Details
The entry point is SqlRouterInterceptor (a MyBatis plugin). It delegates to SqlRouteProcessor, which:
Parses the SQL into a SqlInfo structure (contains tables, columns, conditions, type).
Fills parameters into SqlInfo via a proxy PreparedParameterSupport.
Selects an appropriate RouteStrategy (e.g., sharding‑key routing, mapping‑key routing, ES fallback).
Executes the chosen strategy, calculates the physical table location, and calls the middleware’s routing guide API.
Configuration Example (sharding.properties)
# Database prefix (for qdb configuration)
db.prefix=qmall_supply_
# Database index configuration
db.index.qmall.flight={dbIndex: 0}
db.index.qmall.inter={dbIndex: 1}
db.index.qmall.ticket={dbIndex: 2}
db.index.qmall.hermes={dbIndex: 3}
# Sharding key configuration
sharding.user_info=[{shardingKey: 'last_name',intervalMonth:2,hashCount:0,startTime: '2020-11-01'},{shardingKey: 'last_name',intervalMonth:2,hashCount:2,startTime: '2024-07-01'},{shardingKey: 'last_name',intervalMonth:2,hashCount:1,startTime: '2021-07-01'},{shardingKey: 'last_name',intervalMonth:1,hashCount:2,startTime: '2022-07-01'}]
sharding.supply_order=[{shardingKey: 'supply_order_id',intervalMonth:1,hashCount:2,startTime: '2022-11-01', hashGroupReg: '20[0-9]{2}(0[1-9]|1[0-2])[0-9]{6}'}]
sharding.supply_order_ext=[{shardingKey: 'supply_order_id',intervalMonth:1,hashCount:2,startTime: '2022-11-01'}]
# Date extraction regex (optional)
shardingKey.extract.date.pattern={supply_order_id: '20[0-9]{2}(0[1-9]|1[0-2])'}
# Mapping key configuration (higher priority wins)
table.supply_order=[{mapKey: 'business_order_id', type: 'one2many', priority: 1, maintain: 'auto_manual'}]
table.user_info=[{mapKey: 'id', type: 'one2one', priority: 1},{mapKey: 'phone', type: 'one2one', priority: 1, maintain: 'auto_manual'}]Access Layer Interfaces
DataSource & SessionFactory : wrappers that expose sharding‑aware connections and MyBatis SqlSessionFactory.
DAOTemplate & BaseMapper : utility classes for ad‑hoc SQL and common CRUD operations.
SqlRouteHelper : API to run code under a specific routing context (e.g., force a particular physical DB or table).
Transaction Interface : extends Spring’s transaction manager to coordinate the custom “distributed” transaction across two data sources.
Conclusion
The article presents two migration designs: an initial cautious approach using dual‑write and diff validation, and a later component‑based approach that abstracts the sharding logic for easier reuse and DDD‑oriented microservice development. While the solution is tailored to the author’s environment, the concepts of dual‑write, mapping‑key routing, and diff‑based validation are broadly applicable to any sharding migration.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
