Databases 36 min read

Seamless Sharding Migration: Dual‑Write, Mapping Keys, and Diff Validation

This article explains how to smoothly migrate a legacy single‑database system to a sharded architecture by using dual‑write, mapping‑key routing, custom transaction handling, and offline/real‑time diff checks, while detailing the underlying MyBatis integration and component‑based design.

ITPUB

Jun 23, 2023

Seamless Sharding Migration: Dual‑Write, Mapping Keys, and Diff Validation

Background

Sharding (splitting databases and tables) is a common optimization for large‑scale internet applications. Middleware such as Sharding‑JDBC or MyCAT can handle the routing, but the real challenge lies in upgrading an existing monolithic system to a sharded one without downtime.

Original Problems

Rollback during migration can cause data inconsistency when new data is written to the new shard before the rollback.

SQL statements in a mature system often lack the required sharding key, making full‑scale rewrite impractical.

Ensuring business‑level data equivalence between the pre‑sharding and post‑sharding systems.

First Smooth Migration Practice

Implement dual‑write : write to both the original single‑table database and the new sharded database, using a mapping‑key concept to route reads/writes.

Use a special transaction to keep the two systems consistent.

Apply an iff check to verify that data in both databases are equivalent.

Prerequisite Knowledge – MyBatis Basics

MyBatis provides three layers:

Interface layer : the SqlSession API used by developers; Spring injects dynamic proxy implementations of mapper interfaces.

Data‑processing layer : handles parameter processing, SQL parsing, execution, and result construction.

Foundation layer : manages connections, transactions, configuration loading, and caching.

MyBatis‑Spring Execution Flow

SQL execution starts from SqlSession, which delegates to an Executor. Executor builds a JDBC Statement via StatementHandler, handling caching and parameter conversion. StatementHandler wraps the JDBC Statement and uses ParameterHandler, ResultSetHandler, and TypeHandler for parameter binding, result conversion, and type mapping.

Dual‑Write Implementation

During migration the original single‑table database remains active while a new sharded database is created. Writes are performed on both databases; reads can be directed to either side once equivalence is confirmed. The dual‑write is achieved via a MyBatis plugin that intercepts SQL, swaps the SqlSession, and rewrites the statement for the secondary data source.

Key technical challenges:

Switching the target data source inside MyBatis while preserving Spring transaction semantics (setting autoCommit to false in the secondary connection and managing reference counts).

Duplicating the JDBC Statement and its parameters for the secondary execution, ensuring isolation between parent and child contexts to avoid ThreadLocal pollution.

Mapping‑Key Routing

When a query lacks the sharding key, a mapping key is used. The mapping key is stored in a separate mapping table (also sharded). At runtime the plugin looks up the mapping key, derives the actual sharding key, and computes the physical table location, avoiding full‑table broadcasts.

Example: a query by couponId (mapping key) is mapped to orderId (sharding key) via a couponId → orderId entry, allowing the query to be routed correctly.

Diff and Transaction Validation

To prove equivalence, the system performs both offline and real‑time diffs:

Offline diff compares full data sets from both databases for a given day.

Real‑time diff checks the result of each write operation; if write diffs match, read diffs are skipped for performance.

When diffs converge to zero within 24 hours, the databases are considered equivalent. Transaction handling uses a custom “distributed” transaction: the primary Spring transaction manages the main data source, while a secondary connection is wrapped in a Spring TransactionSynchronization to commit or roll back together.

New Issues and Componentization

After the first migration, several pain points emerged: testing difficulty, high maintenance cost due to extensive annotations/configuration, and limited reusability. To address these, the solution was refactored into a reusable component with three layers:

Access layer : stable, backward‑compatible APIs (Spring starter, annotations).

Core layer : routing logic, lifecycle management, and plugin points.

Storage layer : actual SQL execution via data sources and MyBatis.

Core Implementation Details

The entry point is SqlRouterInterceptor (a MyBatis plugin). It delegates to SqlRouteProcessor, which:

Parses the SQL into a SqlInfo structure (contains tables, columns, conditions, type).

Fills parameters into SqlInfo via a proxy PreparedParameterSupport.

Selects an appropriate RouteStrategy (e.g., sharding‑key routing, mapping‑key routing, ES fallback).

Executes the chosen strategy, calculates the physical table location, and calls the middleware’s routing guide API.

Configuration Example (sharding.properties)

# Database prefix (for qdb configuration)

db.prefix=qmall_supply_

# Database index configuration

db.index.qmall.flight={dbIndex: 0}

db.index.qmall.inter={dbIndex: 1}

db.index.qmall.ticket={dbIndex: 2}

db.index.qmall.hermes={dbIndex: 3}

# Sharding key configuration
sharding.user_info=[{shardingKey: 'last_name',intervalMonth:2,hashCount:0,startTime: '2020-11-01'},{shardingKey: 'last_name',intervalMonth:2,hashCount:2,startTime: '2024-07-01'},{shardingKey: 'last_name',intervalMonth:2,hashCount:1,startTime: '2021-07-01'},{shardingKey: 'last_name',intervalMonth:1,hashCount:2,startTime: '2022-07-01'}]
sharding.supply_order=[{shardingKey: 'supply_order_id',intervalMonth:1,hashCount:2,startTime: '2022-11-01', hashGroupReg: '20[0-9]{2}(0[1-9]|1[0-2])[0-9]{6}'}]
sharding.supply_order_ext=[{shardingKey: 'supply_order_id',intervalMonth:1,hashCount:2,startTime: '2022-11-01'}]

# Date extraction regex (optional)
shardingKey.extract.date.pattern={supply_order_id: '20[0-9]{2}(0[1-9]|1[0-2])'}

# Mapping key configuration (higher priority wins)
 table.supply_order=[{mapKey: 'business_order_id', type: 'one2many', priority: 1, maintain: 'auto_manual'}]
 table.user_info=[{mapKey: 'id', type: 'one2one', priority: 1},{mapKey: 'phone', type: 'one2one', priority: 1, maintain: 'auto_manual'}]

Access Layer Interfaces

DataSource & SessionFactory : wrappers that expose sharding‑aware connections and MyBatis SqlSessionFactory.

DAOTemplate & BaseMapper : utility classes for ad‑hoc SQL and common CRUD operations.

SqlRouteHelper : API to run code under a specific routing context (e.g., force a particular physical DB or table).

Transaction Interface : extends Spring’s transaction manager to coordinate the custom “distributed” transaction across two data sources.

Conclusion

The article presents two migration designs: an initial cautious approach using dual‑write and diff validation, and a later component‑based approach that abstracts the sharding logic for easier reuse and DDD‑oriented microservice development. While the solution is tailored to the author’s environment, the concepts of dual‑write, mapping‑key routing, and diff‑based validation are broadly applicable to any sharding migration.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

sharding MyBatis database partitioning diff-validation mapping key distributed-transaction dual-write

Written by

ITPUB

Official ITPUB account sharing technical insights, community news, and exciting events.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.