
How to Migrate Tens of Millions of Rows: Strategies and Practical Steps

This article explains common data‑migration strategies such as hash‑modulo, time‑range and ID‑range sharding, and compares practical migration solutions including downtime migration, dual‑write, and an MQ + Redis approach, with detailed workflow diagrams and implementation tips.


Internet services often accumulate tables with tens of millions of rows that must be split into multiple sub-tables or moved to a different storage medium. Both cases require data migration; this article discusses the main strategies and solutions.

1. Data Migration Strategies

Assume we have a massive order‑detail table that needs to be partitioned into sub‑tables. How should we decide which rows go to which sub‑table?

(1) Hash Modulo Method

Choose a field such as order_id as the sharding key, apply a hash function to it, and take the result modulo the number of sub-tables to determine the target sub-table.
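The routing described above can be sketched in a few lines. This is a minimal illustration, assuming 16 sub-tables and the table name prefix order_detail_ (both are assumptions, not from the original text); a stable hash such as CRC32 is used because Python's built-in hash() is randomized per process.

```python
import zlib

NUM_SHARDS = 16  # assumed sub-table count

def route_order(order_id: str) -> str:
    """Return the sub-table name for a given order_id."""
    # crc32 is stable across processes, unlike Python's hash()
    shard = zlib.crc32(order_id.encode("utf-8")) % NUM_SHARDS
    return f"order_detail_{shard}"
```

The same order_id always routes to the same sub-table, so point lookups by order_id touch exactly one table.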

When selecting the key, consider the field most frequently queried in the business to minimize cross‑table queries; common keys include order_id, user_id, etc.

(2) Range-Based Sharding

First define an appropriate range, typically based on time or ID.

Time‑range sharding groups data of the same year into one table.

ID‑range sharding maintains an ID mapping table and assigns a specific ID interval to a particular sub‑table.
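An ID-range mapping table can be sketched as follows; the in-memory list stands in for the real mapping table, and the ranges and table names are illustrative assumptions.

```python
# Illustrative ID-range mapping: (start_id, end_id, sub-table name).
# In production this lookup would live in a small mapping table.
ID_RANGES = [
    (1, 10_000_000, "order_detail_2022"),
    (10_000_001, 20_000_000, "order_detail_2023"),
    (20_000_001, 30_000_000, "order_detail_2024"),
]

def route_by_id(record_id: int) -> str:
    """Return the sub-table assigned to this ID interval."""
    for start, end, table in ID_RANGES:
        if start <= record_id <= end:
            return table
    raise ValueError(f"no sub-table configured for id {record_id}")
```

Unlike hash modulo, adding capacity only means appending a new interval; existing rows never move.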

The choice of strategy depends on the specific business scenario, performance requirements, and maintainability considerations.

2. Data Migration Solutions

2.1 Downtime Migration

This straightforward method stops the service during migration, which impacts users; it is therefore usually performed at night, after notifying users in advance.

2.2 Dual-Write Strategy

Continue writing to the old table while simultaneously writing to the new sharded tables according to the sharding rule, recording the time at which synchronization started.

Consistency is ensured by scheduled tasks that compare data between old and new tables; after a stable period, historical data is migrated to the new tables.
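The scheduled consistency check can be sketched like this. It is a hedged illustration: dicts keyed by row ID stand in for query results from the old table and the new sub-tables, and the updated_at field and function name are assumptions.

```python
from datetime import datetime

def find_mismatches(old_rows: dict, new_rows: dict,
                    sync_start: datetime) -> list:
    """Return IDs of rows written after sync_start that differ
    between the old table and the new sharded tables."""
    mismatched = []
    for row_id, row in old_rows.items():
        if row["updated_at"] < sync_start:
            continue  # historical rows are backfilled separately
        if new_rows.get(row_id) != row:
            mismatched.append(row_id)
    return mismatched
```

Any ID this returns is repaired by re-copying the row from the old table; once the list stays empty for a stable period, the historical backfill can begin.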

2.3 MQ + Redis Migration

This smoother approach uses a message queue and Redis bitmaps to track migration progress.

(1) Design the Migration Plan

For tens of millions of rows, divide the ID space into groups (e.g., 10,000 IDs per group) and use Redis bitmaps on both the source and target sides to record the migration status of each record. Example keys:

redis_source_key – bitmap for the source table

source_tb_s1e10 – IDs 1‑10000 in the source

redis_target_key – bitmap for the target table

target_tb_s1e10 – IDs 1‑10000 in the target

Additional keys record each group's start time (e.g., source_tb_s1e10_startTime) and status (e.g., source_tb_s1e10_status) for compensation.

(2) Execute the Migration

Split the source data into groups, send each group to a message queue, and set the corresponding source bitmap bit to 1 when a message is enqueued.

Consumers read the queue, write data to the target table, and set the target bitmap bit to 1 upon successful write.
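The producer/consumer flow above can be sketched as follows. To keep the example self-contained, a queue.Queue stands in for the real MQ and bytearrays stand in for the Redis bitmaps (set_bit mimics Redis SETBIT, including its most-significant-bit-first offset order); all names are illustrative.

```python
import queue

GROUP_SIZE = 10_000  # IDs per group, as in the plan above

def set_bit(bitmap: bytearray, offset: int) -> None:
    """Set one bit, using Redis SETBIT's bit ordering."""
    bitmap[offset // 8] |= 1 << (7 - offset % 8)

def migrate_group(records: dict, mq: queue.Queue,
                  source_bits: bytearray, target_bits: bytearray,
                  target_table: dict) -> None:
    # Producer: enqueue each record, then mark it sent in the source bitmap.
    for record_id, row in records.items():
        mq.put((record_id, row))
        set_bit(source_bits, record_id % GROUP_SIZE)
    # Consumer: write to the target table, then mark the target bitmap
    # only after the write succeeds.
    while not mq.empty():
        record_id, row = mq.get()
        target_table[record_id] = row
        set_bit(target_bits, record_id % GROUP_SIZE)
```

Because the target bit is flipped only after a successful write, a bit set in the source bitmap but not in the target pinpoints a record lost between enqueue and consume.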

(3) Data Compensation

Periodically scan Redis for groups whose status is marked successful; XOR the source and target bitmaps to find mismatched bits, which indicate records that were sent but never consumed. If a mismatch persists beyond a threshold (e.g., one hour), perform manual compensation.
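The XOR scan reduces to a byte-wise comparison of the two bitmaps. A minimal sketch, again using raw bytes in place of Redis bitmap values (in practice these would come from GETRANGE or a full GET of each bitmap key):

```python
def pending_offsets(source_bits: bytes, target_bits: bytes):
    """Yield bit offsets set in the source bitmap but not the target,
    i.e. records that were enqueued but never confirmed written."""
    for i, (s, t) in enumerate(zip(source_bits, target_bits)):
        diff = s ^ t
        if not diff:
            continue  # this byte's 8 records all match
        for bit in range(8):
            if diff & (1 << (7 - bit)):  # Redis-style bit order
                yield i * 8 + bit
```

Each yielded offset maps back to a record ID within the group (group start + offset), which is exactly the set of rows the compensation job must re-send.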

In summary:

Migration strategies include hash modulo, time‑range, and ID‑range sharding.

Common migration solutions are downtime migration, dual‑write, and MQ + Redis approaches.

Tags: data migration, sharding, Redis, message queue, database scaling
Written by

Lobster Programming

Sharing insights on technical analysis and exchange, making life better through technology.
