Comprehensive Guide to Application and Data Splitting Migration for High‑Volume C‑End Services
This article outlines a two‑stage migration strategy—application splitting and data splitting—detailing how to isolate APIs, message queues, and scheduled tasks, manage dual‑write database synchronization, ensure gray‑scale traffic routing, and perform validation to achieve a reliable, low‑risk transition for large‑scale backend systems.
Application Splitting
The goal is to extract business code into independent services and route all traffic to the new services. The process starts with a thorough business analysis, dividing the migration into three parts: API, message queue, and scheduled tasks.
API
APIs are the main entry points and can be external (frontend, H5, mini‑programs, apps) or internal (service‑to‑service). External APIs are usually exposed via a unified gateway, allowing seamless traffic migration without client changes. Internal HTTP APIs can use the same gateway; RPC APIs require custom solutions.
From a business perspective, APIs are classified as independent (used only by the split service) or coupled (shared with other services). Independent APIs migrate easily via gateway gray‑scale routing, while coupled APIs may need custom routing logic or an additional proxy layer.
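For independent APIs, gray-scale routing at the gateway often comes down to bucketing requests by a stable key. Below is a minimal sketch of that idea; the class name `GrayScaleRouter` and the member-ID-modulo rule are illustrative assumptions, not a specific gateway's API.

```java
// Hypothetical sketch: gateway-side gray-scale routing for an independent API.
// A stable key (here memberId) is bucketed into 0..99; buckets below the
// configured percentage go to the new split-out service. Assumes non-negative IDs.
public class GrayScaleRouter {
    private final int grayPercent; // 0..100, share of traffic routed to the new service

    public GrayScaleRouter(int grayPercent) {
        this.grayPercent = grayPercent;
    }

    /** Returns true if this request should be forwarded to the new service. */
    public boolean routeToNewService(long memberId) {
        return memberId % 100 < grayPercent;
    }

    public static void main(String[] args) {
        GrayScaleRouter router = new GrayScaleRouter(10); // start with 10% gray traffic
        System.out.println(router.routeToNewService(105)); // bucket 5 -> new service
        System.out.println(router.routeToNewService(155)); // bucket 55 -> old service
    }
}
```

Because the bucket is derived from the request itself, the same user always lands on the same service, which keeps session behavior consistent while the percentage is ramped up.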
Message Queue
During service splitting, producers remain unchanged, but consumers must be migrated. Consider a user service and a points system that both subscribe to the same order topic: during the transition the old and new consumers may briefly overlap, so each consumer needs idempotent processing, optionally backed by distributed locks, to avoid handling the same event twice.
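The idempotency requirement can be sketched as a dedup check keyed on the message ID. This is a minimal in-memory illustration; in production the dedup store would typically be Redis or a database unique constraint, and the class name `IdempotentConsumer` is an assumption of this example.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: record processed message IDs so that redelivery, or the
// old and new consumers overlapping during migration, never applies the same
// side effect twice.
public class IdempotentConsumer {
    private final Set<String> processed = ConcurrentHashMap.newKeySet();

    /** Returns true if the message was handled now; false if it was a duplicate. */
    public boolean consume(String messageId) {
        if (!processed.add(messageId)) {
            return false; // already handled, skip
        }
        // ... business logic: award points, update user profile, etc.
        return true;
    }

    public static void main(String[] args) {
        IdempotentConsumer c = new IdempotentConsumer();
        System.out.println(c.consume("order-1001")); // first delivery is processed
        System.out.println(c.consume("order-1001")); // duplicate is skipped
    }
}
```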
Data Splitting
After application splitting, services still share a single database, which must be isolated. The migration starts with inventorying databases and tables, assessing data volume, and categorizing tables by read/write frequency and business priority.
Typical databases involved are MySQL and MongoDB, with large tables exceeding 2 million rows. Migration tools include Canal for MySQL, cloud DTS services, and MongoShake for MongoDB.
1. Dual‑Write
Data is written to both old and new databases. Two approaches are described:
Synchronous Dual‑Write
Pros: Keeps data consistent across both databases.
Cons: Impacts performance and may block business flow if the new write fails.
Asynchronous Dual‑Write
Pros: No impact on the primary workflow.
Cons: Introduces latency, potential data loss, and consistency risks, especially for ordered operations like orders.
Order‑preserving queues or sharding keys can mitigate ordering issues.
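The synchronous variant with a non-blocking fallback can be sketched as follows. The `DualWriter` class, `Db` interface, and repair queue are assumptions of this example; a real implementation would use actual DAOs and persist failed writes for a compensation job.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of dual-write: the old database stays the source of truth.
// A failed write to the new database is queued for later repair instead of
// blocking the business flow, trading strict consistency for availability.
public class DualWriter {
    interface Db { void save(String record) throws Exception; }

    private final Db oldDb;
    private final Db newDb;
    private final List<String> repairQueue = new ArrayList<>(); // records to re-sync later

    public DualWriter(Db oldDb, Db newDb) {
        this.oldDb = oldDb;
        this.newDb = newDb;
    }

    public void write(String record) throws Exception {
        oldDb.save(record);          // primary write: must succeed or the call fails
        try {
            newDb.save(record);      // secondary write: best effort
        } catch (Exception e) {
            repairQueue.add(record); // repaired later by the verification/sync job
        }
    }

    public int pendingRepairs() {
        return repairQueue.size();
    }

    public static void main(String[] args) throws Exception {
        DualWriter w = new DualWriter(r -> {}, r -> { throw new Exception("new DB down"); });
        w.write("order-1001");
        System.out.println(w.pendingRepairs()); // the failed new-DB write is queued
    }
}
```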
2. Existing Data Synchronization
Once dual‑write is active, incremental data is aligned, allowing full‑load migration of historic data using tools such as Canal, DTS, or MongoShake. Care must be taken to avoid ID collisions between old and new records.
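One common way to avoid ID collisions is to start the new table's auto-increment well above the old table's current maximum before bulk-loading. The check below is a minimal sketch under that assumption; `IdRangeCheck` and the headroom parameter are illustrative names, not part of any migration tool.

```java
// Hypothetical sketch: verify that the new table's auto-increment starting point
// sits safely above the old table's current max ID, so rows created by dual-write
// cannot collide with historic rows bulk-loaded from the old database.
public class IdRangeCheck {
    public static boolean safeOffset(long oldMaxId, long newAutoIncrementStart, long headroom) {
        // headroom covers rows inserted into the old DB while the full load runs
        return newAutoIncrementStart >= oldMaxId + headroom;
    }

    public static void main(String[] args) {
        // old table currently tops out at 42M; new table starts at 100M
        System.out.println(safeOffset(42_000_000L, 100_000_000L, 1_000_000L)); // safe
        System.out.println(safeOffset(42_000_000L, 42_000_500L, 1_000_000L));  // too tight
    }
}
```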
3. Data Verification
Verification compares new‑database records against the old database, typically via scheduled jobs during low‑traffic periods, to detect any inconsistencies introduced by dual‑write.
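The comparison step can be sketched as below. Real verification jobs page through primary-key ranges (or compare row checksums) rather than loading whole tables; `DataVerifier` and its map-based stand-ins for the two databases are assumptions of this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a verification pass: compare each old-database record
// against the new database and collect the keys that differ or are missing.
public class DataVerifier {
    public static List<Long> findInconsistent(Map<Long, String> oldDb, Map<Long, String> newDb) {
        List<Long> mismatched = new ArrayList<>();
        for (Map.Entry<Long, String> e : oldDb.entrySet()) {
            // a missing key in newDb also counts as a mismatch (get returns null)
            if (!e.getValue().equals(newDb.get(e.getKey()))) {
                mismatched.add(e.getKey());
            }
        }
        return mismatched;
    }

    public static void main(String[] args) {
        Map<Long, String> oldDb = Map.of(1L, "alice", 2L, "bob");
        Map<Long, String> newDb = Map.of(1L, "alice", 2L, "b0b");
        System.out.println(findInconsistent(oldDb, newDb)); // key 2 drifted
    }
}
```

Mismatched keys feed a repair step that re-copies the affected rows from the old database, which is why the job is scheduled during low-traffic windows when dual-write lag is smallest.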
4. Traffic Switch
Read traffic is gradually shifted to the new database using a routing layer that selects the data source based on gray‑scale rules (e.g., modulo of memberId or orderNo). The snippet below shows a simple modulus calculation:
rate = shade_key mod 100

Frameworks such as sharding-jdbc support multi-datasource routing and custom routing rules.
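The modulus rule above can be sketched as a small data-source selector. `DataSourceRouter` and the `"newDb"`/`"oldDb"` labels are illustrative assumptions; with sharding-jdbc the same decision would live in a custom routing algorithm instead.

```java
// Hypothetical sketch of read-traffic gray-scale: bucket the shade key
// (e.g. memberId or orderNo) into 0..99 and send the configured percentage
// of reads to the new database. Assumes non-negative keys.
public class DataSourceRouter {
    private final int newDbPercent; // 0..100, share of reads served by the new DB

    public DataSourceRouter(int newDbPercent) {
        this.newDbPercent = newDbPercent;
    }

    public String pick(long shadeKey) {
        long rate = shadeKey % 100;              // rate = shade_key mod 100
        return rate < newDbPercent ? "newDb" : "oldDb";
    }

    public static void main(String[] args) {
        DataSourceRouter router = new DataSourceRouter(20); // 20% of reads to new DB
        System.out.println(router.pick(10_015L)); // bucket 15 -> new database
        System.out.println(router.pick(10_085L)); // bucket 85 -> old database
    }
}
```

Raising `newDbPercent` in stages (1%, 10%, 50%, 100%) while watching error rates and verification results lets the team roll back instantly by lowering the percentage.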
Summary
The migration plan consists of two phases—application migration to isolate services and data migration to separate databases—requiring careful planning, dual‑write implementation, data sync, verification, and gray‑scale traffic routing to ensure a smooth, low‑risk transition.
Code Ape Tech Column
Former Ant Group P8 engineer and pure technologist, sharing full-stack Java content, interview preparation, and career advice through this column. Site: java-family.cn