Mastering Data Migration and Distributed Transactions: From Full Load to XA and BASE
This article explains how to perform full, incremental, and binlog‑based data migrations, compares their trade‑offs, and introduces distributed transaction models such as XA, BASE, TCC, and AT, helping developers choose the right strategy for consistency and performance.
Introduction
After data is split vertically or horizontally to solve capacity and performance issues, migration and consistency become the new challenges. Migration should be fast, smooth, and ideally zero-downtime, while keeping the split data consistent may require distributed transactions.
Data Migration
Data migration is prone to failures and requires careful planning to minimize downtime, ensure accuracy, and handle heterogeneous data structures.
Full Migration
Stop the business system.
Migrate the database and verify data consistency.
Upgrade the business system and connect to the new database.
Drawbacks: the business system must stay offline for the whole migration, and the migration itself can take a long time, especially when the source and target data structures are heterogeneous.
Full + Incremental Migration
Synchronize all data up to a recent cutoff timestamp (for example, by filtering on the creation time).
Notify stakeholders of the upcoming system upgrade.
Synchronize the data changes that occurred after the timestamp.
Upgrade the system and switch to the new database.
This approach reduces downtime compared to full migration.
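The steps above can be sketched with two in-memory SQLite databases. This is only an illustration under stated assumptions: the `orders` table, the `updated_at` column, and the helper names are hypothetical, and the sketch filters on a last-modified column rather than the creation time alone so that rows updated after the cutoff are picked up as well.

```python
import sqlite3

def make_db(rows=()):
    """Create an in-memory database with a hypothetical orders table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER, updated_at INTEGER)"
    )
    db.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    db.commit()
    return db

def copy_rows(source, target, where, params):
    """Upsert the matching source rows into the target; return how many were copied."""
    rows = source.execute(
        f"SELECT id, amount, updated_at FROM orders WHERE {where}", params
    ).fetchall()
    target.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", rows)
    target.commit()
    return len(rows)

def snapshot(db):
    """Return all rows in a stable order so two databases can be compared."""
    return db.execute("SELECT id, amount, updated_at FROM orders ORDER BY id").fetchall()

source = make_db([(1, 100, 10), (2, 200, 20)])
target = make_db()

# Step 1: bulk-copy everything up to the cutoff while the old system keeps running.
cutoff = 20
full_count = copy_rows(source, target, "updated_at <= ?", (cutoff,))

# The business keeps writing to the old database in the meantime.
source.execute("INSERT INTO orders VALUES (3, 300, 30)")
source.execute("UPDATE orders SET amount = 150, updated_at = 31 WHERE id = 1")
source.commit()

# Step 3: during the short maintenance window, copy only what changed after the cutoff.
delta_count = copy_rows(source, target, "updated_at > ?", (cutoff,))

# Verify before switching traffic to the new database.
consistent = snapshot(source) == snapshot(target)
```

A timestamp filter like this cannot see rows that were deleted after the cutoff, which is one reason log-based approaches are often preferred.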
Binlog + Full + Incremental
This approach parses the binlog of the primary or a replica to reconstruct the data changes, which makes synchronization multi-threaded, resumable after interruption, and automatically scalable. Common tools include Canal and ShardingSphere-Scaling.
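A minimal sketch of the resumable part, assuming the binlog has already been parsed into ordered change events. The `ChangeEvent` type and `apply_events` function are hypothetical and do not reflect the API of Canal or ShardingSphere; the point is only that a saved position lets the replay restart without applying an event twice.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ChangeEvent:
    position: int               # offset of the event in the simulated log
    op: str                     # "insert", "update" or "delete"
    key: int                    # primary key of the affected row
    row: Optional[dict] = None  # new row image; None for deletes

def apply_events(events, target, checkpoint):
    """Apply events newer than the checkpoint and return the new checkpoint."""
    for event in events:
        if event.position <= checkpoint:
            continue  # already applied before the restart
        if event.op == "delete":
            target.pop(event.key, None)
        else:
            target[event.key] = dict(event.row)
        checkpoint = event.position
    return checkpoint

log = [
    ChangeEvent(1, "insert", 1, {"amount": 100}),
    ChangeEvent(2, "insert", 2, {"amount": 200}),
    ChangeEvent(3, "update", 1, {"amount": 150}),
    ChangeEvent(4, "delete", 2),
]

target = {}
# First run stops after two events, e.g. because the worker crashed.
checkpoint = apply_events(log[:2], target, 0)
# The restarted worker reads the log again but skips what it already applied.
checkpoint = apply_events(log, target, checkpoint)
```

In a real pipeline the checkpoint would be persisted together with the applied changes, so that a crash between the two cannot lose or duplicate an event.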
Distributed Transactions
XA Transactions
XA is the X/Open standard for distributed transactions, built on two-phase commit. Because the database itself implements the protocol, XA provides strong consistency.
Components:
Application Program (AP): defines transaction boundaries.
Resource Manager (RM): databases, file systems, etc.
Transaction Manager (TM): assigns transaction IDs, monitors progress, and handles commit/rollback.
Key XA commands:
xa_start – start or resume a transaction branch.
xa_end – detach the current thread from the branch.
xa_prepare – ask RM if it can commit.
xa_commit – instruct RM to commit.
xa_rollback – instruct RM to roll back.
xa_recover – list the prepared transaction branches that still need to be committed or rolled back, typically after a failure.
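How these commands fit together in a two-phase commit can be sketched with a toy in-memory model. The `ResourceManager` class and `run_global_transaction` function below are hypothetical stand-ins for a real RM and TM; they only mirror the command names listed above and skip durability, timeouts, and network failures.

```python
class ResourceManager:
    """Toy RM: buffers writes per branch and applies them only on commit."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # simulated vote in the prepare phase
        self.committed = {}           # durable data
        self.branches = {}            # xid -> pending writes
        self.prepared = set()         # xids that voted yes and await a decision

    def xa_start(self, xid):
        self.branches[xid] = {}

    def write(self, xid, key, value):
        self.branches[xid][key] = value

    def xa_end(self, xid):
        pass  # nothing to detach in this single-threaded sketch

    def xa_prepare(self, xid):
        if self.can_commit:
            self.prepared.add(xid)
        return self.can_commit

    def xa_commit(self, xid):
        self.committed.update(self.branches.pop(xid))
        self.prepared.discard(xid)

    def xa_rollback(self, xid):
        self.branches.pop(xid, None)
        self.prepared.discard(xid)

    def xa_recover(self):
        return sorted(self.prepared)

def run_global_transaction(xid, work):
    """Toy TM: work is a list of (rm, key, value); returns True if committed."""
    rms = []
    for rm, key, value in work:
        if rm not in rms:
            rm.xa_start(xid)
            rms.append(rm)
        rm.write(xid, key, value)
    for rm in rms:
        rm.xa_end(xid)
    # Phase 1: every RM must vote yes.
    if all(rm.xa_prepare(xid) for rm in rms):
        # Phase 2: commit everywhere.
        for rm in rms:
            rm.xa_commit(xid)
        return True
    # Phase 2: any "no" vote rolls back every branch.
    for rm in rms:
        rm.xa_rollback(xid)
    return False

orders = ResourceManager("orders")
stock = ResourceManager("stock")
payments = ResourceManager("payments", can_commit=False)

ok = run_global_transaction("tx-1", [(orders, "order:1", "created"), (stock, "sku:9", 4)])
failed = run_global_transaction("tx-2", [(orders, "order:2", "created"), (payments, "pay:2", 50)])
```

Here `tx-1` commits on both RMs, while the refusal of `payments` in `tx-2` causes the order write to be rolled back as well.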
MySQL supports XA transactions for InnoDB since 5.0.3.
Typical XA frameworks: Atomikos, Narayana, Seata.
Problems with XA:
Synchronous blocking: participants hold their locks until the TM announces its decision, which costs performance, especially when a high isolation level is required.
Single point of failure in the TM.
Potential data inconsistency if network failures occur during the two‑phase commit.
Soft (BASE) Transactions
BASE sacrifices strong consistency for availability, using soft state and eventual consistency.
Basically Available – the system stays usable even though participants are not required to be online at the same time.
Soft State – system state can be stale temporarily.
Eventual Consistency – achieved via asynchronous messaging.
The core idea is to move lock handling from the RM layer to the business layer, improving throughput.
TCC (Try‑Confirm‑Cancel)
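TCC splits each service operation into three operations across two phases: in the first phase, Try checks and reserves resources; in the second phase, either Confirm executes the business logic on the reserved resources or Cancel releases them. All operations must be idempotent.
Allow empty rollback: Cancel must succeed even when Try failed or never ran.
Ensure Cancel logically runs after Try: if Cancel arrives first, a late Try must be rejected so that it does not reserve resources nobody will release.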
Design idempotent Confirm and Cancel to handle network issues.
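These rules can be sketched with a toy inventory participant. The `InventoryTcc` class and its method names are hypothetical, and a real participant would keep the per-transaction state in durable storage rather than in memory.

```python
class InventoryTcc:
    """Toy TCC participant that reserves stock per transaction id."""

    def __init__(self, stock):
        self.available = stock
        self.reserved = {}  # tx_id -> reserved quantity
        self.state = {}     # tx_id -> "tried" | "confirmed" | "cancelled"

    def try_reserve(self, tx_id, qty):
        if tx_id in self.state:
            # Repeated Try is a no-op; a Try arriving after Cancel is rejected.
            return self.state[tx_id] == "tried"
        if qty > self.available:
            return False
        self.available -= qty
        self.reserved[tx_id] = qty
        self.state[tx_id] = "tried"
        return True

    def confirm(self, tx_id):
        status = self.state.get(tx_id)
        if status == "confirmed":
            return True   # idempotent: a retried Confirm changes nothing
        if status != "tried":
            return False  # nothing was reserved, so nothing can be confirmed
        self.reserved.pop(tx_id)
        self.state[tx_id] = "confirmed"
        return True

    def cancel(self, tx_id):
        status = self.state.get(tx_id)
        if status == "confirmed":
            return False  # a confirmed transaction can no longer be cancelled
        if status == "tried":
            self.available += self.reserved.pop(tx_id)
        # Empty rollback: record the cancellation even if Try never ran,
        # so that a delayed Try for this tx_id is rejected later.
        self.state[tx_id] = "cancelled"
        return True

inv = InventoryTcc(10)

# Normal path, with Confirm retried once.
tried = inv.try_reserve("t1", 3)
confirmed_twice = inv.confirm("t1") and inv.confirm("t1")

# Cancel arrives before Try: empty rollback, then the late Try is refused.
empty_rollback = inv.cancel("t2")
late_try = inv.try_reserve("t2", 2)

# Cancel path, with Cancel retried once.
inv.try_reserve("t3", 4)
cancelled_twice = inv.cancel("t3") and inv.cancel("t3")
```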
AT (Automatic Transaction)
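AT is a two-phase commit model in which the framework automatically generates the reverse SQL needed to roll back on failure, so the business code does not have to supply its own compensation logic.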
Seata implements AT with a fast asynchronous second phase and uses local rollback logs for compensation.
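The rollback-log idea can be sketched with SQLite. This is a simplified illustration, not Seata's implementation: the `account` table, the in-memory `undo_log` list, and both helper functions are assumptions made for the example, and a real framework would store the log durably alongside the business data.

```python
import sqlite3

def update_with_undo(db, undo_log, row_id, new_balance):
    """Phase 1: capture the before image as reverse SQL, then commit locally."""
    before = db.execute(
        "SELECT balance FROM account WHERE id = ?", (row_id,)
    ).fetchone()
    undo_log.append(
        ("UPDATE account SET balance = ? WHERE id = ?", (before[0], row_id))
    )
    db.execute("UPDATE account SET balance = ? WHERE id = ?", (new_balance, row_id))
    db.commit()

def finish(db, undo_log, global_ok):
    """Phase 2: drop the log on commit, or replay it newest-first on rollback."""
    if not global_ok:
        for sql, params in reversed(undo_log):
            db.execute(sql, params)
        db.commit()
    undo_log.clear()

def balance(db, row_id):
    return db.execute("SELECT balance FROM account WHERE id = ?", (row_id,)).fetchone()[0]

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
db.execute("INSERT INTO account VALUES (1, 100)")
db.commit()

undo_log = []
update_with_undo(db, undo_log, 1, 70)
update_with_undo(db, undo_log, 1, 40)
after_phase_one = balance(db, 1)

# The global transaction fails elsewhere, so the local changes are compensated.
finish(db, undo_log, global_ok=False)
after_rollback = balance(db, 1)
```

Because the local changes are already committed after the first phase, other transactions can see them before the global outcome is known, which is the price paid for the cheap second phase.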
Summary
Distributed transactions aim to solve data consistency. XA offers strong consistency but low throughput, unsuitable for high‑concurrency scenarios. Soft transactions (BASE) do not guarantee strong consistency but achieve eventual consistency through compensation mechanisms such as retries, scheduling, or manual intervention.
Java Backend Technology
