Databases 14 min read

Online Database Migration from SQL Server to MySQL Using CDC and Yugong

This article explains how to achieve near‑zero‑downtime online migration from SQL Server to MySQL by leveraging Change Data Capture (CDC) and the Alibaba Yugong tool, covering CDC principles, configuration, performance tuning, and practical migration steps.

Hujiang Technology
Hujiang Technology
Hujiang Technology
Online Database Migration from SQL Server to MySQL Using CDC and Yugong

Online Migration Principles and Process

To avoid the long downtime caused by full‑load migration, we explore an online migration approach that keeps the source system running with only a minimal pause. The solution is inspired by Alibaba's yugong project, which already provides an incremental migration workflow for Oracle‑to‑MySQL.

The Yugong incremental migration consists of four steps:

Collect incremental data by creating materialized views on the source database.

Perform a full data copy.

Execute incremental copy (optionally with parallel data validation).

Stop writes on the source and switch to the target.

Materialized views in Oracle act as snapshots that are refreshed either completely or incrementally. The refresh methods (Complete, Fast, Force) and refresh modes (Refresh‑on‑commit, Refresh‑on‑demand) determine how quickly changes are captured.

CDC Working Principle

Change Data Capture (CDC) in SQL Server 2008 captures row‑level changes from the transaction log and writes them into a change table. Users can query this table via built‑in CDC functions to retrieve the captured changes.

CDC Data Structure and Basic Usage

The change table (e.g., cdc.dbo_fruits_CT) contains metadata columns prefixed with __$ (such as __$start_lsn, __$operation) and the original table columns (e.g., id, name). The __$operation column encodes the type of DML operation (1 = delete, 2 = insert, 3 = update‑before, 4 = update‑after).

-- enable cdc for db
sys.sp_cdc_enable_db;

-- enable cdc for table
EXEC sys.sp_cdc_enable_table @source_schema = N'dbo', @source_name = N'fruits', @role_name = NULL;

-- list cdc enabled tables
SELECT name, is_cdc_enabled FROM sys.databases WHERE is_cdc_enabled = 1;

To query changes within a time window, we map the timestamps to Log Sequence Numbers (LSNs) and call the function fn_cdc_get_all_changes_dbo_fruits:

DECLARE @begin_time datetime, @end_time datetime, @begin_lsn binary(10), @end_lsn binary(10);
SET @begin_time = '2017-09-11 14:03:00.000';
SET @end_time   = '2017-09-11 14:10:00.000';
SELECT @begin_lsn = sys.fn_cdc_map_time_to_lsn('smallest greater than', @begin_time);
SELECT @end_lsn   = sys.fn_cdc_map_time_to_lsn('largest less than or equal', @end_time);
SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_fruits(@begin_lsn, @end_lsn, 'all');

The result shows rows with their LSNs, operation codes, and the original column values, allowing us to reconstruct the exact sequence of inserts, updates, and deletes that occurred during the interval.

CDC Tuning

Key CDC job parameters that balance resource consumption, throughput, and latency are:

maxscans : maximum number of log scans per job (default 10).

maxtrans : maximum number of transactions processed per scan (default 500).

pollinginterval : pause between scans in seconds (default 5).

Adjusting these values helps handle large rows, BLOB columns, or high‑volume workloads.

CDC Performance Testing

Performance metrics to monitor include memory (buffer‑cache‑hit, page‑life‑expectancy), throughput (batch‑requests, sql‑compilations, transaction count), resource usage (user‑connections, lock‑waits, checkpoint‑pages), and OS‑level CPU and disk I/O.

In a benchmark inserting one million random rows with 30 concurrent threads, CDC remained stable up to ~16k TPS, with only modest process‑block spikes during enable/disable operations.

Optimization tips:

Tune maxscans, maxtrans, and pollinginterval.

Avoid immediate successive inserts.

Limit large‑batch writes.

Record only necessary columns.

Disable net‑changes if not needed.

Run cleanup jobs during low load.

Monitor log file size and I/O to prevent disk saturation.

Specify a filegroup for CDC tables and set it before enabling CDC on a table.

Yugong Online Migration Mechanism

Yugong abstracts source, target, and SQL template interfaces. By implementing the source interface for SQL Server (using CDC) and the target interface for MySQL, we can stream captured changes directly into MySQL.

Rollback Strategy

Bidirectional migration requires feeding MySQL binlog changes back to SQL Server. MySQL binlog offers Statement, Row, and Mixed modes; only Row‑based logging guarantees idempotent replay. Alibaba's Canal component can read MySQL binlog, push events into a Redis queue, and allow replay to SQL Server when needed.

Best Practices

Database migration is a high‑risk operation. The recommended practice includes:

Preparing resources and a clear release process.

Defining failure‑handling and rollback procedures.

Using CDC for near‑real‑time incremental sync.

Leveraging Yugong for orchestrating full and incremental loads.

Monitoring CDC health and performance continuously.

These steps have been codified into a repeatable workflow for multiple business units.

References

Materialized View Concepts and Architecture – Oracle Documentation.

Tuning the Performance of Change Data Capture in SQL Server 2008 – Microsoft Docs.

Alibaba/Yugong – Open‑source data migration tool (full + incremental).

Alibaba/Canal – MySQL binlog incremental subscription and consumption component.

MySQLDatabase MigrationCDCSQL ServerChange Data CaptureYugong
Hujiang Technology
Written by

Hujiang Technology

We focus on the real-world challenges developers face, delivering authentic, practical content and a direct platform for technical networking among developers.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.