Deep Dive into Yugong: Architecture, Core Modules, and Custom Enhancements for Database Migration
This article introduces Yugong, an open‑source ETL framework for heterogeneous database migration, explains its core Extractor‑Translator‑Applier architecture, details key classes and interfaces, discusses limitations of the original version, and describes extensive refactoring and new features added to support SQL Server, MySQL, and Canal‑based incremental replication.
Yugong System Architecture
Yugong is a mature open‑source ETL tool developed by Alibaba's middleware team for heterogeneous database migration. Together with Otter and Canal it forms a family of data synchronization tools, each of which targets a different synchronization scenario.
Extractor
Extractor reads data from the source database into memory. Its core types are the interfaces YuGongLifeCycle and RecordExtractor together with their abstract base classes AbstractYuGongLifeCycle and AbstractRecordExtractor, which provide the shared base functionality. Oracle‑specific implementations include OracleOnceFullRecordExtractor, OracleFullRecordExtractor, OracleRecRecordExtractor, OracleMaterializedIncRecordExtractor, and OracleAllRecordExtractor.
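To make the relationship between the lifecycle types and the extractor types more concrete, here is a minimal Java sketch of the same pattern: a lifecycle interface, an abstract base class that tracks the running state, and an extractor that returns data in batches. All names here (Row, LifeCycle, AbstractLifeCycle, Extractor, InMemoryExtractor) and the start/stop/isStart/extract methods are hypothetical simplifications, not Yugong's actual signatures; the in-memory extractor stands in for a JDBC-backed one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified row: table name plus column -> value map (hypothetical, not Yugong's Record).
class Row {
    final String table;
    final Map<String, Object> columns;

    Row(String table, Map<String, Object> columns) {
        this.table = table;
        this.columns = new HashMap<>(columns);
    }
}

// Lifecycle contract shared by pipeline components (sketch of the YuGongLifeCycle idea).
interface LifeCycle {
    void start();
    void stop();
    boolean isStart();
}

// Base class that tracks the running flag, in the spirit of AbstractYuGongLifeCycle.
abstract class AbstractLifeCycle implements LifeCycle {
    protected volatile boolean running = false;

    public void start() {
        if (running) {
            throw new IllegalStateException(getClass().getSimpleName() + " already started");
        }
        running = true;
    }

    public void stop() {
        running = false;
    }

    public boolean isStart() {
        return running;
    }
}

// Extractor contract: each call returns the next batch; an empty list means no more data.
interface Extractor extends LifeCycle {
    List<Row> extract();
}

// Hypothetical in-memory extractor standing in for a JDBC-based full extractor.
class InMemoryExtractor extends AbstractLifeCycle implements Extractor {
    private final List<Row> source;
    private final int batchSize;
    private int position = 0;

    InMemoryExtractor(List<Row> source, int batchSize) {
        this.source = source;
        this.batchSize = batchSize;
    }

    public List<Row> extract() {
        if (!isStart()) {
            throw new IllegalStateException("extractor not started");
        }
        int end = Math.min(position + batchSize, source.size());
        List<Row> batch = new ArrayList<>(source.subList(position, end));
        position = end;
        return batch;
    }
}
```

Keeping the running flag in a shared base class means every extractor, translator, and applier gets identical start/stop semantics, while concrete subclasses only implement the data-specific work.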
Translator
Translator transforms the in‑memory rows. The central abstraction is DataTranslator; related types include TableTranslator, AbstractDataTranslator, EncodeDataTranslator, OracleIncreamentDataTranslator, and several demo translators (e.g., BackTableDataTranslator, BillOutDataTranslator, MidBillOutDetailDataTranslator).
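A translator is essentially a function from a batch of rows to a batch of rows, which makes translators easy to chain. The following sketch assumes a simplified contract (RowTranslator with a single translate method) rather than DataTranslator's real signature; RenameAndDropTranslator and ChainTranslator are hypothetical examples that rename the target table, drop a column absent from the target schema, and compose translators in order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified row (hypothetical): target table name plus column -> value map.
class Row {
    final String table;
    final Map<String, Object> columns;

    Row(String table, Map<String, Object> columns) {
        this.table = table;
        this.columns = new HashMap<>(columns); // defensive copy so translators never mutate input
    }
}

// Translator contract in the spirit of DataTranslator: rows in, rows out.
interface RowTranslator {
    List<Row> translate(List<Row> rows);
}

// Renames the target table and drops columns that do not exist in the target schema.
class RenameAndDropTranslator implements RowTranslator {
    private final String targetTable;
    private final Set<String> droppedColumns;

    RenameAndDropTranslator(String targetTable, String... droppedColumns) {
        this.targetTable = targetTable;
        this.droppedColumns = new HashSet<>(Arrays.asList(droppedColumns));
    }

    public List<Row> translate(List<Row> rows) {
        List<Row> out = new ArrayList<>(rows.size());
        for (Row row : rows) {
            Row copy = new Row(targetTable, row.columns);
            copy.columns.keySet().removeAll(droppedColumns);
            out.add(copy);
        }
        return out;
    }
}

// Applies translators in order, so small single-purpose translators can be combined.
class ChainTranslator implements RowTranslator {
    private final List<RowTranslator> chain;

    ChainTranslator(RowTranslator... chain) {
        this.chain = Arrays.asList(chain);
    }

    public List<Row> translate(List<Row> rows) {
        List<Row> current = rows;
        for (RowTranslator t : chain) {
            current = t.translate(current);
        }
        return current;
    }
}
```

Returning new rows instead of mutating the input keeps each translator independent, so the order of a chain is the only thing that determines the final result.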
Applier
Applier writes the transformed data to the target database. Its core types are the RecordApplier interface and the AbstractRecordApplier base class. Implementations cover consistency checking (CheckRecordRecordApplier), full‑load upserts (FullRecordRecordApplier), incremental loads using Oracle materialized views (IncreamentRecordApplier), and an automated pipeline (AllRecordRecordApplier).
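A full-load upsert applier must write rows so that re-running a batch is harmless. The sketch below assumes a MySQL target and shows only the SQL-building step: new rows are inserted and existing rows have their non-key columns overwritten. SqlApplier and UpsertSqlApplier are hypothetical names, not FullRecordRecordApplier's actual code, and a real applier would bind values and execute the statement over JDBC in batches.

```java
import java.util.List;
import java.util.StringJoiner;

// Applier contract in the spirit of RecordApplier (hypothetical, simplified to SQL generation).
interface SqlApplier {
    String buildSql(String table, List<String> columns, List<String> primaryKeys);
}

// Builds a MySQL-style upsert with "?" placeholders for JDBC bind parameters.
class UpsertSqlApplier implements SqlApplier {
    public String buildSql(String table, List<String> columns, List<String> primaryKeys) {
        StringJoiner cols = new StringJoiner(", ");
        StringJoiner marks = new StringJoiner(", ");
        StringJoiner updates = new StringJoiner(", ");
        for (String c : columns) {
            cols.add(c);
            marks.add("?");
            if (!primaryKeys.contains(c)) {
                // Overwrite non-key columns with the incoming value on a key conflict.
                updates.add(c + " = VALUES(" + c + ")");
            }
        }
        if (updates.length() == 0) {
            // A table made only of key columns has nothing to update; ignore duplicates instead.
            return "INSERT IGNORE INTO " + table + " (" + cols + ") VALUES (" + marks + ")";
        }
        return "INSERT INTO " + table + " (" + cols + ") VALUES (" + marks + ")"
                + " ON DUPLICATE KEY UPDATE " + updates;
    }
}
```

Because an upsert is idempotent for the same input, a full load that fails halfway can simply be restarted. Note that newer MySQL releases deprecate VALUES() inside ON DUPLICATE KEY UPDATE in favor of a row alias.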
Other Important Classes
Supporting classes round out the framework: SqlTemplate and OracleSqlTemplate generate the SQL for CRUD operations, RecordDiffer compares source and target records during consistency checks, and YugongController and YugongInstance orchestrate migration tasks.
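The core of a consistency check is a keyed comparison between source and target rows. The following is a minimal sketch of that idea, not RecordDiffer's actual implementation; RowDiffer and its map-of-maps input format (primary key to column-value map) are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Compares source and target rows keyed by primary key (hypothetical sketch).
class RowDiffer {
    static List<String> diff(Map<Object, Map<String, Object>> source,
                             Map<Object, Map<String, Object>> target) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<Object, Map<String, Object>> e : source.entrySet()) {
            Map<String, Object> targetRow = target.get(e.getKey());
            if (targetRow == null) {
                problems.add("missing in target: " + e.getKey());
                continue;
            }
            // Compare every source column with the same column in the target row.
            for (Map.Entry<String, Object> col : e.getValue().entrySet()) {
                Object targetValue = targetRow.get(col.getKey());
                if (!Objects.equals(col.getValue(), targetValue)) {
                    problems.add("column " + col.getKey() + " differs for key " + e.getKey()
                            + ": " + col.getValue() + " != " + targetValue);
                }
            }
        }
        // Rows that exist only in the target indicate stray or duplicated writes.
        for (Object key : target.keySet()) {
            if (!source.containsKey(key)) {
                problems.add("extra in target: " + key);
            }
        }
        return problems;
    }
}
```

Objects.equals is strict about types (an Integer 10 does not equal a Long 10), so a differ used across heterogeneous databases would need to normalize values read through different JDBC drivers before comparing them.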
Limitations of the Original Yugong
No support for SQL Server as a source.
No support for SQL Server as a target, which is needed to roll a migration back.
No native support for MySQL as a source.
Additional engineering drawbacks include heavy Maven‑assembly packaging, INI‑based configuration, a limited plugin architecture, and insufficient test separation.
Refactored Architecture
New abstract base extractors, together with concrete implementations built on them, were introduced to simplify future extensions:
AbstractSqlServerExtractor
AbstractMysqlExtractor
AbstractFullRecordExtractor
SqlServerCdcExtractor
MysqlCanalExtractor
MysqlCanalRedisExtractor
MysqlFullExtractor
SqlServerFullExtractor
These changes make the data flow clearer and make it easier to add support for new databases.
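One way an abstract full extractor can serve several databases is the template-method pattern: the base class owns the paging loop and each subclass supplies only its dialect's SQL. The sketch below assumes a single numeric primary key and keyset pagination (resume after the last key seen); FullExtractorTemplate, MysqlFullSql, SqlServerFullSql, and the query callback are hypothetical and not the code of AbstractFullRecordExtractor or its subclasses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// Template-method base: owns the keyset-pagination loop; subclasses only supply dialect SQL.
abstract class FullExtractorTemplate {
    protected final String table;
    protected final String pk;
    protected final int batchSize;

    FullExtractorTemplate(String table, String pk, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.table = table;
        this.pk = pk;
        this.batchSize = batchSize;
    }

    // Dialect hook: SQL for the next batch of rows whose key is greater than the bound parameter.
    abstract String nextBatchSql();

    // Runs the loop with a caller-supplied query function: (SQL, last key) -> keys of the next batch.
    // A real extractor would execute the SQL over JDBC and return full rows instead of keys.
    List<Long> extractAllKeys(BiFunction<String, Long, List<Long>> query) {
        List<Long> all = new ArrayList<>();
        String sql = nextBatchSql();
        long lastKey = Long.MIN_VALUE;
        while (true) {
            List<Long> batch = query.apply(sql, lastKey);
            all.addAll(batch);
            if (batch.size() < batchSize) {
                return all; // a short or empty batch means the table is exhausted
            }
            lastKey = batch.get(batch.size() - 1);
        }
    }
}

class MysqlFullSql extends FullExtractorTemplate {
    MysqlFullSql(String table, String pk, int batchSize) {
        super(table, pk, batchSize);
    }

    String nextBatchSql() {
        return "SELECT * FROM " + table + " WHERE " + pk + " > ? ORDER BY " + pk + " LIMIT " + batchSize;
    }
}

class SqlServerFullSql extends FullExtractorTemplate {
    SqlServerFullSql(String table, String pk, int batchSize) {
        super(table, pk, batchSize);
    }

    String nextBatchSql() {
        return "SELECT TOP " + batchSize + " * FROM " + table + " WHERE " + pk + " > ? ORDER BY " + pk;
    }
}
```

Keyset pagination avoids the growing cost of large OFFSET values on big tables, and adding a new database would mean writing one more small subclass.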
Translator extensions now include sharding and column‑fix translators such as Sha1ShardingTranslator, ModShardingTranslator, RangeShardingTranslator, UserRouterMapShardingTranslator, ColumnFixDataTranslator, NameStyleDataTranslator, and CompositeIndexesDataTranslator.
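A sharding translator mainly needs a routing function that maps a sharding key to a physical shard. The sketch below shows three common strategies that match the translator names above (modulo, range, and SHA‑1 hash routing); the ShardRouter interface and classes are hypothetical, and the exact hashing and range rules used in the refactored Yugong may differ. A translator would call route() for each row and rewrite the row's target schema or table accordingly.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Routing strategy: maps a sharding key to a physical shard index (hypothetical interface).
interface ShardRouter {
    int route(long key);
}

// Modulo routing: even spread for sequential ids, but changing the shard count moves most rows.
class ModRouter implements ShardRouter {
    private final int shards;

    ModRouter(int shards) {
        this.shards = shards;
    }

    public int route(long key) {
        return (int) Math.floorMod(key, (long) shards); // floorMod keeps negative keys in range
    }
}

// Range routing: shard i holds keys below upperBounds[i]; keys at or above the last bound
// go to shard upperBounds.length, so there are upperBounds.length + 1 shards in total.
class RangeRouter implements ShardRouter {
    private final long[] upperBounds;

    RangeRouter(long... upperBounds) {
        this.upperBounds = upperBounds.clone();
    }

    public int route(long key) {
        for (int i = 0; i < upperBounds.length; i++) {
            if (key < upperBounds[i]) {
                return i;
            }
        }
        return upperBounds.length;
    }
}

// SHA-1 routing: hashes the key so that skewed or clustered keys spread more evenly.
class Sha1Router implements ShardRouter {
    private final int shards;

    Sha1Router(int shards) {
        this.shards = shards;
    }

    public int route(long key) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1")
                    .digest(Long.toString(key).getBytes(StandardCharsets.UTF_8));
            // Use the first four digest bytes as an int, then fold it into [0, shards).
            int prefix = ((digest[0] & 0xff) << 24) | ((digest[1] & 0xff) << 16)
                    | ((digest[2] & 0xff) << 8) | (digest[3] & 0xff);
            return Math.floorMod(prefix, shards);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }
}
```

Range routing keeps related keys together and allows adding a shard at the end, while hash routing trades that locality for a more even distribution.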
Applier was extended with SqlServerIncreamentRecordApplier to support incremental consumption from SQL Server.
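Incremental replication from SQL Server typically relies on Change Data Capture (CDC), whose change tables carry an __$operation column: 1 means delete, 2 insert, 3 the row before an update, and 4 the row after an update. The sketch below shows how such change rows could be mapped to DML for the target; CdcChange and CdcToDml are hypothetical names and do not reflect SqlServerIncreamentRecordApplier's actual code. It assumes a single-column primary key and changes already sorted in commit order (for example by __$start_lsn and __$seqval).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// One row read from a SQL Server CDC change table (simplified, hypothetical type).
class CdcChange {
    final int operation;               // value of the __$operation column
    final Map<String, Object> columns; // captured column values, including the primary key

    CdcChange(int operation, Map<String, Object> columns) {
        this.operation = operation;
        this.columns = columns;
    }
}

class CdcToDml {
    // Turns ordered CDC changes into parameterized DML for the target table; "?" marks bind parameters.
    static List<String> toSql(String table, String pk, List<CdcChange> changes) {
        List<String> statements = new ArrayList<>();
        for (CdcChange c : changes) {
            switch (c.operation) {
                case 1: // delete
                    statements.add("DELETE FROM " + table + " WHERE " + pk + " = ?");
                    break;
                case 2: { // insert
                    StringJoiner cols = new StringJoiner(", ");
                    StringJoiner marks = new StringJoiner(", ");
                    for (String col : c.columns.keySet()) {
                        cols.add(col);
                        marks.add("?");
                    }
                    statements.add("INSERT INTO " + table + " (" + cols + ") VALUES (" + marks + ")");
                    break;
                }
                case 3: // update, before image: nothing to apply
                    break;
                case 4: { // update, after image: overwrite non-key columns
                    StringJoiner sets = new StringJoiner(", ");
                    for (String col : c.columns.keySet()) {
                        if (!col.equals(pk)) {
                            sets.add(col + " = ?");
                        }
                    }
                    if (sets.length() > 0) {
                        statements.add("UPDATE " + table + " SET " + sets + " WHERE " + pk + " = ?");
                    }
                    break;
                }
                default:
                    throw new IllegalArgumentException("unknown __$operation: " + c.operation);
            }
        }
        return statements;
    }
}
```

Skipping the before image (operation 3) is enough when the target already holds the old row; a consistency-checking applier could instead compare it against the target to detect drift.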
Lessons from the Re‑engineering
Effective open‑source exploration starts with high‑level documentation before diving into source code. Tools like IntelliJ Diagram and custom plugins help visualize class relationships. Introducing Checkstyle, Google Java Style, and Sonar analysis improves code quality, but applying them to legacy code can produce very large diffs that make refactoring changes harder to review and merge.
Early consideration of upstream contribution, comprehensive unit testing, and consistent coding standards are essential to avoid integration friction.
The modified Yugong version is available at https://github.com/alswl/yugong , and a pull request has been submitted to the official repository.
Hujiang Technology
