
Deep Dive into Yugong: Architecture, Core Modules, and Custom Enhancements for Database Migration

This article introduces Yugong, an open‑source ETL framework for heterogeneous database migration, explains its core Extractor‑Translator‑Applier architecture, details key classes and interfaces, discusses limitations of the original version, and describes extensive refactoring and new features added to support SQL Server, MySQL, and Canal‑based incremental replication.

Hujiang Technology

Yugong System Architecture

Yugong is a mature open‑source ETL tool developed by Alibaba's middleware team, designed for heterogeneous database migration. It works together with Otter and Canal, each serving different synchronization scenarios.

Extractor

Extractor reads data from the source database into memory. Core types include YuGongLifeCycle, AbstractYuGongLifeCycle, RecordExtractor, and AbstractRecordExtractor, along with several Oracle‑specific implementations: OracleOnceFullRecordExtractor, OracleFullRecordExtractor, OracleRecRecordExtractor, OracleMaterializedIncRecordExtractor, and OracleAllRecordExtractor. The abstract class AbstractRecordExtractor provides the base functionality shared by these implementations.
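The relationship between a lifecycle contract, an extractor contract, and an abstract base class can be illustrated with a minimal sketch. All type names and signatures below (LifeCycle, BatchExtractor, AbstractBatchExtractor, InMemoryFullExtractor) are hypothetical simplifications, not Yugong's actual API, and an in-memory list stands in for the source table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class ExtractorSketch {

    /** Hypothetical start/stop contract shared by all pipeline components. */
    interface LifeCycle {
        void start();
        void stop();
        boolean isStarted();
    }

    /** Hypothetical extractor contract: each call returns the next batch; an empty list means done. */
    interface BatchExtractor<T> extends LifeCycle {
        List<T> extract();
    }

    /** Base class that keeps lifecycle state in one place for concrete extractors. */
    abstract static class AbstractBatchExtractor<T> implements BatchExtractor<T> {
        private volatile boolean started;

        @Override public void start() { started = true; }
        @Override public void stop() { started = false; }
        @Override public boolean isStarted() { return started; }
    }

    /** Full extractor that pages through an in-memory list (assumed stand-in for a table). */
    static final class InMemoryFullExtractor extends AbstractBatchExtractor<Integer> {
        private final List<Integer> source;
        private final int batchSize;
        private int position;

        InMemoryFullExtractor(List<Integer> source, int batchSize) {
            if (batchSize <= 0) {
                throw new IllegalArgumentException("batchSize must be positive");
            }
            this.source = new ArrayList<>(source);
            this.batchSize = batchSize;
        }

        @Override
        public List<Integer> extract() {
            if (!isStarted()) {
                throw new IllegalStateException("extractor is not started");
            }
            if (position >= source.size()) {
                return Collections.emptyList();
            }
            int end = Math.min(position + batchSize, source.size());
            List<Integer> batch = new ArrayList<>(source.subList(position, end));
            position = end;
            return batch;
        }
    }

    public static void main(String[] args) {
        InMemoryFullExtractor extractor = new InMemoryFullExtractor(Arrays.asList(1, 2, 3, 4, 5), 2);
        extractor.start();
        List<Integer> batch;
        while (!(batch = extractor.extract()).isEmpty()) {
            System.out.println(batch);
        }
        extractor.stop();
    }
}
```

Keeping the lifecycle state in the abstract base class means a concrete extractor only has to say how the next batch is read.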

Translator

Translator transforms the in‑memory rows before they are written. The central abstraction is DataTranslator; related types include TableTranslator, AbstractDataTranslator, EncodeDataTranslator, and OracleIncreamentDataTranslator, plus several demo translators (e.g., BackTableDataTranslator, BillOutDataTranslator, MidBillOutDetailDataTranslator).
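As a rough illustration of what a translation step does, the sketch below renames the table and selected columns of an in-memory row. The Row, RowTranslator, and RenamingTranslator types are hypothetical and much simpler than Yugong's own record and translator classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class TranslatorSketch {

    /** Hypothetical in-memory row: a table name plus ordered column values. */
    static final class Row {
        final String table;
        final Map<String, Object> columns;

        Row(String table, Map<String, Object> columns) {
            this.table = table;
            this.columns = new LinkedHashMap<>(columns);
        }
    }

    /** Hypothetical translator contract: maps a source row to the row that should be applied. */
    interface RowTranslator {
        Row translate(Row source);
    }

    /** Renames the table and any columns listed in the rename map; other columns pass through. */
    static final class RenamingTranslator implements RowTranslator {
        private final String targetTable;
        private final Map<String, String> columnRenames;

        RenamingTranslator(String targetTable, Map<String, String> columnRenames) {
            this.targetTable = targetTable;
            this.columnRenames = new LinkedHashMap<>(columnRenames);
        }

        @Override
        public Row translate(Row source) {
            Map<String, Object> translated = new LinkedHashMap<>();
            for (Map.Entry<String, Object> entry : source.columns.entrySet()) {
                String name = columnRenames.getOrDefault(entry.getKey(), entry.getKey());
                translated.put(name, entry.getValue());
            }
            return new Row(targetTable, translated);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> columns = new LinkedHashMap<>();
        columns.put("ID", 1);
        columns.put("USER_NAME", "alice");

        Map<String, String> renames = new LinkedHashMap<>();
        renames.put("USER_NAME", "name");

        Row result = new RenamingTranslator("users", renames).translate(new Row("T_USER", columns));
        System.out.println(result.table + " " + result.columns);
    }
}
```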

Applier

Applier writes the transformed data to the target database. Important types include RecordApplier and AbstractRecordApplier. Implementations cover consistency checking (CheckRecordRecordApplier), full‑load upserts (FullRecordRecordApplier), incremental loads using Oracle materialized views (IncreamentRecordApplier), and an automated pipeline (AllRecordRecordApplier).
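The upsert behavior of a full-load applier can be sketched with an in-memory map standing in for the target table. The RowApplier interface, the InMemoryUpsertApplier class, and the row helper are assumptions made for illustration; a real applier would issue SQL against the target database instead.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ApplierSketch {

    /** Hypothetical applier contract: writes a batch of rows to the target. */
    interface RowApplier {
        void apply(List<Map<String, Object>> rows);
    }

    /** Upserts rows by primary key into a map that stands in for the target table. */
    static final class InMemoryUpsertApplier implements RowApplier {
        private final String keyColumn;
        private final Map<Object, Map<String, Object>> target = new LinkedHashMap<>();

        InMemoryUpsertApplier(String keyColumn) {
            this.keyColumn = keyColumn;
        }

        @Override
        public void apply(List<Map<String, Object>> rows) {
            for (Map<String, Object> row : rows) {
                Object key = row.get(keyColumn);
                if (key == null) {
                    throw new IllegalArgumentException("row has no value for key column " + keyColumn);
                }
                // put() inserts a new row or overwrites the existing one: upsert semantics.
                target.put(key, new LinkedHashMap<>(row));
            }
        }

        /** Returns a copy of the current target contents. */
        Map<Object, Map<String, Object>> snapshot() {
            return new LinkedHashMap<>(target);
        }
    }

    /** Small helper that builds a two-column row for the demo. */
    static Map<String, Object> row(Object id, String name) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("id", id);
        row.put("name", name);
        return row;
    }

    public static void main(String[] args) {
        InMemoryUpsertApplier applier = new InMemoryUpsertApplier("id");
        applier.apply(Arrays.asList(row(1, "alice"), row(2, "bob")));
        // Re-applying a row with the same key updates it rather than duplicating it.
        applier.apply(Arrays.asList(row(1, "alice-renamed")));
        System.out.println(applier.snapshot());
    }
}
```

Because an upsert is idempotent per key, a full load that is interrupted and restarted can replay batches without creating duplicates.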

Other Important Classes

Utility classes such as SqlTemplate, OracleSqlTemplate, RecordDiffer, YugongController, and YugongInstance support CRUD operations, consistency checks, and task orchestration.
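A consistency check of the kind described here boils down to comparing a source row with its counterpart in the target. The following sketch shows one simple way to do that; the DiffSketch class and its diffColumns method are hypothetical and do not reflect how RecordDiffer is actually implemented.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

final class DiffSketch {

    /**
     * Returns the names of columns whose values differ between two rows.
     * A column missing from one side is treated like a null value on that side.
     */
    static List<String> diffColumns(Map<String, Object> source, Map<String, Object> target) {
        Set<String> names = new LinkedHashSet<>(source.keySet());
        names.addAll(target.keySet());

        List<String> differing = new ArrayList<>();
        for (String name : names) {
            if (!Objects.equals(source.get(name), target.get(name))) {
                differing.add(name);
            }
        }
        return differing;
    }

    public static void main(String[] args) {
        Map<String, Object> source = new LinkedHashMap<>();
        source.put("id", 1);
        source.put("name", "alice");

        Map<String, Object> target = new LinkedHashMap<>();
        target.put("id", 1);
        target.put("name", "alicia");

        System.out.println(diffColumns(source, target));
    }
}
```

A check between heterogeneous databases would also need to normalize types before comparing, since equal values can arrive as different Java types (for example Integer versus Long).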

Limitations of the Original Yugong

No support for SQL Server as source.

No support for SQL Server as target (rollback).

No native MySQL source support.

Additional engineering drawbacks include heavy Maven‑assembly packaging, INI‑based configuration, limited plugin architecture, and insufficient test separation.

Refactored Architecture

New abstract base classes and concrete extractors were introduced to simplify future extensions:

AbstractSqlServerExtractor
AbstractMysqlExtractor
AbstractFullRecordExtractor
SqlServerCdcExtractor
MysqlCanalExtractor
MysqlCanalRedisExtractor
MysqlFullExtractor
SqlServerFullExtractor

These changes make the data flow clearer and enable easier addition of new database formats.

Translator extensions now include sharding and column‑fix translators such as Sha1ShardingTranslator, ModShardingTranslator, RangeShardingTranslator, UserRouterMapShardingTranslator, ColumnFixDataTranslator, NameStyleDataTranslator, and CompositeIndexesDataTranslator.
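The idea behind modulo-based and SHA-1-based sharding can be sketched as two small routing functions. The method names (modShard, sha1Shard, shardTable) and the table-suffix convention are assumptions for illustration and are not taken from the translators named above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class ShardingSketch {

    /** Routes a numeric key by modulo; floorMod keeps the result non-negative for negative keys. */
    static int modShard(long id, int shardCount) {
        requirePositive(shardCount);
        return (int) Math.floorMod(id, (long) shardCount);
    }

    /** Routes a string key by the first four bytes of its SHA-1 digest. */
    static int sha1Shard(String key, int shardCount) {
        requirePositive(shardCount);
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            int prefix = ByteBuffer.wrap(digest).getInt();
            return Math.floorMod(prefix, shardCount);
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is one of the digests every Java platform must provide.
            throw new IllegalStateException("SHA-1 is not available", e);
        }
    }

    /** Assumed naming convention: base table name plus an underscore and the shard index. */
    static String shardTable(String baseTable, int shard) {
        return baseTable + "_" + shard;
    }

    private static void requirePositive(int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
    }

    public static void main(String[] args) {
        System.out.println(shardTable("orders", modShard(10L, 4)));
        System.out.println(shardTable("users", sha1Shard("alice@example.com", 4)));
    }
}
```

Modulo routing suits dense numeric keys, while hashing first spreads string keys or skewed identifiers more evenly across shards.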

Applier was extended with SqlServerIncreamentRecordApplier to support incremental consumption from SQL Server.
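Incremental application differs from a full load in that the applier consumes an ordered stream of change events rather than whole rows. The sketch below shows that idea against an in-memory target; the Op, Change, and InMemoryIncrementalApplier types are hypothetical and say nothing about how SqlServerIncreamentRecordApplier is written.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

final class IncrementalSketch {

    enum Op { INSERT, UPDATE, DELETE }

    /** Hypothetical change event: operation, primary key, and the new row (null for DELETE). */
    static final class Change {
        final Op op;
        final Object key;
        final Map<String, Object> row;

        Change(Op op, Object key, Map<String, Object> row) {
            this.op = op;
            this.key = key;
            this.row = row;
        }
    }

    /** Applies change events in order to a map that stands in for the target table. */
    static final class InMemoryIncrementalApplier {
        private final Map<Object, Map<String, Object>> target = new LinkedHashMap<>();

        void apply(Iterable<Change> changes) {
            for (Change change : changes) {
                switch (change.op) {
                    case INSERT:
                    case UPDATE:
                        // Treating both as an upsert makes replaying a change harmless.
                        target.put(change.key, new LinkedHashMap<>(change.row));
                        break;
                    case DELETE:
                        target.remove(change.key);
                        break;
                    default:
                        throw new IllegalStateException("unknown operation " + change.op);
                }
            }
        }

        Map<Object, Map<String, Object>> snapshot() {
            return new LinkedHashMap<>(target);
        }
    }

    /** Small helper that builds a one-column row for the demo. */
    static Map<String, Object> row(String name) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("name", name);
        return row;
    }

    public static void main(String[] args) {
        InMemoryIncrementalApplier applier = new InMemoryIncrementalApplier();
        applier.apply(Arrays.asList(
                new Change(Op.INSERT, 1, row("alice")),
                new Change(Op.UPDATE, 1, row("alicia")),
                new Change(Op.INSERT, 2, row("bob")),
                new Change(Op.DELETE, 2, null)));
        System.out.println(applier.snapshot());
    }
}
```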

Lessons from the Re‑engineering

Effective open‑source exploration starts with high‑level documentation before diving into source code. Tools like IntelliJ Diagram and custom plugins help visualize class relationships. Introducing CheckStyle, Google Java Style, and Sonar analysis improves code quality but may cause large diffs when refactoring legacy code.

Early consideration of upstream contribution, comprehensive unit testing, and consistent coding standards are essential to avoid integration friction.

The modified Yugong version is available at https://github.com/alswl/yugong, and a pull request has been submitted to the official repository.
