
Design and Architecture of a Unified MySQL Data Synchronization Platform

This article details the design of a unified MySQL data synchronization platform that consolidates offline sync, real‑time subscription, and real‑time sync into BatchJob, StreamJob, and PieJob abstractions, describing task implementations, cluster architecture, high‑availability mechanisms, and evolution challenges such as file loss and metadata handling.

JD Tech

Background

Various business systems use MySQL for data storage, leading to multiple duplicated data‑sync tools that are hard to manage and have inconsistent performance. A unified platform is needed to provide offline sync, real‑time subscription, and real‑time sync services for MySQL data.

Architecture

1. Function Integration

Offline Sync: Extract data based on an SQL query and synchronize it to other target stores.

Real‑time Subscription: Parse MySQL binlog in real time, package data changes as events, and store them in a message queue for consumers.

Real‑time Sync: Provide common subscription client implementations that consume messages and apply changes to target stores.

2. How to integrate the three functions in one platform?

The three requirements are abstracted as three job types: BatchJob, StreamJob, and PieJob.

BatchJob follows a Sqoop‑like model: data to be synchronized is sharded according to rules, then each shard becomes a separate task that runs in parallel to improve efficiency.
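As a minimal sketch of this sharding step, the function below splits an integer primary-key range into roughly equal sub-ranges, each of which would back one parallel BatchTask. The function name and the assumption of an integer `id` column are illustrative, not part of the platform's actual API.

```python
def shard_by_pk(min_id, max_id, shard_count):
    """Split the primary-key range [min_id, max_id] into roughly
    equal sub-ranges; each sub-range becomes one parallel task."""
    total = max_id - min_id + 1
    size = max(1, -(-total // shard_count))  # ceiling division
    shards = []
    lo = min_id
    while lo <= max_id:
        hi = min(lo + size - 1, max_id)
        # Each shard maps to a bounded SQL query run by one task.
        shards.append((lo, hi, f"SELECT * FROM t WHERE id BETWEEN {lo} AND {hi}"))
        lo = hi + 1
    return shards
```

Real sharding rules would also handle non-numeric keys and skewed distributions; the range split above only conveys the Sqoop-like idea.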

StreamJob builds on the batch model; each task is responsible for reading and parsing the binlog of a specific MySQL instance, converting changes into messages stored locally for subscribers.

PieJob wraps subscription clients, treating each client as a task.

All three job types are executed as multiple parallel tasks using a unified model.

Task Details

BatchTask

After sharding, each BatchTask consists of a Fetcher (data extraction), a Sinker (data writing), and a Storage layer (caching).
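The Fetcher/Storage/Sinker split can be pictured as a small producer-consumer pipeline. The sketch below is a simplified, assumed implementation: a fetcher thread buffers rows in a bounded in-memory queue (the Storage layer) while the sinker drains it in batches; all names are illustrative.

```python
from queue import Queue
import threading

def run_batch_task(fetch_rows, sink_rows, batch_size=100):
    """One BatchTask as a two-stage pipeline: a Fetcher thread pulls
    rows from the source into a bounded Storage queue, and the Sinker
    drains the queue in batches and writes them to the target store."""
    storage = Queue(maxsize=1000)  # Storage layer: bounded cache
    SENTINEL = object()

    def fetcher():
        for row in fetch_rows():
            storage.put(row)
        storage.put(SENTINEL)  # signal end of data

    t = threading.Thread(target=fetcher)
    t.start()
    batch = []
    while True:
        row = storage.get()
        if row is SENTINEL:
            break
        batch.append(row)
        if len(batch) >= batch_size:
            sink_rows(batch)
            batch = []
    if batch:
        sink_rows(batch)  # flush the final partial batch
    t.join()
```

The bounded queue gives back-pressure: a slow target store throttles extraction instead of exhausting memory.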

StreamTask

The RelayLogTask pulls binlog data; the HHLTask parses the binlog, packages change events into protobuf messages, and writes them to an HHL file (a lightweight Kafka‑like queue with indexed blocks).

A ClientServer component handles client subscription requests. Upon receiving a request for a specific offset, it quickly locates the corresponding data block via an index, scans for the matching messages, filters them, and pushes the result to the client.
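The "locate the block via an index, then scan" lookup can be sketched with a sparse index over block start offsets. This is an assumed reconstruction of how such an index behaves, not the actual HHL format.

```python
import bisect

class HHLIndex:
    """Sparse index over message blocks in an HHL file: each entry
    maps a block's first message offset to its byte position."""
    def __init__(self):
        self.offsets = []    # first message offset per block, ascending
        self.positions = []  # byte position of each block in the file

    def add_block(self, first_offset, file_pos):
        self.offsets.append(first_offset)
        self.positions.append(file_pos)

    def locate(self, requested_offset):
        """Return the byte position of the block that may contain the
        requested offset: the last block whose first offset is <= the
        request. The server then scans inside that block for the
        exact message and filters before pushing to the client."""
        i = bisect.bisect_right(self.offsets, requested_offset) - 1
        if i < 0:
            raise KeyError("offset precedes retained data")
        return self.positions[i]
```

Because the index is sorted, the block lookup is a binary search; only one block is scanned linearly per request.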

The server does not maintain client subscription state; clients track their own offsets, while the server continuously streams messages beyond the requested point.

PieTask

PieTask is a client‑side wrapper. The client uses a concurrent model where a connector receives messages, a partitioner distributes them to multiple processor threads, and a circular buffer records processed offsets to avoid blocking.
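A simplified sketch of that concurrency model follows. It hashes each message key to one worker queue so related changes stay ordered, and tracks completed offsets so the client can acknowledge the highest contiguous offset without blocking on slow processors. For brevity it uses a set of completed offsets rather than the article's circular buffer, and all names are illustrative.

```python
from queue import Queue
import threading

class PieClient:
    """Sketch of the subscription client: a partitioner routes each
    message by key to one of several processor threads; processed
    offsets are merged into a contiguous ack watermark."""
    def __init__(self, workers, apply_fn):
        self.queues = [Queue() for _ in range(workers)]
        self.apply_fn = apply_fn
        self.done = set()
        self.acked = -1  # highest contiguous processed offset
        self.lock = threading.Lock()
        for q in self.queues:
            threading.Thread(target=self._run, args=(q,), daemon=True).start()

    def dispatch(self, offset, key, payload):
        # Partition by key so one row's changes go to one worker, in order.
        self.queues[hash(key) % len(self.queues)].put((offset, payload))

    def _run(self, q):
        while True:
            offset, payload = q.get()
            self.apply_fn(payload)
            with self.lock:
                self.done.add(offset)
                # Advance the watermark over any contiguous run.
                while self.acked + 1 in self.done:
                    self.done.remove(self.acked + 1)
                    self.acked += 1
            q.task_done()
```

The watermark is what the client reports back as its consumed offset, matching the design where clients, not the server, track subscription state.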

Cluster Design

The system uses a Master‑Slave (Queen‑Bee) architecture. The Queen handles job sharding and scheduling; Bees execute the actual tasks.

High Availability

MySQL HA is managed by DBAs; after a master‑slave switch, the system detects server‑ID changes to locate the correct binlog position.

The Queen achieves active‑standby failover via Zookeeper.

If a Bee crashes, the Queen reassigns its tasks to other Bees.

Data Locality

Each Bee reports its rack, cabinet, and data‑center information; jobs can specify locality preferences so tasks are preferentially placed on preferred machines.

Load Balancing

Bees periodically report load metrics; the Queen schedules new tasks to low‑load Bees while respecting data locality.
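Combining the two scheduling rules above, task placement reduces to "least-loaded Bee within the preferred locality, else least-loaded overall." A minimal sketch, with an assumed `(name, rack, load)` tuple shape:

```python
def pick_bee(bees, preferred_racks):
    """Choose a Bee for a new task: among Bees in a preferred rack,
    pick the least loaded; if none match, fall back to the globally
    least-loaded Bee. `bees` is a list of (name, rack, load) tuples."""
    local = [b for b in bees if b[1] in preferred_racks]
    pool = local if local else bees
    return min(pool, key=lambda b: b[2])[0]
```

A production scheduler would weigh multiple metrics (CPU, I/O, task count) rather than a single load number, but the precedence of locality over raw load is the point here.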

Evolution

1. HHL File Loss

When a host undergoes an HA switch, locally stored HHL files may be lost. Two solutions were considered: replication, which was discarded due to its high disk cost, and data replay, the adopted approach, in which missing messages are regenerated by re‑parsing the binlog.

2. Metadata

Binlog does not contain column names, so messages lack schema. The final approach captures a full snapshot of all MySQL tables at StreamJob startup; when DDL events occur, a new snapshot is created and the DDL applied, ensuring that any binlog point can be matched with appropriate metadata. A metadata service is also provided for clients to request schema on demand.
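The snapshot-per-DDL idea can be sketched as a registry of schema versions keyed by binlog position: any binlog offset maps to the last snapshot taken at or before it. This is an assumed reconstruction of the mechanism, with illustrative names.

```python
import bisect

class SchemaRegistry:
    """Snapshot-per-DDL metadata model: a full schema snapshot is
    taken at StreamJob startup, and each DDL event adds a new snapshot
    keyed by its binlog position, so any binlog offset resolves to the
    schema that was in effect at that point."""
    def __init__(self, startup_pos, startup_schema):
        self.positions = [startup_pos]   # ascending binlog positions
        self.snapshots = [startup_schema]

    def apply_ddl(self, binlog_pos, table, new_columns):
        # Copy the latest snapshot and apply the DDL to it.
        snap = dict(self.snapshots[-1])
        snap[table] = new_columns
        self.positions.append(binlog_pos)
        self.snapshots.append(snap)

    def schema_at(self, binlog_pos):
        # Last snapshot taken at or before the requested position.
        i = bisect.bisect_right(self.positions, binlog_pos) - 1
        return self.snapshots[max(i, 0)]
```

`schema_at` is also the natural backend for the on-demand metadata service mentioned above: clients send a binlog position, the service returns the matching column names.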

3. Resource Isolation

The first version used a distributed thread pool, causing resource contention when users uploaded custom JAR jobs. It was replaced by a process‑pool model, launching each job in a separate subprocess to achieve isolation.
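The process-pool idea, stripped to its core, is just "one subprocess per job, supervised by the parent." A hedged sketch in Python (the platform itself launches JVM processes; this only illustrates the isolation boundary):

```python
import subprocess
import sys

def launch_job(job_script, args):
    """Run a user job in its own subprocess so a misbehaving custom
    job cannot exhaust the host process's threads or crash it; the
    parent only supervises the child's exit status and can kill it."""
    proc = subprocess.Popen([sys.executable, job_script, *args])
    return proc  # caller monitors proc.poll() / proc.wait()
```

The OS then enforces the isolation: a runaway job can be killed or resource-limited without touching sibling jobs, which a shared thread pool cannot guarantee.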

4. Sub‑Cluster

When a Bee loses connection to the Queen (e.g., in a warehouse network), tasks stop. To mitigate this, Bees with the same group form a sub‑cluster, elect a master, and interact with the Queen through that master. The sub‑cluster’s master handles scheduling for its Bees, while the Queen only monitors overall job status. This design also removes the Zookeeper dependency.

Further details will be shared in upcoming articles.

Written by JD Tech

Official JD technology sharing platform. All the cutting‑edge JD tech, innovative insights, and open‑source solutions you’re looking for, all in one place.
