How Vivo Leverages Alibaba Canal for Zero‑Downtime Data Migration and HA
This article explains how Vivo uses Alibaba's open‑source Canal to capture MySQL binlog changes, carry out sharding and cross‑region data migrations without downtime, and keep the pipeline highly available through ZooKeeper. It also shares practical lessons on serialization, consistency, and monitoring in large‑scale backend systems.
Introduction
Once a single table grows to hundreds of millions of rows, it has to be sharded and its data migrated while service performance stays stable. Canal, an Alibaba open‑source project, captures MySQL binlog changes so that incremental data can be subscribed to and consumed.
Canal Overview
Canal parses MySQL binary logs (binlog) by acting as a simulated MySQL slave. It sends a dump request to the master, receives the binlog stream, parses it into a protobuf‑defined structure and pushes the parsed events to downstream consumers.
MySQL Replication Basics
Canal depends on MySQL master‑slave replication. The master writes data changes to the binary log; the slave copies these events to its relay log and replays them locally.
Architecture
Key components:
Server : a JVM process that hosts one or more Canal instances.
Instance : a logical data queue.
Inside an instance:
EventParser : implements the MySQL slave protocol, receives and parses binlog events, records the current binlog position and forwards the data to EventSink.
EventSink : filters, aggregates, transforms and routes the parsed rows.
EventStore : stores parsed binlog objects (in memory in the default implementation) and tracks the positions used for consumption acknowledgment.
MetaManager : maintains subscription metadata similar to a message‑queue broker.
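As a rough illustration of the acknowledgment behaviour described for EventStore, the sketch below models a bounded buffer with separate put, get and ack positions. The class and method names (`ToyEventStore`, `put`, `get`, `ack`, `rollback`) are hypothetical and heavily simplified; this is not Canal's actual implementation.

```python
class ToyEventStore:
    """Bounded buffer with put/get/ack cursors (hypothetical, not Canal's code)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.events = {}   # sequence number -> event
        self.put_seq = 0   # next sequence number to write
        self.get_seq = 0   # next sequence number to hand to a client
        self.ack_seq = 0   # everything below this has been acknowledged

    def put(self, event):
        """Called by the sink; refuses new events while unacked ones fill the buffer."""
        if self.put_seq - self.ack_seq >= self.capacity:
            return False
        self.events[self.put_seq] = event
        self.put_seq += 1
        return True

    def get(self, batch_size):
        """Hand out up to batch_size events without freeing their slots."""
        end = min(self.get_seq + batch_size, self.put_seq)
        batch = [self.events[i] for i in range(self.get_seq, end)]
        self.get_seq = end
        return batch

    def ack(self):
        """Client confirms everything fetched so far; the slots are freed."""
        for i in range(self.ack_seq, self.get_seq):
            del self.events[i]
        self.ack_seq = self.get_seq

    def rollback(self):
        """Client failed; the next get() re-delivers the unacked events."""
        self.get_seq = self.ack_seq


store = ToyEventStore(capacity=3)
for name in ["e1", "e2", "e3"]:
    store.put(name)
first = store.get(2)   # client fetches a batch
store.rollback()       # processing failed, so nothing is acknowledged
retry = store.get(2)   # the same events are delivered again
store.ack()            # now the two slots can be reused
```

Keeping the get and ack positions separate is what lets a failed client re-read a batch: data only leaves the buffer once it has been acknowledged.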
Data Format
Canal wraps each binlog event into a protobuf message defined in EntryProtocol.proto. The main fields are:
Entry
    Header
        logfileName // binlog file name
        logfileOffset // position in the binlog
        executeTime // event timestamp
        schemaName // database name
        tableName // table name
        eventType // INSERT / UPDATE / DELETE
    entryType // BEGIN / END / ROWDATA
    storeValue // serialized RowChange
RowChange
    isDdl // true for DDL statements
    sql // DDL SQL text
    rowDatas // list of row changes
        beforeColumns // Column[] before the change
        afterColumns // Column[] after the change
Column
    index
    sqlType
    name
    isKey
    updated
    isNull
    value
High Availability
Canal uses ZooKeeper to ensure that, for each instance, only one server and only one client are active at a time. Each active party holds an EPHEMERAL node and the standbys watch it: when the active server disappears, its node is deleted, another server takes over, and clients reconnect to the new active server.
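The failover behaviour can be sketched with a toy in-memory coordinator that imitates ephemeral nodes and one‑shot watchers. Everything here (`ToyCoordinator`, `ToyServer`, the node path) is a hypothetical stand‑in for illustration; it does not use a real ZooKeeper client and is not how Canal itself is coded.

```python
class ToyCoordinator:
    """In-memory stand-in for ZooKeeper ephemeral nodes and watchers."""

    def __init__(self):
        self.nodes = {}      # path -> owner session
        self.watchers = {}   # path -> callbacks to fire when the node is deleted

    def create_ephemeral(self, path, owner):
        """Create the node if it is free; return True when this owner holds it."""
        if path in self.nodes:
            return False
        self.nodes[path] = owner
        return True

    def watch(self, path, callback):
        self.watchers.setdefault(path, []).append(callback)

    def session_closed(self, owner):
        """Delete every node owned by a dead session and fire its watchers once."""
        for path in [p for p, o in self.nodes.items() if o == owner]:
            del self.nodes[path]
            for callback in self.watchers.pop(path, []):
                callback()


class ToyServer:
    def __init__(self, name, coordinator, path):
        self.name, self.zk, self.path = name, coordinator, path
        self.active = False

    def try_become_active(self):
        self.active = self.zk.create_ephemeral(self.path, self.name)
        if not self.active:
            # Standby: retry as soon as the current holder's node disappears.
            self.zk.watch(self.path, self.try_become_active)


zk = ToyCoordinator()
path = "/toy/instance/running"   # hypothetical path
a, b, c = (ToyServer(n, zk, path) for n in ("a", "b", "c"))
for server in (a, b, c):
    server.try_become_active()   # "a" wins, "b" and "c" become standbys
zk.session_closed("a")           # "a" crashes; its ephemeral node vanishes
new_owner = zk.nodes[path]
```

Because the node is tied to the owner's session, a crashed server cannot keep holding the active role, and the first standby that manages to re‑create the node becomes the new active server.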
Typical Use Cases
Zero‑downtime migration : Incrementally sync source and target databases during sharding or region migration.
Cache refresh : Trigger asynchronous cache updates when underlying tables change.
Task dispatch : Convert row changes into MQ/Kafka messages for downstream processing.
Data heterogeneity : Aggregate data from multiple sharded tables into a unified view for complex queries.
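To make the cache‑refresh case concrete, here is a minimal sketch that turns a row‑change event into the cache keys it invalidates. It assumes events have already been decoded into plain dictionaries whose keys mirror the field names in the Data Format section, with each column list flattened to a name‑to‑value mapping. The function name, the `user_db.account` table and the `table:id` key scheme are illustrative assumptions.

```python
def cache_keys_to_evict(event, watched_tables, key_column="id"):
    """Return the cache keys that a row-change event makes stale."""
    table = (event["schemaName"], event["tableName"])
    if table not in watched_tables:
        return []
    keys = []
    for row in event["rowDatas"]:
        # A DELETE only carries beforeColumns; an INSERT only afterColumns.
        columns = row.get("afterColumns") or row.get("beforeColumns") or {}
        if key_column in columns:
            keys.append(f"{event['tableName']}:{columns[key_column]}")
    return keys


cache = {"account:1": {"phone": "111"}, "account:2": {"phone": "333"}}
event = {
    "schemaName": "user_db",
    "tableName": "account",
    "eventType": "UPDATE",
    "rowDatas": [
        {"beforeColumns": {"id": "1", "phone": "111"},
         "afterColumns": {"id": "1", "phone": "222"}},
    ],
}
for key in cache_keys_to_evict(event, {("user_db", "account")}):
    cache.pop(key, None)   # evict; the next read reloads from the database
```

Evicting rather than rewriting the cached value keeps the handler safe to run more than once for the same event.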
Vivo Account Practical Cases
Case 1 – Sharding Migration
Problem: An account table exceeded 300 million rows, making full‑table migration costly and requiring downtime.
Solution: Use Canal for incremental change capture while migrating data in three phases – switch, dual‑write, and sharding.
Key steps:
Analyze pain points (large table, many unique user IDs, poor business partitioning).
Design a sharding scheme (e.g., hash‑based or range‑based on user ID).
Perform full‑data migration with traditional scheduled jobs, then enable Canal to sync incremental changes.
Transition from single‑write → dual‑write → sharding mode, monitoring for issues.
Result: After two weeks of dual‑write, the system switched to sharding with no major incidents; minor issues were resolved quickly.
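The single‑write → dual‑write → sharding transition can be sketched as a routing function controlled by a mode switch. The shard count, the `account_N` table naming and the CRC32‑based hash are assumptions made for illustration; the article does not state which hash or how many shards were used.

```python
import zlib

SHARD_COUNT = 4  # hypothetical; the article does not give the real shard count


def shard_table(user_id, shard_count=SHARD_COUNT):
    """Map a user ID to a shard table name with a stable hash."""
    # zlib.crc32 is deterministic across processes, unlike the built-in hash().
    index = zlib.crc32(str(user_id).encode("utf-8")) % shard_count
    return f"account_{index}"


def write_targets(user_id, mode):
    """Tables the application writes to in each rollout phase."""
    if mode == "single":
        return ["account"]                        # legacy table only
    if mode == "dual":
        return ["account", shard_table(user_id)]  # keep both sides in step
    if mode == "sharding":
        return [shard_table(user_id)]             # legacy table retired
    raise ValueError(f"unknown mode: {mode}")


single = write_targets(42, "single")
dual = write_targets(42, "dual")
sharded = write_targets(42, "sharding")
```

Driving the phases from one switch means moving forward, or rolling back to an earlier phase, is a configuration change rather than a redeploy.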
Case 2 – Cross‑Region Migration
Problem: GDPR compliance required moving Australian user data from a Singapore data center to an EU data center without service interruption.
Solution: Deploy a standby replica in Singapore, use Canal to capture binlog, encrypt the changes, transmit them to the EU region, then switch DNS after verification.
Steps:
Build a standby MySQL instance in Singapore and enable binlog replication.
Deploy Canal server and client to consume the binlog.
Parse and encrypt each change before sending it to the EU GDPR‑compliant zone.
Store the data in the EU MySQL instance and verify stability.
Redirect traffic by updating DNS to point to the EU instance.
Stop Canal services and clean up the Singapore data.
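The "store the data in the EU MySQL instance" step benefits from being idempotent, so that a batch re‑sent after a network failure does not corrupt the target. The sketch below shows one way to apply decoded changes, using an in‑memory SQLite database purely as a stand‑in for MySQL. The `account` schema and the shape of the change dictionaries are hypothetical, and encryption and transport are left out.

```python
import sqlite3


def apply_change(conn, change):
    """Apply one row change so that replaying it leaves the same final state."""
    row = change["row"]
    if change["eventType"] in ("INSERT", "UPDATE"):
        # Upsert: a repeated INSERT, or an UPDATE for a row missing on the
        # target, both end in the same final row.
        conn.execute(
            "INSERT OR REPLACE INTO account (id, phone) VALUES (:id, :phone)",
            {"id": row["id"], "phone": row["phone"]},
        )
    elif change["eventType"] == "DELETE":
        conn.execute("DELETE FROM account WHERE id = :id", {"id": row["id"]})
    else:
        raise ValueError(f"unsupported event type: {change['eventType']}")


target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, phone TEXT)")
changes = [
    {"eventType": "INSERT", "row": {"id": 1, "phone": "111"}},
    {"eventType": "UPDATE", "row": {"id": 1, "phone": "222"}},
    {"eventType": "INSERT", "row": {"id": 2, "phone": "333"}},
    {"eventType": "DELETE", "row": {"id": 2, "phone": "333"}},
]
for change in changes * 2:   # deliver the whole batch twice, in order
    apply_change(target, change)
rows = target.execute("SELECT id, phone FROM account ORDER BY id").fetchall()
```

Replaying the batch in its original order converges to the same rows, which is the property that makes retries safe. It does not protect against changes arriving out of order.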
Lessons Learned
Data serialization : Canal uses protobuf; null values are converted to empty strings, which can cause mismatches in ORM updates.
Data consistency : A single‑node Canal client may process out‑of‑order updates, leading to overwrites (e.g., phone‑number change race).
Master‑slave lag : High write rates increase replica lag; applying rate‑limiting based on business load mitigates the issue.
Monitoring : Simple in‑memory counters were added to detect anomalies, but coarse granularity missed some problems, highlighting the need for richer metrics.
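For the serialization pitfall, one defensive pattern is to consult the `isNull` flag instead of trusting the text in `value`. The sketch below assumes columns have been decoded into dictionaries carrying the `name`, `value`, `isNull` and `updated` fields listed in the Data Format section; the helper names are hypothetical.

```python
def column_value(column):
    """Return None for SQL NULL instead of the empty string found in `value`."""
    return None if column["isNull"] else column["value"]


def changed_fields(after_columns):
    """Collect only the columns flagged as updated, with NULLs restored."""
    return {c["name"]: column_value(c) for c in after_columns if c["updated"]}


after_columns = [
    {"name": "id", "value": "7", "isNull": False, "updated": False},
    {"name": "email", "value": "", "isNull": True, "updated": True},
    {"name": "nickname", "value": "", "isNull": False, "updated": True},
]
fields = changed_fields(after_columns)
```

Here `email` comes back as `None` while `nickname` stays an empty string, so an ORM update built from `fields` can tell "set to NULL" apart from "set to empty".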
References
Official Canal repository: https://github.com/alibaba/canal
Related Otter project: https://github.com/alibaba/otter
vivo Internet Technology
Sharing practical vivo Internet technology insights and salon events, plus the latest industry news and hot conferences.
