Achieving RPO=0: How XiaoHongShu’s Binlog Server Boosts MySQL Replication Speed and Data Consistency
This article explains how XiaoHongShu’s database team built a lightweight Binlog Server to accelerate semi‑synchronous MySQL replication beyond 300 MB/s, achieve RPO=0 data‑loss‑free failover, and improve high‑availability without manual intervention, backed by performance tests and detailed architecture diagrams.
Background and Motivation
Data loss in critical scenarios severely impacts business availability; XiaoHongShu needed a solution that guarantees data consistency and reduces switchover time while eliminating manual intervention.
Binlog Server Solution
The database team introduced a Binlog Server‑based approach that improves semi‑synchronous replication performance, accelerates log transmission, and automatically backfills missing data during failures, achieving RPO=0.
With only 1 CPU core and 1 GB of memory (1C1G), replication speed exceeds 300 MB/s, effectively doubling performance.
Improves failover efficiency by using the Binlog Server to backfill missing data to the new master, lowering operational cost and business risk.
Deployed across all semi‑sync clusters, protecting core databases.
RPO=0 Concept
RPO (Recovery Point Objective) = 0 means that no data is lost after any switchover or disaster; the new master automatically backfills all missing data, eliminating manual repair and reducing operational pressure.
Industry Approaches
Three main OLTP solutions exist; the team selected Facebook’s Binlog Server design and re‑implemented it as a proprietary solution.
Design Goals
Support semi‑synchronous replication with RPO=0, plus a cascading architecture for backfilling data during a master‑slave switchover.
Provide higher replication speed while ensuring zero data loss.
Lightweight deployment (1C1G) with no intrusion to existing MySQL architecture.
Crash‑safe operation guaranteeing data consistency.
Compatibility with MySQL ecosystem tools.
Key Components
Binlog Server handles MySQL protocol, authentication, command processing, and event forwarding. It stores binlog files with index files, ensuring consistency via temporary files and crash‑safe commit/rollback mechanisms.
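The temporary-file approach to consistency can be illustrated with a minimal Python sketch. It is not XiaoHongShu's implementation: the function names (`commit_binlog_file`, `read_index`, `recover`), the `.tmp` suffix, and the `binlog.index` file name are all assumptions made for illustration. The idea is that a file only becomes visible under its final name after it has been fully written and flushed, and that leftover temporary files are rolled back on restart.

```python
import os


def read_index(directory, index_name="binlog.index"):
    """Return the binlog file names recorded in the index, in order."""
    path = os.path.join(directory, index_name)
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def _write_durably(path, data):
    """Write bytes to path and force them to disk before returning."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())


def commit_binlog_file(directory, name, payload, index_name="binlog.index"):
    """Commit one binlog file: temp write, atomic rename, then index update.

    Hypothetical layout: '<name>.tmp' is the in-progress file. A production
    server would also fsync the directory after each rename.
    """
    final_path = os.path.join(directory, name)
    _write_durably(final_path + ".tmp", payload)
    os.replace(final_path + ".tmp", final_path)  # atomic rename

    # The index is rewritten through a temp file as well, so a crash
    # leaves either the old index or the new one, never a partial file.
    entries = read_index(directory, index_name)
    if name not in entries:
        entries.append(name)
    index_path = os.path.join(directory, index_name)
    body = "".join(entry + "\n" for entry in entries).encode("utf-8")
    _write_durably(index_path + ".tmp", body)
    os.replace(index_path + ".tmp", index_path)


def recover(directory):
    """Rollback step on startup: delete uncommitted '.tmp' leftovers."""
    removed = []
    for entry in sorted(os.listdir(directory)):
        if entry.endswith(".tmp"):
            os.remove(os.path.join(directory, entry))
            removed.append(entry)
    return removed
```

In this sketch, a crash before the rename leaves only a `.tmp` file that `recover` discards, while a crash after the rename leaves a complete file that can be re-added to the index.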
High Availability Integration
When the master fails, ORC selects the Binlog Server with the most advanced ("longest") GTID set as a temporary master to backfill missing data, then promotes a same‑zone slave as the new master, ensuring zero data loss during failover.
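As a rough illustration of picking the candidate with the most advanced GTID set, the Python sketch below parses MySQL-style GTID set strings and chooses the node that has recorded the most transactions. The helper names and node names are hypothetical, and counting transactions is a simplification: it assumes each set is already normalized (no overlapping intervals) and that the winner's set contains all the others, which a real failover tool would verify, for example with MySQL's `GTID_SUBSET()`.

```python
from typing import Dict, List, Tuple

GtidSet = Dict[str, List[Tuple[int, int]]]


def parse_gtid_set(text: str) -> GtidSet:
    """Parse 'uuid:1-100:105,uuid2:1-5' into {uuid: [(start, end), ...]}."""
    result: GtidSet = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *ranges = part.split(":")
        intervals = result.setdefault(uuid.lower(), [])
        for r in ranges:
            lo, _, hi = r.partition("-")
            start = int(lo)
            end = int(hi) if hi else start  # a single number is a 1-wide interval
            intervals.append((start, end))
    return result


def count_transactions(gtid_set: GtidSet) -> int:
    """Total number of transactions covered by the set."""
    return sum(end - start + 1
               for intervals in gtid_set.values()
               for start, end in intervals)


def pick_most_advanced(candidates: Dict[str, str]) -> str:
    """Return the name of the candidate whose GTID set covers the most transactions."""
    return max(candidates,
               key=lambda name: count_transactions(parse_gtid_set(candidates[name])))


# Hypothetical node names and GTID sets, for illustration only.
SOURCE_UUID = "3e11fa47-71ca-11e1-9e33-c80aa9429562"
candidates = {
    "binlog-server-a": f"{SOURCE_UUID}:1-100",
    "binlog-server-b": f"{SOURCE_UUID}:1-120",
    "replica-1": f"{SOURCE_UUID}:1-95:97",
}
print(pick_most_advanced(candidates))  # binlog-server-b
```

Here `binlog-server-b` holds transactions 1 through 120, so it would serve as the temporary master from which the other nodes catch up.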
Performance Validation
Deployed in a same‑city, different‑zone topology, Binlog Server achieves 300 MB/s write speed with minimal resources, enabling rapid recovery in zone‑level failures.
Future Applications
Beyond high availability, the Binlog Server can be used for binlog backfill during scaling and in DTS, and as a low‑cost storage node backed by S3, extending its usefulness across a range of database scenarios.
Xiaohongshu Tech REDtech
Official account of the Xiaohongshu tech team, sharing tech innovations and problem insights, advancing together.
