How RedSQL Supercharged MySQL Performance and Achieved Zero‑Data‑Loss Replication
This article details Xiaohongshu's RedSQL MySQL kernel project, describing three major solutions—high‑throughput seckill optimization, a Binlog Server‑based zero‑data‑loss replication scheme, and second‑level DDL column addition—along with additional kernel enhancements that together delivered multi‑fold performance gains and improved stability.
Background
Over the past year, Xiaohongshu built a self‑developed MySQL kernel called RedSQL, moving from zero to a production‑ready version that now powers more than 80% of its core clusters. The rapid growth of traffic‑intensive scenarios such as flash‑sale (seckill), hot‑note updates, and live‑stream commerce exposed several pain points in the existing database stack.
Core Challenges
Inability to sustain short‑term high‑burst traffic in seckill and hot‑note use cases.
Zero‑RPO (Recovery Point Objective) requirements for finance‑related services.
Lengthy DDL operations (e.g., adding columns) that delay feature delivery.
Stability concerns during traffic spikes.
Solution 1 – Seckill Performance Boost
RedSQL introduced a merged‑seckill design that consolidates the SQL statements of multiple seckill transactions into a single transaction. This required changes to MySQL's transaction, lock, and binlog subsystems. The key benefits are:
Ecosystem component transparency: The binlog format remains unchanged, so downstream tools such as DTS and Canal require no upgrades.
Kernel upgrade transparency: No InnoDB format changes, preserving compatibility and rollback capability.
SQL‑level transparency: Existing seckill SQL works unchanged; the feature can be toggled on or off without DBA intervention.
Four technical improvements were made:
Cache visibility: A global cache resolves data visibility, while a queue‑based approach ensures multi‑thread consistency.
Row‑lock optimization: The seckill process is split into update and commit phases, with fine‑grained lock tweaks that enable parallel execution.
Parallel log commit: Multi‑threaded log submission accelerates throughput.
Crash‑recovery flow: Adjusted to guarantee data consistency after failures.
The result is a 5‑10× increase in write capacity compared with the open‑source MySQL baseline and a 5× improvement over the previous queue‑based seckill implementation, successfully supporting high‑concurrency live‑commerce scenarios.
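The core of the merged‑seckill idea can be sketched as follows, assuming a single hot inventory row and a queue of pending deduction requests. `DeductRequest` and `merge_and_apply` are hypothetical names for illustration only; RedSQL's real logic lives inside the InnoDB transaction, lock, and binlog code, which the article does not describe at this level. The sketch shows only why batching helps: a whole batch of requests can share one row‑lock acquisition and one commit instead of paying for them once per request.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class DeductRequest:
    """One buyer's seckill attempt against a hot inventory row (hypothetical type)."""
    request_id: int
    quantity: int


def merge_and_apply(stock: int, batch: List[DeductRequest]) -> Tuple[int, Dict[int, bool]]:
    """Apply a queued batch of deductions to one hot row as a single unit.

    Requests are applied in queue order, so each one sees the stock left by
    the requests before it. In a real engine the row lock, commit, and binlog
    write would be paid once for the whole batch instead of once per request.
    """
    results: Dict[int, bool] = {}
    for req in batch:
        if req.quantity <= stock:
            stock -= req.quantity
            results[req.request_id] = True   # deduction succeeded
        else:
            results[req.request_id] = False  # not enough stock left for this request
    return stock, results


# Three queued buyers compete for 3 units; the third arrives after stock hits 0.
remaining, outcome = merge_and_apply(
    3, [DeductRequest(1, 1), DeductRequest(2, 2), DeductRequest(3, 1)]
)
print(remaining, outcome)  # → 0 {1: True, 2: True, 3: False}
```

Applying requests strictly in queue order keeps each per‑request result identical to what serial, one‑transaction‑per‑request execution would have produced.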
Solution 2 – Data Consistency with Binlog Server
RedSQL built a fully self‑developed Binlog Server paired with an ORC high‑availability component. The design achieves:
High‑speed replication: 1 CPU and 1 GB of RAM are enough to sustain more than 300 MB/s of data copying.
Zero‑manual‑intervention consistency: Automatic data back‑fill during primary‑standby switchover reduces RPO from 60 s to 0 s.
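The back‑fill step can be illustrated with a simplified model in which each committed transaction carries an increasing sequence number; real MySQL tracks GTID sets, which are richer than a single integer. The sketch assumes the Binlog Server has received every committed transaction, so anything it holds beyond the promoted replica's last applied event must be replayed before that replica accepts writes. `backfill` is a hypothetical helper, not a RedSQL or ORC API.

```python
from typing import List, Tuple

# (sequence number, payload): a simplified stand-in for a GTID-tagged binlog event
Event = Tuple[int, str]


def backfill(candidate_applied: int, binlog_server_events: List[Event]) -> List[Event]:
    """Return the events the promoted replica is missing, in commit order.

    Assumes the Binlog Server holds every committed transaction, so any event
    beyond the candidate's last applied sequence number must be replayed
    before the candidate starts accepting writes.
    """
    return [ev for ev in sorted(binlog_server_events) if ev[0] > candidate_applied]


# The old primary committed 1..4; the replica had applied only 1..2 when the primary failed.
server_events = [(1, "txn-a"), (2, "txn-b"), (3, "txn-c"), (4, "txn-d")]
print(backfill(2, server_events))  # → [(3, 'txn-c'), (4, 'txn-d')]
```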
Key technical challenges addressed include full MySQL protocol support, a custom SQL parser for seamless integration, bidirectional node registration so the Binlog Server can take both master and slave roles, and semi‑synchronous (half‑sync) replication support, in which a transaction commits only after ACKs from downstream nodes are received.
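The half‑sync wait can be modeled with a condition variable: the committing thread blocks until a downstream node, such as the Binlog Server, reports that it has persisted the transaction's binlog position. `SemiSyncPrimary`, `on_ack`, and `commit` are hypothetical names, and positions are simplified to integers. What to do on timeout is left to the caller; stock MySQL semi‑sync, for instance, falls back to asynchronous replication when its timeout expires.

```python
import threading


class SemiSyncPrimary:
    """Toy model of semi-synchronous commit (hypothetical, not RedSQL's code)."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._acked_pos = 0  # highest binlog position acknowledged downstream

    def on_ack(self, pos: int) -> None:
        """Called when a downstream node reports it has persisted events up to pos."""
        with self._cond:
            if pos > self._acked_pos:  # stale or duplicate ACKs are ignored
                self._acked_pos = pos
                self._cond.notify_all()

    def commit(self, pos: int, timeout: float) -> bool:
        """Block until pos is acknowledged; return False if the timeout expires first."""
        with self._cond:
            return self._cond.wait_for(lambda: self._acked_pos >= pos, timeout=timeout)


primary = SemiSyncPrimary()
threading.Timer(0.05, primary.on_ack, args=(10,)).start()  # ACK arrives after 50 ms
print(primary.commit(10, timeout=1.0))  # → True
print(primary.commit(20, timeout=0.1))  # → False (no ACK for position 20)
```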
The final architecture combines Binlog Server and ORC to provide a complete zero‑data‑loss solution, covering 100% of core clusters and meeting the RPO = 0 goal.
Solution 3 – Second‑Level Column Addition (DDL)
On MySQL 5.7, columns were traditionally added with the gh‑ost tool, which performs a full table copy and can take days for large tables. RedSQL introduces a second‑level add‑column mechanism that decouples metadata from physical storage:
Metadata is updated instantly to reflect the new column count and default value.
Data rows are read with the original column set; the system appends the default value on‑the‑fly, eliminating the need for full table rebuild.
New rows carry the full column set, while old rows are flagged and lazily upgraded during reads.
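The metadata/data split can be sketched in a few lines, assuming each stored row remembers how many fields it was written with (here, simply the length of its list). That field count plays the role of the per‑row flag described above; the idea is similar in spirit to the `ALGORITHM=INSTANT` add‑column support in MySQL 8.0. `TableMeta`, `add_column_instant`, and `read_row` are hypothetical names, and the lazy upgrade of old rows is not modeled, only the on‑read padding.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class TableMeta:
    """Hypothetical table metadata: column names plus a default for each column."""
    columns: List[str]
    defaults: List[Any] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Columns created with the table have no instant default; pad with None.
        self.defaults += [None] * (len(self.columns) - len(self.defaults))

    def add_column_instant(self, name: str, default: Any) -> None:
        """Metadata-only ADD COLUMN: no existing row is rewritten."""
        self.columns.append(name)
        self.defaults.append(default)


def read_row(meta: TableMeta, stored: List[Any]) -> Dict[str, Any]:
    """Pad a row stored before the DDL with the defaults of newer columns."""
    values = list(stored) + meta.defaults[len(stored):]
    return dict(zip(meta.columns, values))


meta = TableMeta(["id", "name"])
old_row = [1, "alice"]               # written before the DDL
meta.add_column_instant("status", 0)
new_row = [2, "bob", 5]              # written after the DDL, full column set
print(read_row(meta, old_row))       # → {'id': 1, 'name': 'alice', 'status': 0}
print(read_row(meta, new_row))       # → {'id': 2, 'name': 'bob', 'status': 5}
```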
This approach reduces DDL latency from days to seconds, enabling rapid schema evolution without service disruption.
Additional Kernel Features
RedSQL also adds several other capabilities:
CCL (Conditional Concurrency Limiting): Shifts from coarse‑grained time‑window throttling to precise SQL‑template based limiting, improving stability.
SQL syntax extensions: Supports SELECT … FROM UPDATE and RETURNING clauses, boosting TPS by ~20%.
BP (Buffer Pool) parallel loading: Raises load speed from 300 MB/s to 1 GB/s (about 3.3×), shortening the warm‑up period after restarts.
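SQL‑template‑based limiting, as in CCL, can be sketched as two steps: normalize each statement into a template by replacing literals with `?`, then cap how many statements of each template run at once. The regex normalizer, `TemplateLimiter`, and the reject‑on‑overflow policy are assumptions for illustration; a production limiter would more likely reuse the server's own statement digest and might queue excess statements rather than reject them.

```python
import re
import threading
from typing import Dict

_STRING = re.compile(r"'(?:[^'\\]|\\.)*'")   # single-quoted string literals
_NUMBER = re.compile(r"\b\d+(?:\.\d+)?\b")    # standalone numeric literals
_SPACES = re.compile(r"\s+")


def sql_template(sql: str) -> str:
    """Reduce a statement to its template by replacing literals with '?'."""
    t = _STRING.sub("?", sql)
    t = _NUMBER.sub("?", t)
    return _SPACES.sub(" ", t).strip().lower()


class TemplateLimiter:
    """Cap concurrent executions per SQL template; overflow is rejected."""

    def __init__(self, limits: Dict[str, int]) -> None:
        self._limits = limits                # template -> max concurrent statements
        self._running: Dict[str, int] = {}
        self._lock = threading.Lock()

    def try_enter(self, sql: str) -> bool:
        """Admit the statement if its template is under its limit."""
        tpl = sql_template(sql)
        with self._lock:
            limit = self._limits.get(tpl)
            if limit is None:
                return True                  # template is not throttled
            if self._running.get(tpl, 0) >= limit:
                return False                 # over the limit: reject
            self._running[tpl] = self._running.get(tpl, 0) + 1
            return True

    def leave(self, sql: str) -> None:
        """Release a slot; call only after a successful try_enter."""
        tpl = sql_template(sql)
        with self._lock:
            if self._running.get(tpl, 0) > 0:
                self._running[tpl] -= 1


hot = "UPDATE stock SET cnt = cnt - 1 WHERE sku_id = {}"
limiter = TemplateLimiter({sql_template(hot.format(1)): 2})
print(sql_template(hot.format(42)))  # → update stock set cnt = cnt - ? where sku_id = ?
print([limiter.try_enter(hot.format(i)) for i in (7, 8, 9)])  # → [True, True, False]
print(limiter.try_enter("SELECT * FROM note WHERE id = 1"))   # → True
```

Because the limit is keyed by template rather than by time window, a flood of one hot statement shape is throttled while unrelated statements keep running.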
Benefits
The combined optimizations have enabled Xiaohongshu to:
Support ultra‑high‑concurrency flash‑sale events without invasive changes to application SQL.
Guarantee data consistency for critical finance‑related services with RPO = 0.
Accelerate schema changes from days to seconds, dramatically shortening feature delivery cycles.
Technical diagrams and performance charts (see images) illustrate the architecture and quantitative gains.
Xiaohongshu Tech REDtech
Official account of the Xiaohongshu tech team, sharing tech innovations and problem insights, advancing together.
