Evolution of Ctrip's MySQL Database Release System: From 1.0 to 3.0
This article details the design, evolution, and operational practices of Ctrip's MySQL database release system, covering its three major versions, the adoption of gh‑ost, safety mechanisms, real‑world case studies, and performance improvements that enable reliable, low‑impact schema changes at scale.
Internet software evolves rapidly, making reliable database schema release crucial; Ctrip's MySQL release system has progressed through three major versions to meet this need.
Environment Overview : Development, three test environments (FAT, LPT, UAT), and production. Changes are designed in Dev, validated through test stages, then released to production.
Version 1.0 (Early Stage) : Simple direct modifications—developers edit DEV tables, DBA tools sync changes to test and production. No consideration for impact or performance.
Version 2.0 (Transition) : MySQL adoption expands to 800+ instances; pt‑online‑schema‑change (pt‑osc) is introduced. While functional, pt‑osc uses triggers, adds overhead, and can stall large tables, increasing DBA workload.
Version 3.0 (gh‑ost Integration) : After extensive evaluation, gh‑ost is adopted, offering three core advantages:
Triggerless operation by reading MySQL binlog as a fake slave, avoiding trigger overhead.
Dynamic controllability via a socket server, allowing real‑time pause, resume, and parameter adjustments during a release.
Improved safety checks and automated pre‑release validation (environment variables, conflict tables, disk capacity, DRC member status, etc.).
gh‑ost works by creating a shadow table (_gho) and a changelog table (_ghc), running two goroutines to copy data and apply binlog increments in parallel. Insert statements are transformed to INSERT IGNORE for the copy phase and REPLACE for binlog apply, supporting concurrent execution.
Safety Mechanisms : Before release, the system validates MySQL variables, table conflicts, disk space, parallel releases, and DRC member health. During execution, listeners monitor disk usage, server load, replica lag, business peak times, DRC status, and cluster topology, throttling or aborting as needed.
Case Studies :
Preserving auto‑increment values using the -reset-original-auto-increment flag.
Handling unique key additions by rewriting row copy to conditional inserts and changing binlog apply from REPLACE to DELETE + INSERT , ensuring data integrity.
Combining large‑table release with data cleanup via -where-reserve-clause and optional -force-query-migration-range-values-on-master to optimize performance.
Performance comparison shows gh‑ost reduces a 290 GB table cleanup from 25 hours to 2.5 hours, a ten‑fold improvement.
Conclusion : Since 2018, the 3.0 release system has become stable, supporting rapid iteration, minimizing business impact, and covering most DDL types. Future work includes more user‑friendly interaction and smarter throttling.
Ctrip Technology
Official Ctrip Technology account, sharing and discussing growth.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.