How GitHub Upgraded 1,200 MySQL Servers from 5.7 to 8.0 Without Downtime
GitHub upgraded more than 1,200 MySQL hosts across 50+ clusters from 5.7 to 8.0. This summary covers the motivation, the infrastructure, the extensive preparation, the five‑step rolling upgrade process, the rollback strategy, challenges such as Vitess integration and replication lag, and key lessons for future database migrations.
Upgrade Motivation
GitHub needed to move from MySQL 5.7 to 8.0 because 5.7 was reaching end of life. Beyond security patches, bug fixes, and performance improvements, 8.0 offers new features such as instant DDL, invisible indexes, and compressed binary logs.
GitHub MySQL Infrastructure
The platform runs more than 1,200 MySQL hosts on a mix of Azure virtual machines and bare‑metal servers. Together they store over 300 TB of data and serve 5.5 million queries per second across 50+ clusters. Each cluster is configured for high availability with a primary plus replicas. Data is partitioned using both horizontal and vertical sharding. Operations rely on a rich tooling ecosystem: Percona Toolkit, gh‑ost, Orchestrator, Freno, and in‑house automation.
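Because the data is split across many independent clusters, each cluster can be upgraded and rolled back on its own. The sketch below is a rough illustration of how vertical and horizontal sharding combine. It routes a row either to a dedicated cluster or to one of several shards. In vertical sharding, whole tables are grouped by function. In horizontal sharding, the rows of one table are split by a hash of their key. All table, cluster, and shard names are hypothetical. GitHub's real routing, including Vitess, is not shown.

```python
import hashlib

# Hypothetical vertical partitioning: whole tables assigned to functional clusters.
VERTICAL_MAP = {
    "users": "cluster_identity",
    "repositories": "cluster_repos",
}

# Hypothetical horizontal sharding: one large table split across several shards.
HORIZONTAL_SHARDS = {
    "notifications": ["notif_shard_0", "notif_shard_1", "notif_shard_2", "notif_shard_3"],
}


def route(table: str, shard_key: int) -> str:
    """Return the cluster or shard that owns the row of `table` identified by `shard_key`."""
    shards = HORIZONTAL_SHARDS.get(table)
    if shards:
        # A stable hash keeps the same key on the same shard across processes
        # (Python's built-in hash() is randomized per process for strings).
        digest = hashlib.sha256(str(shard_key).encode()).digest()
        return shards[int.from_bytes(digest[:8], "big") % len(shards)]
    # Vertically partitioned tables live entirely on one cluster.
    return VERTICAL_MAP[table]
```

For example, `route("users", 7)` returns `"cluster_identity"`. A notifications row lands on whichever shard its key hashes to. An upgrade of one cluster therefore touches only the tables or rows that live there.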
Preparation Work
The key requirements were as follows:
- Upgrade every MySQL database while staying within SLOs and SLAs.
- Retain the ability to roll back to 5.7.
- Support a mixed‑version environment for the extended duration of the upgrade.

Preparation began in July 2022. Several milestones had to be met before any production database was upgraded: benchmarking 8.0 with default settings, extending CI to run against both 5.7 and 8.0, and preparing MySQL 8.0 containers for Codespaces.
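Benchmarking 8.0 with default settings starts with knowing which defaults changed. The sketch below is a minimal version that assumes the defaults have been collected into plain dictionaries. The values listed are a small subset of documented 5.7 and 8.0 defaults, not GitHub's configuration. Real hosts usually override many of them in their option files.

```python
# A small subset of server defaults that changed between MySQL 5.7 and 8.0.
MYSQL_57_DEFAULTS = {
    "character_set_server": "latin1",
    "collation_server": "latin1_swedish_ci",
    "default_authentication_plugin": "mysql_native_password",
    "log_bin": "OFF",
}
MYSQL_80_DEFAULTS = {
    "character_set_server": "utf8mb4",
    "collation_server": "utf8mb4_0900_ai_ci",
    "default_authentication_plugin": "caching_sha2_password",
    "log_bin": "ON",
}


def diff_defaults(old: dict, new: dict) -> dict:
    """Return {variable: (old_value, new_value)} for every variable whose default differs.

    A variable missing from one side is reported with None for that side.
    """
    keys = old.keys() | new.keys()
    return {
        k: (old.get(k), new.get(k))
        for k in sorted(keys)
        if old.get(k) != new.get(k)
    }
```

In practice, the dictionaries could be filled from `SHOW GLOBAL VARIABLES` on freshly installed 5.7 and 8.0 hosts. The diff then points at settings whose impact deserves a benchmark.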
Upgrade Plan
The team executed a staged, checkpoint‑driven rollout:
Step 1 – Rolling replica upgrade: Upgrade a single replica and monitor it for stability. Then gradually upgrade more replicas and shift read traffic to 8.0, while keeping enough 5.7 replicas online for rollback.
Step 2 – Replication topology change: Designate an upgraded 8.0 replica as the candidate primary and attach two replication chains beneath it. One chain holds standby 5.7 replicas that serve no traffic but are ready for rollback. The other holds 8.0 replicas that serve traffic. The topology stays in this state only briefly.
Step 3 – Promote MySQL 8.0 to primary: Use Orchestrator to perform a graceful failover that makes the 8.0 candidate the new primary. Exclude 5.7 hosts as failover candidates so that an unplanned failover cannot accidentally roll the cluster back.
Step 4 – Internal instance upgrade: Upgrade ancillary instances, such as those used for backups and non‑production workloads, to keep the environment consistent.
Step 5 – Cleanup: Once the cluster has run on 8.0 through at least one complete 24‑hour traffic cycle, including peak traffic, with no need to roll back, decommission the 5.7 instances.
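The steps above can be modeled as a small state machine per cluster: a cluster advances only when the current step's checkpoint passes. A helper also mirrors Step 3's exclusion of 5.7 hosts from promotion. The class, stage names, and checkpoint callables are hypothetical. In practice, checkpoints would be monitoring signals, and candidate filtering would live in Orchestrator's configuration rather than in Python.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List


class Stage(Enum):
    """The five rollout steps, in the order they must run."""
    ROLLING_REPLICA_UPGRADE = 1
    TOPOLOGY_CHANGE = 2
    PROMOTE_PRIMARY = 3
    INTERNAL_INSTANCES = 4
    CLEANUP = 5


@dataclass
class ClusterRollout:
    """Tracks which stages a single cluster has completed (hypothetical model)."""
    name: str
    completed: List[Stage] = field(default_factory=list)

    def next_stage(self) -> Stage:
        if len(self.completed) == len(Stage):
            raise ValueError(f"{self.name}: rollout already complete")
        return Stage(len(self.completed) + 1)

    def advance(self, stage: Stage, checkpoint: Callable[[], bool]) -> bool:
        """Mark `stage` done only if it is next in order and its checkpoint passes."""
        expected = self.next_stage()
        if stage is not expected:
            raise ValueError(f"{self.name}: expected {expected.name}, got {stage.name}")
        if not checkpoint():
            return False  # hold at this stage: investigate, or roll back to 5.7
        self.completed.append(stage)
        return True


def failover_candidates(hosts: Dict[str, str]) -> List[str]:
    """Hosts eligible for promotion: anything still on 5.7 is excluded (Step 3)."""
    return sorted(host for host, version in hosts.items() if not version.startswith("5.7"))
```

Holding at a failed checkpoint, rather than skipping ahead, is what keeps the 5.7 rollback path intact until Step 5.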
Rollback Capability
Maintaining the ability to revert to 5.7 required keeping enough 5.7 read replicas online. It also required that replication from an 8.0 primary to 5.7 replicas kept working. Two 8.0 changes complicated this. First, the new default collation, utf8mb4_0900_ai_ci, does not exist in 5.7, unlike utf8mb4_unicode_520_ci, which both versions support. Second, 8.0 introduced role‑based privileges that 5.7 replicas cannot apply. These issues were mitigated with temporary adjustments, including changes to how permissions were managed.
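While rollback to 5.7 remains possible, every statement written on the 8.0 primary must still apply cleanly on the 5.7 replicas. The sketch below is a minimal pre‑flight check that assumes statements are available as SQL text. It flags the two incompatibilities described above. The function name and regular expressions are hypothetical and cover only a subset of cases; for example, GRANT statements involving roles are not detected. A check like this does not replace testing replication itself.

```python
import re

# Collations added in MySQL 8.0 (the utf8mb4_0900_* family) that 5.7 does not know.
_UNSUPPORTED_IN_57 = re.compile(r"utf8mb4_0900_\w+", re.IGNORECASE)
# A subset of role-management statements introduced in 8.0.
_ROLE_STMT = re.compile(r"^\s*(CREATE|DROP)\s+ROLE\b|\bSET\s+DEFAULT\s+ROLE\b", re.IGNORECASE)


def breaks_57_replica(statement: str) -> list:
    """Return reasons a statement from an 8.0 primary would likely fail on a 5.7 replica."""
    reasons = []
    if _UNSUPPORTED_IN_57.search(statement):
        reasons.append("uses a utf8mb4_0900_* collation unknown to 5.7")
    if _ROLE_STMT.search(statement):
        reasons.append("uses 8.0 role management")
    return reasons
```

An empty list means only that neither pattern matched, not that the statement is guaranteed to replicate.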
Challenges Encountered
Vitess integration: Upgrading Vitess shards required updating VTgate to advertise the new MySQL version. Some client libraries change their behavior based on the advertised server version; for example, the query cache was removed in 8.0.
Replication lag: A known MySQL bug, fixed in 8.0.28, caused replication errors, so the team made sure every cluster ran 8.0.28 or later. Freno throttling was also tuned to mitigate the additional replication lag caused by heavy writes.
Production query failures: Certain queries with very large WHERE IN clauses passed CI but crashed MySQL 8.0 in production. The team used query sampling to find these queries and rewrote them before proceeding with further upgrades.
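The source does not say how the large WHERE IN queries were rewritten or how Freno was tuned. One common pattern, shown here as an assumption rather than GitHub's method, has two parts. First, split an oversized IN list into bounded batches. Second, for write workloads, wait before each batch while replica lag is above a threshold, which is the idea behind lag‑based throttlers such as Freno. `execute` and `replica_lag_seconds` are hypothetical callables standing in for a database driver and a lag check. Placeholders use the `%s` style accepted by common Python MySQL drivers.

```python
import time
from typing import Callable, Iterator, List, Sequence, Tuple


def chunked(values: Sequence[int], size: int) -> Iterator[Sequence[int]]:
    """Yield consecutive slices of at most `size` values."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(values), size):
        yield values[start:start + size]


def build_in_queries(prefix: str, ids: Sequence[int], size: int) -> List[Tuple[str, tuple]]:
    """Split one oversized `<prefix> IN (...)` into bounded, parameterized queries.

    `prefix` (e.g. "SELECT * FROM t WHERE id") is inserted verbatim, so it must be
    trusted SQL text; only the id values go through placeholders.
    """
    queries = []
    for chunk in chunked(ids, size):
        placeholders = ", ".join(["%s"] * len(chunk))
        queries.append((f"{prefix} IN ({placeholders})", tuple(chunk)))
    return queries


def run_throttled(queries: List[Tuple[str, tuple]],
                  execute: Callable[[str, tuple], None],
                  replica_lag_seconds: Callable[[], float],
                  max_lag: float = 1.0,
                  poll_interval: float = 0.5) -> None:
    """Run queries in order, waiting while replica lag exceeds `max_lag` seconds."""
    for sql, params in queries:
        while replica_lag_seconds() > max_lag:
            time.sleep(poll_interval)  # back off until replicas catch up
        execute(sql, params)
```

Batch size and the lag threshold are the tuning knobs. Smaller batches keep each statement bounded at the cost of more round trips. A lower `max_lag` protects replicas at the cost of slower writes.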
Experience and Lessons Learned
The year‑long effort highlighted the importance of observability, automated testing, and a robust rollback plan. Upgrading incrementally allowed issues to be detected early. Because the data was sharded across many clusters, any single problem, such as a failing query, affected only a limited part of the fleet. Investing in tooling such as Orchestrator and Freno proved essential for operating at this scale.
Future upgrades will benefit from the automation and self‑healing capabilities built during this project, reducing manual steps and shortening migration windows.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
