
How GitHub Upgraded 1,200 MySQL Servers to 8.0 Without Downtime

GitHub detailed a year‑long, multi‑team effort to upgrade over 1,200 MySQL hosts from 5.7 to 8.0 using phased rollouts, automated testing, compatibility checks, and rollback mechanisms while maintaining strict SLOs and high‑availability requirements.

FunTester

Upgrade Motivation

GitHub needed to move from MySQL 5.7, whose lifecycle was ending, to MySQL 8.0 to obtain security patches, bug fixes, performance improvements, and new features such as Instant DDL, invisible indexes, and compressed binary logs.

GitHub's MySQL Infrastructure

The fleet consists of more than 1,200 hosts, a mix of Azure VMs and bare-metal servers, organized into over 50 clusters that store more than 300 TB of data and serve roughly 5.5 million queries per second. Each cluster is configured for high availability with a primary and multiple replicas. Data is partitioned both vertically, with separate clusters per product domain, and horizontally, with Vitess clusters for large domains that outgrew a single-primary cluster.

A large tooling ecosystem is used to operate the fleet: Percona Toolkit, gh-ost, Orchestrator, Freno, and internal automation.

Preparation Requirements

Because MySQL is GitHub’s primary data store, the upgrade had to meet three strict criteria: (1) upgrade each database without violating SLO/SLA, (2) retain the ability to roll back to 5.7 without service interruption, and (3) support a mixed‑version environment for the duration of the rollout.

Preparing Infrastructure for Upgrade

The team defined appropriate MySQL 8.0 defaults, ran benchmark tests, and ensured tooling could handle both 5.7 and 8.0 syntax and deprecations.

Ensuring Application Compatibility

All MySQL‑using applications were added to continuous‑integration pipelines that run MySQL 5.7 and 8.0 side‑by‑side. Compatibility errors and reserved‑keyword issues were caught early, and developers were given access to a MySQL 8.0 pre‑built container in GitHub Codespaces for debugging.
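One class of issue mentioned above, identifiers that collide with words newly reserved in 8.0, can be caught with a simple static check. The following is a minimal sketch, not GitHub's tooling: the keyword set is a partial, illustrative list, and the column names are hypothetical.

```python
# Partial, illustrative set of words that became reserved in MySQL 8.0.
# Assumption: a real check would load the full keyword list for the
# exact target release from the MySQL reference manual.
NEW_RESERVED_IN_80 = {
    "CUME_DIST", "DENSE_RANK", "EMPTY", "GROUPS", "LAG", "LATERAL",
    "LEAD", "OVER", "RANK", "RECURSIVE", "ROW_NUMBER", "SYSTEM", "WINDOW",
}


def find_reserved_identifiers(identifiers):
    """Return identifiers that collide with a newly reserved word.

    Matching is case-insensitive because MySQL keywords are.
    """
    return sorted(
        name for name in identifiers if name.upper() in NEW_RESERVED_IN_80
    )


# Hypothetical column names pulled from a schema dump.
columns = ["id", "rank", "created_at", "groups", "window"]
print(find_reserved_identifiers(columns))
```

Flagged identifiers either need to be quoted with backticks in every query or renamed before the application runs against 8.0.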

Communication and Transparency

A rolling calendar in GitHub Projects tracked the upgrade plan, and issue templates served as checklists for application and database teams.

Upgrade Plan

Step 1: Rolling Replica Upgrade

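Upgrade a single replica and monitor it while it is still offline, then enable production read traffic on it and watch query latency, system metrics, and application metrics. Bring 8.0 replicas online gradually until an entire data center runs 8.0, then repeat for the other data centers. Enough 5.7 replicas stay online for a potential rollback, but they no longer serve production traffic.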
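The gating logic of such a rolling upgrade can be sketched as follows. This is an illustration under stated assumptions, not GitHub's automation: `upgrade` and `is_healthy` are hypothetical callbacks standing in for the real upgrade procedure and monitoring checks, and the host names are made up.

```python
def rolling_upgrade(replicas, min_57_standby, upgrade, is_healthy):
    """Upgrade replicas one at a time, holding some back on 5.7.

    Stops at the first replica that fails its health check.
    Returns (upgraded, failed, untouched), where `failed` is the host
    that did not pass the check, or None if every upgrade succeeded.
    """
    # Never upgrade the last `min_57_standby` hosts: they are the rollback pool.
    budget = max(len(replicas) - min_57_standby, 0)
    upgraded, failed = [], None
    for host in replicas[:budget]:
        upgrade(host)
        if not is_healthy(host):
            failed = host
            break  # halt the rollout and leave the rest on 5.7
        upgraded.append(host)
    untouched = [h for h in replicas if h not in upgraded and h != failed]
    return upgraded, failed, untouched


# Hypothetical fleet of five replicas, keeping two on 5.7 for rollback.
hosts = ["replica-1", "replica-2", "replica-3", "replica-4", "replica-5"]
result = rolling_upgrade(
    hosts,
    min_57_standby=2,
    upgrade=lambda host: None,      # placeholder for the real upgrade step
    is_healthy=lambda host: True,   # placeholder for metric checks
)
print(result)
```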

Step 2: Update Replication Topology

Configure an 8.0 candidate as a replica of the current 5.7 primary.

Create two downstream chains: one with 5.7 replicas (standby) and one with 8.0 replicas (serving).

Maintain this topology only briefly before moving to the next step.
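The intermediate topology above can be modeled as a small tree, which makes it easy to see which replication links run from a newer version to an older one. This is only a sketch with hypothetical host names; the data layout is an assumption for illustration, not how Orchestrator represents topologies.

```python
# Each host records its MySQL version and the host it replicates from.
# Host names are hypothetical.
topology = {
    "primary-57":   {"version": (5, 7), "source": None},
    "candidate-80": {"version": (8, 0), "source": "primary-57"},
    "standby-57-a": {"version": (5, 7), "source": "candidate-80"},
    "standby-57-b": {"version": (5, 7), "source": "standby-57-a"},
    "serving-80-a": {"version": (8, 0), "source": "candidate-80"},
    "serving-80-b": {"version": (8, 0), "source": "serving-80-a"},
}


def newer_to_older_links(topo):
    """Return (source, replica) pairs where the replica runs an older
    version than its source, i.e. the links that need special care."""
    links = []
    for host, info in topo.items():
        source = info["source"]
        if source is not None and info["version"] < topo[source]["version"]:
            links.append((source, host))
    return sorted(links)


print(newer_to_older_links(topology))
```

In this model only the link from the 8.0 candidate into the 5.7 standby chain runs from newer to older, which is the direction MySQL does not explicitly support.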

Step 3: Promote an 8.0 Replica to Primary

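Rather than upgrading the primary in place, use Orchestrator to perform a graceful failover that promotes an 8.0 replica to primary, while an offline set of 5.7 replicas stays attached for rollback. Orchestrator was also configured to blacklist 5.7 hosts as failover candidates, so that an unplanned failover could not accidentally roll the cluster back.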
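The effect of excluding 5.7 hosts from promotion can be illustrated with a small candidate filter. This is a hypothetical sketch of the idea only; it does not reproduce Orchestrator's actual selection logic or configuration, and the host records are invented.

```python
def failover_candidates(replicas, min_version=(8, 0)):
    """Return replicas eligible for promotion, least-lagged first.

    Hosts below `min_version` are excluded so that a failover can
    never silently move the primary back to the old release.
    """
    eligible = [r for r in replicas if r["version"] >= min_version]
    return sorted(eligible, key=lambda r: r["lag_seconds"])


# Hypothetical replica inventory.
replicas = [
    {"host": "standby-57-a", "version": (5, 7), "lag_seconds": 0},
    {"host": "serving-80-a", "version": (8, 0), "lag_seconds": 2},
    {"host": "serving-80-b", "version": (8, 0), "lag_seconds": 1},
]
print([r["host"] for r in failover_candidates(replicas)])
```

Note that the 5.7 standby is skipped even though it has the lowest lag, which is the point of the exclusion.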

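Step 4: Upgrade Internal Instance Types

Ancillary servers used for backups and non-production workloads were upgraded next for consistency.

Step 5: Cleanup

Once the cluster had been validated over at least one full 24-hour traffic cycle, including peak traffic, and no rollback was needed, the remaining 5.7 servers were decommissioned.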

Rollback Capability

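Read availability was protected by keeping enough 5.7 replicas online to take over production reads if needed. Rolling back the primary was harder, because it required backward replication from the new 8.0 primary to 5.7 replicas. MySQL supports replication from one release to the next higher release but does not explicitly support the reverse, and when the team tested promoting an 8.0 host to primary, replication broke on the 5.7 replicas. Two problems had to be solved.

Character set and collation: MySQL 8.0 defaults to utf8mb4 with the utf8mb4_0900_ai_ci collation, which 5.7 does not support. The team changed the defaults to utf8 with utf8_unicode_ci to stay compatible with 5.7.

Roles: MySQL 8.0 introduced roles for managing privileges, a feature that does not exist in 5.7. After an 8.0 host was promoted to primary, role statements broke replication on the downstream 5.7 replicas. The team temporarily adjusted the affected users' permissions during the upgrade window.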
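The collation problem described above lends itself to a simple audit before promotion. The sketch below assumes a hypothetical list of (table, collation) pairs, for example gathered from information_schema, and relies on the fact that the `_0900_` collation family was added in 8.0.

```python
def is_57_compatible(collation):
    """Return False for collations that MySQL 5.7 does not know.

    The utf8mb4 `_0900_` collations (such as utf8mb4_0900_ai_ci)
    were introduced in MySQL 8.0.
    """
    return "_0900_" not in collation


def incompatible_tables(table_collations):
    """List tables whose collation would break replication to 5.7."""
    return sorted(
        table for table, collation in table_collations
        if not is_57_compatible(collation)
    )


# Hypothetical audit input: (table name, table collation).
tables = [
    ("users", "utf8_unicode_ci"),
    ("events", "utf8mb4_0900_ai_ci"),
    ("repos", "utf8mb4_unicode_520_ci"),
]
print(incompatible_tables(tables))
```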

Challenges

Vitess Handling

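VTGate advertises its own MySQL version to clients, independently of the version the backing servers run. Once a keyspace was upgraded, the VTGate setting had to be updated to advertise 8.0 as well. Otherwise, client libraries that disable the query cache when they see a 5.7 server would fail against 8.0, where the query cache no longer exists.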

Replication Lag Bug

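A bug in MySQL 8.0 releases before 8.0.28, related to replica_preserve_commit_order, could exhaust commit-order sequence tickets under sustained heavy load and hang applier threads. GitHub's write-heavy clusters met the conditions for hitting it. Because the fix shipped in 8.0.28, the team only had to make sure it deployed a release that includes it. The 8.0.28 release notes describe the fix: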

Replication: If a replica server with the system variable `replica_preserve_commit_order` = 1 set was used under intensive load for a long period, the instance could run out of commit order sequence tickets. ... The commit order sequence ticket generator now wraps around correctly.
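A deployment pipeline can guard against rolling out an affected build with a version comparison. This is a minimal sketch; the function names are hypothetical, and the parser assumes version strings of the form `major.minor.patch` with an optional vendor suffix after a hyphen.

```python
FIXED_IN = (8, 0, 28)  # release whose notes list the ticket wraparound fix


def parse_version(version):
    """Parse '8.0.28' or '8.0.32-24' into a comparable tuple of ints."""
    core = version.split("-", 1)[0]  # drop any vendor build suffix
    return tuple(int(part) for part in core.split("."))


def includes_commit_order_fix(version):
    """True if the release contains the commit-order ticket fix."""
    return parse_version(version) >= FIXED_IN


for candidate in ["8.0.27", "8.0.28", "8.0.32-24"]:
    print(candidate, includes_commit_order_fix(candidate))
```

Comparing tuples of integers avoids the classic string-comparison mistake where "8.0.9" sorts after "8.0.28".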

Test Pass, Production Fail

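Despite passing CI, some production queries with very large WHERE IN clauses, in some cases tens of thousands of values, crashed MySQL 8.0. These queries had to be rewritten before the upgrade could continue. Query sampling helped detect them, and SolarWinds DPM (VividCortex) provided query observability.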
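The article does not say how the queries were rewritten. One common approach, shown here purely as an assumption, is to split the value list into bounded batches and run one parameterized query per batch. The table and column names are hypothetical and must come from trusted code, never from user input.

```python
def chunked(values, size):
    """Yield consecutive slices of `values` with at most `size` items."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


def build_in_queries(table, column, values, batch_size=1000):
    """Build (sql, params) pairs, one per batch, instead of one huge IN list.

    `%s` is the placeholder style used by common Python MySQL drivers;
    the values themselves are passed separately as parameters.
    """
    queries = []
    for batch in chunked(values, batch_size):
        placeholders = ", ".join(["%s"] * len(batch))
        sql = f"SELECT * FROM {table} WHERE {column} IN ({placeholders})"
        queries.append((sql, batch))
    return queries


# 2,500 hypothetical ids become three queries of 1000, 1000, and 500 values.
queries = build_in_queries("issues", "id", list(range(2500)))
print([len(params) for _, params in queries])
```

Batching trades one round trip for several, so the batch size is a tuning knob rather than a fixed rule.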

Learnings and Takeaways

The year-long effort highlighted the importance of observability, thorough testing, and robust rollback mechanisms. Maintaining reverse replication from 8.0 to 5.7 proved to be the most challenging part, especially with diverse client libraries. Partitioning data early allowed staged upgrades and limited the blast radius of failures.

GitHub’s MySQL fleet grew from five clusters in the previous upgrade to over fifty, necessitating investments in tooling, automation, and processes.

Conclusion

MySQL upgrades are routine yet complex maintenance tasks. GitHub built new processes and automation to reduce manual steps for future upgrades, aiming for faster, safer migrations as the platform continues to scale.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
