How GitHub Upgraded 1,200 MySQL Servers from 5.7 to 8.0 Without Downtime

GitHub detailed a year‑long, multi‑team effort to upgrade over 1,200 MySQL hosts from version 5.7 to 8.0, describing the motivations, infrastructure scale, preparation steps, a staged rollout plan, rollback strategies, challenges faced, and key lessons learned for large‑scale database migrations.

dbaplus Community

Dec 19, 2023

Motivation

GitHub upgraded its MySQL fleet from 5.7 to 8.0 because 5.7 was approaching end‑of‑life, while MySQL 8.0 provides ongoing security patches, bug fixes, and performance improvements, plus new features such as instant DDL, invisible indexes, and compressed binary logs.

Scale of the MySQL Deployment

Over 1,200 hosts, a mix of Azure VMs and bare‑metal servers, across multiple data centers.

>300 TB of data stored in more than 50 database clusters, handling ~5.5 million queries per second.

Each cluster is configured for high availability with a primary and multiple replicas.

Data is partitioned vertically, with dedicated clusters for specific product domains, and horizontally, with Vitess sharding the domains that outgrew a single primary.

Tooling ecosystem includes Percona Toolkit, gh‑ost, Orchestrator, Freno, and internal automation for cluster operations.

Preparation and Compatibility Checks

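Define a MySQL 8.0 default configuration that stays compatible with 5.7 (e.g., character_set_server=utf8, collation_server=utf8_unicode_ci). Left to its own defaults, 8.0 uses utf8mb4 with the utf8mb4_0900_ai_ci collation, which 5.7 does not recognize, so these settings are what keep replication from 8.0 back to 5.7 working.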

Run benchmark suites on a representative subset of clusters to validate performance and identify version‑specific regressions.

Extend CI pipelines to run MySQL 5.7 and 8.0 containers side by side, catching removed features (e.g., the query cache), deprecated syntax, and newly reserved keywords.

Provide developers with a pre‑built MySQL 8.0 container for local testing and a dedicated pre‑production MySQL 8.0 cluster.
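The compatibility checks above lend themselves to an automated pre-flight step. The Python sketch below is illustrative only: the function names are hypothetical, and the reserved-word set is a subset of the keywords MySQL 8.0 newly reserves, not the full list. It flags a server collation that 5.7 replicas would not recognize and schema identifiers that would need quoting on 8.0.

```python
from __future__ import annotations

# Illustrative subset (not the full list) of identifiers MySQL 8.0 newly reserves.
RESERVED_IN_80 = {
    "cube", "groups", "lag", "lead", "over", "rank",
    "recursive", "row_number", "system", "window",
}

# MySQL 8.0 defaults to utf8mb4_0900_ai_ci; the *_0900_* collations do not exist in 5.7.
COLLATIONS_UNKNOWN_TO_57 = "utf8mb4_0900_"


def check_server_settings(settings: dict[str, str]) -> list[str]:
    """Flag 8.0 settings that would break replication back to 5.7 replicas."""
    problems = []
    # An unset collation_server falls back to the 8.0 default collation.
    collation = settings.get("collation_server", "utf8mb4_0900_ai_ci")
    if collation.startswith(COLLATIONS_UNKNOWN_TO_57):
        problems.append(f"collation_server={collation} is unknown to MySQL 5.7")
    return problems


def check_identifiers(identifiers: list[str]) -> list[str]:
    """Return schema identifiers that collide with MySQL 8.0 reserved words."""
    return sorted(name for name in identifiers if name.lower() in RESERVED_IN_80)
```

For example, `check_identifiers(["id", "rank"])` returns `["rank"]`, signalling that any query using that column must quote it on 8.0.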

Upgrade Strategy

Step 1 – Rolling upgrade of read‑only replicas

For each cluster, take a single replica out of rotation, upgrade it to 8.0, and run basic health checks (replication lag, query latency, system metrics). Once it is stable, return it to serving read traffic. Repeat one replica at a time until all read traffic in a data center is served by 8.0 replicas, while keeping enough 5.7 replicas available, out of rotation, as a rollback pool.
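A minimal Python sketch of that rolling loop, assuming a hypothetical `Replica` record and a health gate fed by per-host (replication lag, p99 latency) metrics. The thresholds and the size of the 5.7 rollback pool are placeholders, and setting `version` stands in for the real package upgrade.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Replica:
    name: str
    version: str           # "5.7" or "8.0"
    serving: bool = True   # True while the replica is in the read pool


def make_health_check(metrics: dict[str, tuple[float, float]],
                      max_lag_s: float = 1.0,
                      max_p99_ms: float = 50.0) -> Callable[[Replica], bool]:
    """Build a gate from per-host (replication lag seconds, p99 latency ms)."""
    def check(replica: Replica) -> bool:
        lag_s, p99_ms = metrics[replica.name]
        return lag_s <= max_lag_s and p99_ms <= max_p99_ms
    return check


def rolling_upgrade(replicas: list[Replica],
                    check: Callable[[Replica], bool],
                    rollback_pool_size: int = 2) -> list[Replica]:
    """Upgrade 5.7 replicas one at a time, keeping a 5.7 rollback pool.

    Returns the 5.7 replicas held back for rollback. Raises RuntimeError and
    leaves the failing host out of rotation if it fails the health gate.
    """
    legacy = [r for r in replicas if r.version == "5.7"]
    if len(legacy) <= rollback_pool_size:
        raise ValueError("too few 5.7 replicas to keep a rollback pool")
    pool, to_upgrade = legacy[:rollback_pool_size], legacy[rollback_pool_size:]
    for replica in to_upgrade:
        replica.serving = False    # take a single replica out of rotation
        replica.version = "8.0"    # stand-in for the real package upgrade
        if not check(replica):
            raise RuntimeError(f"{replica.name} failed health checks on 8.0")
        replica.serving = True     # healthy: return it to the read pool
    for replica in pool:
        replica.serving = False    # keep replicating, but serve no reads
    return pool
```

Stopping at the first failed gate keeps the blast radius to one replica, and the untouched 5.7 hosts remain available for rollback.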

Step 2 – Re‑configure replication topology

After all read traffic is served by 8.0 replicas, designate an 8.0 replica as the primary candidate, replicating directly from the existing 5.7 primary. Attach two replication chains downstream of that candidate:

A standby chain of 5.7 replicas (not serving traffic, ready for rollback).

An active chain of 8.0 replicas (serving traffic).
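The target topology can be modeled as a map from each host to its replication source. This hypothetical sketch builds the pre-failover layout described above and walks a host's chain back to the primary, a quick way to sanity-check that both chains hang off the 8.0 candidate. Host names are placeholders.

```python
from __future__ import annotations


def plan_topology(primary_57: str, candidate_80: str,
                  standby_57: list[str], active_80: list[str]) -> dict[str, str]:
    """Map each replica to its replication source before the failover."""
    topology = {candidate_80: primary_57}   # candidate replicates from the 5.7 primary
    for host in standby_57 + active_80:
        topology[host] = candidate_80       # both chains hang off the candidate
    return topology


def chain_to_root(topology: dict[str, str], host: str) -> list[str]:
    """Follow replication sources from host up to the primary."""
    path = [host]
    while path[-1] in topology:
        path.append(topology[path[-1]])
        # A valid chain can never be longer than every host plus the root.
        if len(path) > len(topology) + 1:
            raise ValueError("replication cycle detected")
    return path
```

For instance, a 5.7 standby's chain should read standby, then 8.0 candidate, then 5.7 primary; any other shape means the topology change went wrong.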

Step 3 – Graceful failover to MySQL 8.0 primary

Use Orchestrator to perform a graceful failover, promoting the 8.0 primary candidate to primary. The resulting topology has one 8.0 primary with two chains beneath it: an offline 5.7 rollback chain and an online 8.0 replica chain. 5.7 hosts, including the old primary, are blacklisted as failover candidates in Orchestrator so that an unplanned failover cannot promote one of them and accidentally roll the cluster back.
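The promotion and the failover-candidate filter can be sketched with a host-to-source map. This is not Orchestrator's API: `graceful_failover` and `failover_candidates` are hypothetical helpers that only model the resulting topology and the effect of blacklisting 5.7 hosts.

```python
from __future__ import annotations


def graceful_failover(topology: dict[str, str],
                      candidate: str) -> tuple[str, dict[str, str]]:
    """Promote candidate to primary and return (old_primary, new_topology).

    The candidate stops replicating from the old primary; its downstream
    chains are unchanged, and the old 5.7 primary drops out of the topology.
    """
    old_primary = topology[candidate]
    new_topology = {h: src for h, src in topology.items() if h != candidate}
    return old_primary, new_topology


def failover_candidates(topology: dict[str, str], primary: str,
                        blacklist: set[str]) -> list[str]:
    """Direct replicas of primary that an unplanned failover may promote."""
    return sorted(h for h, src in topology.items()
                  if src == primary and h not in blacklist)
```

Deriving the blacklist from host versions (every 5.7 host) means only 8.0 replicas remain eligible, even though the 5.7 standby chain still replicates directly from the new primary.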

Step 4 – Upgrade non‑production and backup instances

After the primary clusters are stable on 8.0, upgrade all backup, staging, and internal tooling instances to keep the environment consistent.

Step 5 – Cleanup

Validate the upgraded cluster across at least one complete 24‑hour traffic cycle so that peak traffic is covered. Once no regression is observed and a rollback is no longer needed, decommission the remaining 5.7 instances.
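A hedged sketch of that decommission gate: it assumes hypothetical (timestamp, p99 latency) samples from the 8.0 cluster and a 5.7 baseline, and approves removal only once the samples span a full 24-hour window with no latency regression beyond a placeholder tolerance.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def ready_to_decommission(samples: list[tuple[datetime, float]],
                          baseline_p99_ms: float,
                          tolerance: float = 1.10,
                          window: timedelta = timedelta(hours=24)) -> bool:
    """True once 8.0 metrics span a full traffic cycle with no latency regression.

    samples: (timestamp, p99 latency in ms) observed on the 8.0 cluster.
    """
    if not samples:
        return False
    times = [t for t, _ in samples]
    if max(times) - min(times) < window:
        return False    # a full daily cycle, including peak, not yet observed
    return all(p99 <= baseline_p99_ms * tolerance for _, p99 in samples)
```

Requiring the full window before looking at latency keeps a quiet overnight period from being mistaken for a clean result.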

Rollback Capability

The plan retains a full rollback path:

Read‑only traffic can be switched back to 5.7 replicas instantly if 8.0 performance degrades.

Primary rollback is possible because the 8.0 primary can keep replicating to the 5.7 standby chain: the 8.0 configuration uses 5.7‑compatible settings (utf8 character set, utf8_unicode_ci collation), and role‑based privileges, which exist only in 8.0, are handled with temporary adjustments during the upgrade window.
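Both rollback paths depend on simple checks that are easy to automate. The sketch below is hypothetical: `unsafe_for_57` is a rough regex heuristic for role statements that a 5.7 replica cannot parse, and `choose_read_pool` decides whether to send reads back to the 5.7 pool using a placeholder latency-regression threshold.

```python
from __future__ import annotations

import re

# Role statements exist only in MySQL 8.0; a 5.7 replica cannot parse them,
# so replicating them would break the rollback chain. Rough heuristic only.
ROLE_STATEMENT = re.compile(
    r"^\s*(?:CREATE|DROP)\s+ROLE\b"        # CREATE ROLE / DROP ROLE
    r"|^\s*SET\s+(?:DEFAULT\s+)?ROLE\b"    # SET ROLE / SET DEFAULT ROLE
    r"|^\s*GRANT\b(?!.*\bON\b)",           # GRANT role TO user (no ON clause)
    re.IGNORECASE,
)


def unsafe_for_57(statements: list[str]) -> list[str]:
    """Return statements that a 5.7 replica could not parse."""
    return [s for s in statements if ROLE_STATEMENT.search(s)]


def choose_read_pool(p99_80_ms: float, baseline_57_ms: float,
                     max_regression: float = 1.2) -> str:
    """Send reads back to the 5.7 pool if 8.0 p99 latency regresses too far."""
    return "5.7" if p99_80_ms > baseline_57_ms * max_regression else "8.0"
```

Ordinary privilege grants such as `GRANT SELECT ON db.* TO ...` carry an `ON` clause and pass through; only the role-assignment form is flagged.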

Key Technical Challenges

Vitess Sharding

Vitess clusters required coordinated upgrades of both the MySQL instances and the VTGate proxy. VTGate advertises a MySQL version to clients, and some client libraries (e.g., Java) use that version to decide whether to rely on the query cache, which was removed in 8.0. The VTGate configuration was therefore updated to advertise the new version after each shard upgrade.
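To illustrate why the advertised version matters, here is a hypothetical, simplified client that chooses its connect-time queries from the server version string it receives. This is not the logic of any real Java connector; it only shows how a client told it is talking to 5.7 can issue query-cache statements that fail once the shard actually runs 8.0.

```python
from __future__ import annotations

import re


def parse_server_version(version: str) -> tuple[int, int, int]:
    """Parse the numeric prefix of a version string such as '8.0.30-Vitess'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized server version: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def connect_time_queries(advertised_version: str) -> list[str]:
    """Hypothetical client: pick session-setup queries from the advertised version."""
    queries = []
    if parse_server_version(advertised_version) < (8, 0, 0):
        # Only valid before 8.0: the query cache and its variables were removed in 8.0.
        queries.append("SELECT @@query_cache_size")
    return queries
```

If the proxy still advertises a 5.x version while the backing MySQL runs 8.0, the query-cache lookup reaches a server that no longer knows the variable, which is why the advertised version has to track the real one.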

Replication Lag and Bugs

Early testing uncovered a replication error that was fixed in MySQL 8.0.28; the upgrade therefore targeted 8.0.28 or newer. Higher write throughput in 8.0 increased lag, so Freno was tuned to rate‑limit writes based on observed lag metrics.
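Lag-based write throttling in the style of Freno can be sketched as follows. This is not Freno's API: `should_throttle` and `run_throttled` are hypothetical, and the 1-second threshold and pause length are placeholders. Writers check the most-lagged replica before each batch and back off while it exceeds the threshold.

```python
from __future__ import annotations

import time
from typing import Callable, Iterable


def should_throttle(lags_s: dict[str, float], threshold_s: float) -> bool:
    """Throttle when the most-lagged replica exceeds the threshold."""
    if not lags_s:
        return True    # no lag data: fail safe and hold writes
    return max(lags_s.values()) > threshold_s


def run_throttled(batches: Iterable[list], write: Callable[[list], None],
                  get_lags: Callable[[], dict[str, float]],
                  threshold_s: float = 1.0, pause_s: float = 0.5,
                  sleep: Callable[[float], None] = time.sleep) -> int:
    """Write batches, pausing while replicas lag; returns the number of pauses."""
    pauses = 0
    for batch in batches:
        while should_throttle(get_lags(), threshold_s):
            sleep(pause_s)
            pauses += 1
        write(batch)
    return pauses
```

Gating on the worst replica rather than an average keeps a single slow 8.0 replica from silently falling further behind.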

Production Query Failures

Large WHERE IN clauses that passed CI caused crashes on MySQL 8.0 under real production load. The offending queries were rewritten, and query sampling combined with SolarWinds DPM (formerly VividCortex) was used to surface similar query patterns.
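One way to surface and rewrite such queries is sketched below, as an assumption rather than GitHub's actual fix: `largest_in_list` is a rough regex heuristic for scanning sampled SQL, and `chunked_in_queries` splits one huge IN list into bounded, parameterized queries. The table name and chunk size are hypothetical.

```python
from __future__ import annotations

import re
from typing import Iterator

# Rough heuristic: matches literal lists like "IN (1, 2, 3)"; ignores nesting
# and commas inside string literals.
IN_LIST = re.compile(r"\bIN\s*\(([^()]*)\)", re.IGNORECASE)


def largest_in_list(sql: str) -> int:
    """Number of items in the largest IN (...) list of a sampled query."""
    return max((len(m.group(1).split(",")) for m in IN_LIST.finditer(sql)), default=0)


def chunked_in_queries(template: str, ids: list[int],
                       chunk: int = 1000) -> Iterator[tuple[str, list[int]]]:
    """Split one huge IN list into several parameterized queries of <= chunk ids."""
    for start in range(0, len(ids), chunk):
        part = ids[start:start + chunk]
        placeholders = ", ".join(["%s"] * len(part))
        yield template.format(placeholders=placeholders), part
```

The `%s` placeholders follow the paramstyle of common Python MySQL drivers such as PyMySQL and mysqlclient; a template like `"SELECT id FROM issues WHERE id IN ({placeholders})"` would be executed once per chunk.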

Lessons Learned

Extensive observability (metrics, query sampling, replication health) is essential for a safe, incremental upgrade.

Automated testing against both MySQL versions catches deprecations early.

Maintaining a mixed‑version environment during the rollout provides a safety net but requires careful configuration management (character set, collation, role privileges).

Sharding isolates risk; upgrading one Vitess shard at a time limits blast radius.

Tooling such as Orchestrator, Percona Toolkit, and Freno proved critical for topology changes and lag mitigation.

Conclusion

The year‑long, phased upgrade demonstrates that large‑scale MySQL migrations can be performed with zero SLO impact when backed by robust automation, observability, and a well‑tested rollback strategy. The experience establishes a repeatable process for future MySQL version upgrades across GitHub’s growing fleet.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
