How We Scaled a High‑Traffic Messaging Service by Migrating MySQL to PolarDB
This article details the migration of a popular social app's private‑message service from a saturated MySQL cluster to PolarDB: the business challenges, the evaluation of storage optimization, vertical scaling, horizontal scaling, and a distributed database, the step‑by‑step offline and online migration procedures, and the resulting performance and cost benefits.
Introduction
In‑app private messaging is a critical bridge for user interaction on the popular social app Yingke. Rapid user growth exhausted the original MySQL database, prompting a migration to PolarDB to eliminate storage bottlenecks and improve scalability.
Business Background
The current private‑message service runs on a read‑ and write‑heavy MySQL setup sharded into N databases and N tables, with a Redis cache layer in front; CPU utilization is low while storage is near its limit.
Current Situation
High read/write volume
Database sharded across many tables
Standard SQL without special features
Redis cache sits in front of reads
Storage utilization at 85% and growing daily
Key challenges include continuous data growth and the need for a fast, business‑transparent migration that supports the MySQL protocol, massive storage, and dynamic scaling.
Migration Options Explored
Storage optimization
Vertical scaling
Horizontal scaling
Distributed database
Storage Optimization
1. Archive several years of historical data – frees roughly 20% of space, but introduces Redis‑DB consistency issues and requires lazy‑load logic for archived messages.
2. Compress the content field – frees roughly 30% of space, but requires code changes and a migration of historical data.
Pros: No additional hardware cost. Cons: Requires program changes and scripts; long‑term storage limits remain.
Vertical Scaling
Increasing hardware capacity (e.g., larger disks) faces two problems: the current RDS instance already hits maximum disk size, and storage expansion often forces simultaneous compute scaling, raising costs without addressing the real bottleneck.
Pros: Transparent to the business, no code changes. Cons: Increases monthly cost and wastes compute resources.
Horizontal Scaling
Because the service already shards its databases, half of the tables can be split off to a new cluster, reducing per‑node storage. This requires data migration via DTS plus application changes and scripts to clean up the moved data.
Pros: Can halve or further reduce storage usage. Cons: Requires data‑cleaning scripts, increasing development effort.
Distributed Database
After evaluating several products, PolarDB for MySQL was selected. Its compute‑storage separation, high availability, and horizontal scalability match our needs. Unlike traditional RDS MySQL, PolarDB keeps a single copy of the data shared by all compute nodes, so adding read replicas incurs no additional storage cost.
Migration Implementation Strategy
Overview
Both offline (stop‑service) and online migration strategies were considered; the online approach was chosen.
We create a new PolarDB instance, enable DTS data sync, let DTS catch up, then switch traffic during a low‑traffic window to minimize impact.
Offline Migration Steps
Create PolarDB for MySQL instance
Enable DTS data synchronization
When sync catches up, stop service pods and wait for full consistency
Update application to point to PolarDB
Redeploy
Migration complete
Online Migration Steps
Preparation:
DBA creates PolarDB instance
DBA enables DTS sync
Developers add dual‑write connection info (MySQL + PolarDB)
Implement write‑pause using Redis switch
Use Go channels to buffer Add operations and Sleep to block Update/Delete during pause
Define migration switch states (1‑read/write MySQL, 2‑pause writes, 3‑dual‑write, 4‑read/write PolarDB)
During Migration:
Set switch to state 2 during low‑traffic period to stop writes
DBA monitors DTS until MySQL data fully syncs to PolarDB (≈1‑2 min)
Set switch to state 3 to start dual‑write
Both DBA and developers verify row counts, error logs, and private‑message functionality
Post‑Migration:
If errors appear, revert to state 1 (MySQL)
After 1‑2 days of stable operation, switch to state 4 (full PolarDB), remove MySQL connection, delete switch logic, and redeploy
DBA monitors MySQL traffic; if none, decommission the instance
Migration complete
Key Mechanism
A Redis flag allows the program to toggle between MySQL and PolarDB and to pause writes. During the pause, Add operations are buffered in a Go channel, and Update/Delete are blocked using Sleep, ensuring data consistency while the switch occurs.
Capacity calculations for the buffer (e.g., 500 Add QPM across 10 pods → 50‑element channel) show negligible memory impact.
Post‑Migration Metrics
Monitoring shows expected PolarDB metrics. Cost reduced by ~18% compared to the previous MySQL setup. P99 latency is around 40 ms, meeting business expectations. Compute‑storage separation now supports up to 100 TB.
Precautions
Ensure the write‑pause mechanism cannot lose data if a pod panics; test thoroughly and persist logs so buffered operations can be replayed.
Online migration requires complete knowledge of all database operations to avoid incomplete data transfer.
Results
Minimal business impact compared with offline migration.
Cost reduction of 18% after migration.
P99 response time ~40 ms.
Storage and compute are decoupled, supporting up to 100 TB.
Inke Technology
Official account of Inke Technology