How YY Scaled Its Database Platform: From Manual Ops to Intelligent Automation
This article traces how YY transformed its database operations, from early problems with quality and efficiency to a platform built in stages that automates resource pooling, high‑availability proxying, cost control, quality monitoring, and security. It closes with the intelligent extensions the team plans next.
1. YY Database Team Challenges
Early issues included frequent quality incidents, slow resource‑request response, reactive security firefighting, and high cost due to low resource utilization.
Quality incidents harmed service reputation.
Resource demand response lagged behind business needs.
Security incidents were often handled post‑mortem.
Low resource utilization drove high costs.
2. Database Platform Goals
The platform aims for one‑stop service, minute‑level deployment, automated change operations, minute‑level monitoring and alerting, rapid fault diagnosis, auditability, measurable cost, and traceable security changes.
3. Resource Pool Management
YY operates multiple IDC and public‑cloud data centers with diverse storage (SAS, SATA, SSD, PCIe). The pool supports multiple versions of MySQL, MongoDB, and Redis, and a single machine can host several instances.
Business users select the database type, package size, and configuration from a resource‑pool interface, so a request can be fulfilled within minutes.
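As a rough illustration of how a request might be placed onto a shared host, here is a minimal sketch. The article does not describe YY's placement logic, so the best-fit rule, the `Host` fields, and the `PACKAGES` sizes below are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    storage: str              # e.g. "SSD" or "SATA"
    free_mem_gb: int
    instances: list = field(default_factory=list)


# Hypothetical package sizes in GB; the article does not list real ones.
PACKAGES = {"small": 4, "medium": 16, "large": 64}


def allocate(hosts, db_type, package, storage):
    """Place one instance on the fitting host with the least spare memory."""
    need = PACKAGES[package]
    candidates = [
        h for h in hosts if h.storage == storage and h.free_mem_gb >= need
    ]
    if not candidates:
        return None  # pool exhausted for this storage type
    host = min(candidates, key=lambda h: h.free_mem_gb)  # best fit
    host.free_mem_gb -= need
    host.instances.append((db_type, package))
    return host.name
```

Best fit is only one option; it tends to pack machines tightly, which suits the stated goal of raising utilization, but a real scheduler would also weigh CPU, disk, and failure domains.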
4. High‑Availability Proxy Architecture
YY uses a high‑availability proxy layer (OSPF/VIP) that routes writes to a primary IDC while reads can be served locally, providing load balancing, read/write separation, and special handling for large SQL queries.
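The routing rule described above can be sketched in a few lines. This is an illustration only: the function name, the IDC labels, and the regex used to spot writes are assumptions, not the proxy's actual implementation.

```python
import re

# Statements treated as writes in this sketch (an assumed, incomplete list).
WRITE_RE = re.compile(r"^\s*(insert|update|delete|replace)\b", re.IGNORECASE)


def route(sql, client_idc, primary_idc, replica_idcs):
    """Return (idc, role) for one statement.

    Writes always go to the primary IDC; reads stay in the client's IDC
    when a replica exists there, and fall back to the primary otherwise.
    """
    if WRITE_RE.match(sql):
        return (primary_idc, "primary")
    if client_idc in replica_idcs:
        return (client_idc, "replica")
    return (primary_idc, "primary")
```

A production proxy would also have to handle transactions, `SELECT ... FOR UPDATE`, and replication lag, none of which this sketch attempts.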
Multi‑tenant design allows new services to be configured via the GK system without full cluster deployment.
5. MyShard – Multi‑Write Solution
MyShard implements eventual consistency by adding version numbers to tables, enabling multi‑write across regions and handling conflicts by preferring the highest version.
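A minimal sketch of version-based conflict resolution follows. The row layout (`id`, `version`, `value`) and the function names are hypothetical; the article only states that the highest version is preferred.

```python
def merge_row(local, incoming):
    """Keep the row with the higher version; on a tie, keep the local row."""
    if local is None or incoming["version"] > local["version"]:
        return incoming
    return local


def apply_changes(table, changes):
    """Apply replicated rows to a table stored as {id: row}."""
    for row in changes:
        table[row["id"]] = merge_row(table.get(row["id"]), row)
    return table
```

Because the outcome depends only on version numbers, two regions that receive the same changes in different orders end up with the same rows, provided versions are distinct. Equal versions with different contents would need a deterministic tie-breaker, which is left out here.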
6. Database Synchronization Services
The synchronization services sync selected tables or fields between databases, support heterogeneous targets, and are tuned for sync performance. They can also update caches, send RabbitMQ notifications, and replicate data across databases.
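Field-level synchronization can be pictured as projecting each source row onto the chosen columns and upserting the result. The sketch below uses plain dictionaries in place of real databases, and every name in it is an assumption made for illustration.

```python
def project(row, fields):
    """Keep only the selected fields of a row."""
    return {name: row[name] for name in fields}


def sync_table(source_rows, target, key, fields):
    """Upsert selected fields into target ({key: row}); return changed keys."""
    changed = []
    for row in source_rows:
        wanted = project(row, fields)
        row_key = row[key]
        if target.get(row_key) != wanted:
            target[row_key] = wanted
            changed.append(row_key)
    return changed
```

The returned list of changed keys is the kind of signal that could drive a cache refresh or a message-queue notification downstream.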
7. Database Quality Platform
Features rapid alarm detection, instance‑level and business‑level diagnostics, event tracking, and a global dashboard showing alerts, warnings, and health metrics via Kafka‑fed data streams.
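To illustrate how instance-level signals could roll up into a business-level view, here is a small sketch. The three states, the threshold parameters, and the function names are assumptions; the article does not specify how the dashboard computes health.

```python
SEVERITY = {"ok": 0, "warning": 1, "alert": 2}


def classify(value, warn_at, alert_at):
    """Map one metric sample to a state using two thresholds."""
    if value >= alert_at:
        return "alert"
    if value >= warn_at:
        return "warning"
    return "ok"


def business_health(instance_states):
    """Reduce (business, state) pairs to the worst state per business."""
    worst = {}
    for business, state in instance_states:
        current = worst.get(business, "ok")
        worst[business] = max(current, state, key=SEVERITY.__getitem__)
    return worst
```

In a streaming setup, the pairs passed to `business_health` would come from a consumer reading the metrics stream rather than from an in-memory list.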
8. Cost Management
Cost is quantified per instance, business, and organization, with dashboards showing usage and spending, and one‑click tier‑up/down of resource packages while ensuring data consistency during migrations.
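The three levels of cost accounting can be sketched as a simple roll-up from instances. The record fields (`org`, `business`, `monthly_cost`) are hypothetical and chosen only to make the example concrete.

```python
from collections import defaultdict


def roll_up(instances):
    """Sum per-instance cost by (org, business) and by org."""
    by_business = defaultdict(int)
    by_org = defaultdict(int)
    for inst in instances:
        by_business[(inst["org"], inst["business"])] += inst["monthly_cost"]
        by_org[inst["org"]] += inst["monthly_cost"]
    return dict(by_business), dict(by_org)
```

Costs are kept as integers here (for example, in the smallest currency unit) to avoid floating-point rounding in the totals.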
9. Security Measures
Includes baseline network and OS security, cross‑region backup (primary and secondary), data‑consistency checks with automated repair, controlled isolation and decommission procedures, and SQL execution safeguards that rewrite risky statements (e.g., DROP to RENAME).
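The DROP-to-RENAME safeguard can be illustrated with a small rewrite function. The `_trash_` prefix, the date suffix, and the regex are assumptions for this sketch; the article does not say how YY names or later cleans up the renamed tables.

```python
import re
from datetime import date

# Matches a single-table DROP TABLE statement (a simplified pattern).
DROP_RE = re.compile(
    r"^\s*DROP\s+TABLE\s+(?:IF\s+EXISTS\s+)?`?(\w+)`?\s*;?\s*$",
    re.IGNORECASE,
)


def make_safe(sql, today=None):
    """Rewrite DROP TABLE into RENAME TABLE; pass other statements through."""
    match = DROP_RE.match(sql)
    if not match:
        return sql
    table = match.group(1)
    stamp = (today or date.today()).strftime("%Y%m%d")
    # Hypothetical naming scheme for the parked table.
    return f"RENAME TABLE `{table}` TO `_trash_{table}_{stamp}`;"
```

For example, `make_safe("DROP TABLE orders;", date(2024, 1, 2))` returns ``RENAME TABLE `orders` TO `_trash_orders_20240102`;``, so the data stays recoverable until someone deliberately purges it.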
10. Future Outlook
YY plans to advance toward intelligent automation for scaling, monitoring, and capacity planning, enhance SLA transparency, strengthen SQL audit capabilities, and further improve resource utilization to lower operational costs.
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
