Databases 13 min read

Evolution of Meituan’s Database Platform: From Manual Operations to Intelligent Automation

This article outlines Meituan’s transition of its database platform from manual, script‑based operations through tool‑ and product‑centric stages to a private‑cloud and automation era, discusses current challenges such as root‑cause analysis and staffing, and shares insights on moving toward fully intelligent, data‑driven database operations.

Qunar Tech Salon

In recent years, traditional database operations have struggled to keep up with business demands for stability, availability, and flexibility. Under rapid growth, Meituan’s DBA team moved from manual, people-driven operations to tool-based, productized, self-service, and automated processes, and is now exploring intelligent operations.

The platform evolved through five major stages: (1) a scripting phase that supported small clusters; (2) a tooling phase that packaged scripts into utilities and integrated CMDB, monitoring, DDL change, SQL review, slow-query analysis, and backup tools; (3) a productization phase that combined tools into repeatable processes, improving efficiency, safety, and consistency for DBAs; (4) a private-cloud/self-service phase that opened routine tasks to developers and now handles hundreds of schema changes, thousands of queries, and numerous account, monitoring, and log operations every day; (5) an automation phase, currently semi-automatic, that aims for full automation of MySQL HA, self-protection, capacity diagnosis, and auto-scaling.

Today the platform integrates many functions, including high-availability, MGW management, DNS changes, backup, upgrade workflows, traffic switching, account management, data archiving, and asset flow. These functions span relational, KV-cache, KV-store, and emerging NewSQL services, with the vision of a one-stop MySQL + NoSQL + NewSQL storage platform. Key challenges include the difficulty of localizing root causes for database-internal failures, fragmented DBA time, a shortage of skilled DBAs, and the need to shift from reactive incident response to proactive, intelligent operations.

Transitioning from automation to intelligent operations means making the platform transparent to developers while still providing stable, fast storage. Traditional ops rely on limited data collection and reactive alarms. Intelligent ops emphasize broad data collection, analysis, pre-warning, and automated execution, so that a smaller share of work follows the reactive "alarm + analysis + execution" loop. The data collected spans MySQL global status, variables, processlist, InnoDB status, logs, binlog, application success rates, latency percentiles, error logs, throughput, OS metrics, and change logs. Analysis proceeds from the cluster level down to instance, database, and table, enabling capacity planning, health checks, and targeted alert remediation.
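The drill-down and pre-warning ideas above can be illustrated with a minimal Python sketch. It is not Meituan's implementation: the `TableSize` record, the sample data, and the helper names `rollup`, `drill_down`, and `days_until_full` are all hypothetical, and the linear trend is only one simple way to approximate a capacity forecast.

```python
from collections import defaultdict
from typing import NamedTuple


class TableSize(NamedTuple):
    """One hypothetical size sample, as a metrics collector might report it."""
    cluster: str
    instance: str
    database: str
    table: str
    size_gb: float


LEVELS = ("cluster", "instance", "database", "table")


def rollup(samples, level):
    """Sum table sizes up to the requested level of the hierarchy."""
    depth = LEVELS.index(level) + 1
    totals = defaultdict(float)
    for s in samples:
        totals[s[:depth]] += s.size_gb  # key is the path prefix, e.g. (cluster, instance)
    return dict(totals)


def drill_down(samples):
    """Follow the largest contributor from cluster down to a single table."""
    path = ()
    for level in LEVELS:
        totals = rollup(samples, level)
        # Only consider children of the path chosen at the previous level.
        candidates = {k: v for k, v in totals.items() if k[:len(path)] == path}
        path = max(candidates, key=candidates.get)
    return path


def days_until_full(daily_usage_gb, capacity_gb):
    """Project days until capacity is reached from a least-squares linear trend.

    Returns None when there is too little data or usage is flat or shrinking.
    """
    n = len(daily_usage_gb)
    if n < 2:
        return None
    mean_x = (n - 1) / 2
    mean_y = sum(daily_usage_gb) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(daily_usage_gb))
    var = sum((x - mean_x) ** 2 for x in range(n))
    slope = cov / var  # growth in GB per day
    if slope <= 0:
        return None
    return max(0.0, (capacity_gb - daily_usage_gb[-1]) / slope)


# Made-up sample data, for illustration only.
samples = [
    TableSize("orders", "db-01", "trade", "order_log", 400.0),
    TableSize("orders", "db-01", "trade", "order_item", 120.0),
    TableSize("orders", "db-02", "trade", "order_log", 90.0),
    TableSize("users", "db-03", "account", "profile", 50.0),
]

print(drill_down(samples))                            # ('orders', 'db-01', 'trade', 'order_log')
print(days_until_full([100, 110, 120, 130], 200))     # 7.0
```

In this sketch a pre-warning would fire when the projected number of days drops below some threshold, and the drill-down path tells the operator which table to look at first. A production system would presumably use richer models and many more signals than table size.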

Future work includes building a fault‑diagnosis platform ("Bian Que") for log ingestion, storage, and analysis, exposing APIs for end‑to‑end fault location and service governance. The roadmap envisions deeper integration of AI, Big Data, and Cloud Computing, blurring the lines between NoSQL and SQL, and achieving a platform that can autonomously discover, locate, and resolve issues.

Written by

Qunar Tech Salon

Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.
