Databases 31 min read

Meituan Database Attack‑Defense Practice: Kernel Observability, Full SQL, and Index Optimization

Meituan’s database attack‑defense practice builds kernel observability using wait‑time quantification, captures every SQL statement directly from the MySQL kernel with lock‑free storage and Snappy compression, classifies remediation actions as lossless or lossy, and implements a multi‑layered index recommendation and governance framework that automates risk review, real‑time detection, and post‑execution optimization.

Meituan Technology Team
Meituan Technology Team
Meituan Technology Team
Meituan Database Attack‑Defense Practice: Kernel Observability, Full SQL, and Index Optimization

Introduction

This article is compiled from Meituan Technology Salon Session 75, titled “Meituan Database Attack‑Defense Practice”. It expands the original presentation with detailed technical information, covering database anomaly detection, diagnosis, kernel observability, full‑SQL collection, exception handling, and index optimization.

Kernel Observability Construction

The authors describe two major performance‑diagnosis challenges: (1) sudden SQL latency spikes without obvious root causes, and (2) reliance on point‑in‑time metrics such as information_schema.processlist that provide only partial views. They adopt a Wait‑time quantification method inspired by the TSA model, separating total latency into On‑CPU and Off‑CPU components. By instrumenting MySQL’s kernel (using lightweight getrusage calls and custom wait‑type metrics) they obtain fine‑grained per‑SQL timing, enabling precise bottleneck identification.

Full SQL Collection

Full‑SQL refers to capturing every SQL statement issued to the database together with its key metrics. The previous TCP‑packet‑parsing approach lacked sufficient detail, so the new design extracts SQL directly from the MySQL kernel. Collected data are stored in a lock‑free memory pool and written to files by a dedicated thread. Because the volume can reach petabytes per day, the pipeline compresses data with Snappy after sorting by SQL prefix to improve compression ratios (up to 7‑8×). The system supports both detailed SQL storage and aggregated template storage.

Exception Handling

After root‑cause analysis, the system classifies remediation actions as “lossless” (e.g., disk‑space cleanup, parameter tuning) or “lossy” (e.g., killing sessions, throttling). An architecture diagram shows the integration of an “exception discovery & analysis” service with a DBA‑driven “pre‑plan” service, enabling automated failover for MySQL hangs and coordinated recovery steps.

Index Optimization and Governance

The article outlines a multi‑layered index recommendation framework:

Single‑SQL recommendations: Analyze execution plans, compute cost components (table‑scan, index‑scan, range‑access, ref) using MySQL’s internal functions, and apply a What‑If strategy to evaluate hypothetical indexes.

Workload‑based recommendations: Identify “interesting” column groups via a CG‑Cost function, select candidate indexes, merge overlapping indexes, enumerate configurations (using greedy + limited brute‑force), and generate multi‑column indexes through iterative widening (MC_LEAD algorithm).

SQL governance: Provide three phases—pre‑deployment risk review, real‑time detection during execution, and post‑execution batch optimization. The system can automatically approve index additions (e.g., via Ghost) and monitor performance impact.

Author

Yufeng, from Meituan’s Basic R&D Platform – Basic Technology Department, responsible for the Meituan Database Autonomy Platform.

References

https://github.com/shenyufengdb/sql

https://github.com/percona/percona-server/tree/release-5.7.41-44

https://www.microsoft.com/en-us/research/publication/an-efficient-cost-driven-index-selection-tool-for-microsoft-sql-server/

https://www.microsoft.com/en-us/research/publication/plan-stitch-harnessing-the-best-of-many-plans-2/

https://www.microsoft.com/en-us/research/publication/random-sampling-for-histogram-construction-how-much-is-enough/

https://dl.acm.org/doi/10.1145/276304.276337

Performance TuningMySQLIndex Optimizationdatabase observabilityFull SQLWorkload Analysis
Meituan Technology Team
Written by

Meituan Technology Team

Over 10,000 engineers powering China’s leading lifestyle services e‑commerce platform. Supporting hundreds of millions of consumers, millions of merchants across 2,000+ industries. This is the public channel for the tech teams behind Meituan, Dianping, Meituan Waimai, Meituan Select, and related services.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.