How Meituan Optimizes MySQL Slow Queries with Cost‑Based Index Recommendations
This article explains how Meituan tackles MySQL slow‑query problems by leveraging the database cost‑based optimizer to generate index suggestions, evaluate their quality, and operate a full‑stack governance workflow that includes data collection, statistical modeling, validation, and continuous improvement.
Background
Slow queries in MySQL consume excessive CPU and I/O, often causing business outages; Meituan processes over a hundred million slow queries daily, with more than 10% of database failures linked to them.
Cost‑Based Optimizer Overview
SQL Execution Flow
The MySQL execution pipeline includes parsing, logical transformation, cost preparation, cost‑based optimization, additional rewrites, and plan execution.
Cost Model
Costs are split into server‑side CPU cost and engine‑side I/O cost. Default MySQL 5.7 parameters include io_block_read_cost=1, row_evaluate_cost=0.2, and others such as memory_temptable_create_cost, key_compare_cost, etc.
Cost‑Based Index Selection
Using the same cost model as the optimizer, Meituan evaluates all possible index combinations for a given SQL. Example SQL:
SELECT * FROM sync_test1 WHERE name LIKE 'Bobby%' AND dt > '2021-07-06';Four candidate indexes (IX_name, IX_dt, IX_dt_name, IX_name_dt) are created, and EXPLAIN shows the optimizer choosing IX_name with the lowest cost (687.41 vs higher costs for other indexes).
+----+-------------+------------+-------+-------------------------------------+--------+---------+------+--------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+-------------------------------------+--------+---------+------+--------+------------------------------------+
| 1 | SIMPLE | sync_test1 | range | IX_name,IX_dt,IX_dt_name,IX_name_dt | IX_name| 12 | NULL | 572 | Using index condition; Using where |
+----+-------------+------------+-------+-------------------------------------+--------+---------+------+--------+------------------------------------+Tracing the optimizer confirms that other indexes are rejected because their range‑scan cost exceeds that of IX_name.
Index Recommendation Architecture
Meituan splits the system into a lightweight Fakeindex storage engine (C++) and a Go‑based recommendation service. The engine forwards index‑related calls to the Go server via cgo‑based RPC, avoiding heavyweight frameworks.
The Go server performs the core logic:
Pre‑validation to filter unsupported statements.
Extract key columns from WHERE, JOIN, ORDER BY, GROUP BY, and aggregation functions.
Generate candidate index permutations, then prune existing indexes, overly long indexes, and unsupported types.
Collect metadata (table definition, existing indexes), statistics (row count, data size, cardinality), and sample data using block‑based sampling to avoid heavy I/O.
Compute statistical metrics such as records_in_range and cardinality using sample scaling and slope‑based adjustments.
Evaluate candidate index costs by creating a fake table with those indexes in the Fakeindex engine, running EXPLAIN FORMAT=JSON, and selecting the lowest‑cost plan.
Because MySQL limits a table to 64 secondary indexes, Meituan adopts a merge‑batch strategy to handle thousands of candidates, supporting up to 4096 candidates and 17 columns.
Recommendation Quality Assurance
Testing on 246 GB of recent slow‑query logs (≈30 k SQL templates) shows high coverage, though occasional invalid recommendations arise due to sample bias, system bugs, or optimizer limitations.
To mitigate this, Meituan adds two post‑recommendation steps:
Effectiveness Validation: Run the recommended index on a sampled replica and verify cost reduction via EXPLAIN.
Effect Tracking: After deployment, monitor real‑time query execution with Flink; if performance regresses, alert DBAs immediately.
Simulation Environment & Test Case Library
A offline simulation environment reproduces production data and the full recommendation pipeline, allowing safe debugging of failures. Test cases are continuously harvested from production slow‑query logs, incident tickets, and manually crafted edge cases, forming a growing repository.
Slow‑Query Governance Operations
Meituan classifies slow queries into three time‑based groups:
Historical: Long‑standing slow queries, managed via periodic reports.
New: Recently appearing queries, prioritized by impact and fed into real‑time alerts.
Potential: Queries not yet slow but likely to become so as data grows; proactive recommendations are generated.
Each group follows a tailored workflow, from automatic index suggestion to manual DBA approval.
Project Status & Future Plans
To date, the system has analyzed >6 k slow queries, issued >1 k index recommendations, and is gradually rolled out to production. Future work includes scaling to handle billions of daily slow queries, incorporating index‑maintenance cost (disk usage, write overhead), and moving from per‑SQL to workload‑wide global optimization similar to Alibaba Cloud DAS.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
