Meituan's Cost-Based Optimizer for Slow Query Index Recommendation
The article explains how Meituan uses MySQL's cost‑based optimizer to analyze slow queries, generate virtual index candidates, evaluate their costs with detailed statistics, and deploy a recommendation system that validates, tracks, and governs index suggestions to reduce CPU/IO waste and prevent database failures.
Background
At Meituan, a query counts as slow when it runs longer than 100 ms. Slow queries waste CPU and I/O and account for more than 10% of database failures, so they need to be optimized quickly.
Cost‑Based Optimizer Overview
SQL Execution and Optimizer
MySQL processes a query through parsing, logical/physical transformation, cost estimation, and plan selection. The cost model assigns a numeric cost to each basic operation, such as reading a data block, evaluating a row, creating a temporary table, or comparing keys, and the optimizer picks the plan with the lowest total.
Cost Model Details
The total cost is the sum of server‑side costs (e.g., row_evaluate_cost=0.2) and engine‑side costs (e.g., io_block_read_cost=1). In MySQL 5.7 these defaults can be overridden through the mysql.server_cost and mysql.engine_cost tables.
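To make the summation concrete, here is a simplified sketch of how an I/O term and a CPU term combine into one estimate, using the two default constants quoted above. The function name and the page and row counts are hypothetical, and MySQL's real formula contains further terms, so this shows only the shape of the model.

```python
# Simplified illustration of a cost-model estimate; not MySQL's exact formula.
IO_BLOCK_READ_COST = 1.0   # default quoted in the text
ROW_EVALUATE_COST = 0.2    # default quoted in the text (MySQL 5.7)


def estimate_scan_cost(pages, rows,
                       io_block_read_cost=IO_BLOCK_READ_COST,
                       row_evaluate_cost=ROW_EVALUATE_COST):
    """Return the I/O cost (pages read) plus the CPU cost (rows evaluated)."""
    io_cost = pages * io_block_read_cost
    cpu_cost = rows * row_evaluate_cost
    return io_cost + cpu_cost


# Hypothetical table: 1,000 pages holding 100,000 rows.
full_scan = estimate_scan_cost(pages=1000, rows=100_000)
# Hypothetical range scan that touches 30 pages and 3,000 rows.
range_scan = estimate_scan_cost(pages=30, rows=3_000)
print(full_scan, range_scan)
```

Because both terms grow with the amount of data touched, an access path that reads fewer pages and rows ends up with a proportionally lower estimate.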
Cost‑Based Index Selection
For the example query
select * from sync_test1 where name like 'Bobby%' and dt > '2021-07-06'
four candidate indexes are evaluated. EXPLAIN shows that the optimizer chooses IX_name because its total cost (687.41) is far lower than that of a full‑table scan (21627.9) or of the other indexes.
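The chosen index and its cost can be read programmatically from EXPLAIN FORMAT=JSON output. The sketch below parses a hand-written sample shaped like MySQL 5.7 output for a single-table query; only the table name, index name, and the two costs come from the article, and the helper name chosen_plan is hypothetical.

```python
import json

# Hand-written, trimmed sample in the shape of MySQL 5.7 EXPLAIN FORMAT=JSON.
# Real output carries many more fields; joins nest tables under "nested_loop".
EXPLAIN_JSON = """
{
  "query_block": {
    "select_id": 1,
    "cost_info": {"query_cost": "687.41"},
    "table": {
      "table_name": "sync_test1",
      "key": "IX_name"
    }
  }
}
"""


def chosen_plan(explain_json):
    """Return (index name or None, total query cost) for a single-table plan."""
    block = json.loads(explain_json)["query_block"]
    cost = float(block["cost_info"]["query_cost"])  # MySQL emits cost as a string
    key = block["table"].get("key")                 # absent when no index is used
    return key, cost


key, cost = chosen_plan(EXPLAIN_JSON)
FULL_SCAN_COST = 21627.9  # full-table-scan cost quoted in the article
print(key, cost, cost < FULL_SCAN_COST)
```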
Index Recommendation Engine
Design
A Fakeindex storage engine presents “virtual” indexes to the optimizer while the real data remains unchanged. A Go‑Server collects metadata, statistics, and sample rows, computes statistics for virtual indexes, and feeds them to the optimizer.
Workflow
Pre‑validation filters unsupported statements (e.g., system tables, non‑SELECT/UPDATE/DELETE).
Extract candidate columns from WHERE, JOIN, ORDER BY, GROUP BY, and aggregation functions.
Generate all permutations of candidate columns, then prune existing indexes, length‑exceeding indexes, and unsupported types (e.g., spatial indexes).
Collect metadata (SHOW CREATE TABLE), statistics (information_schema.tables, mysql.innodb_index_stats), and sample rows using block‑based sampling to avoid full scans.
Compute records_in_range and cardinality for each virtual index using sample scaling and a two‑stage slope method.
Create a temporary table with virtual indexes using Fakeindex, run EXPLAIN FORMAT=JSON, and record the chosen index.
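The candidate generation and pruning steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the per-column byte widths, and the rule that a candidate is redundant when an existing index already starts with the same columns are all hypothetical; the 3072-byte default reflects the InnoDB key-length limit for DYNAMIC row format. Pruning by unsupported index type is omitted.

```python
from itertools import permutations


def candidate_indexes(columns, existing, col_bytes,
                      max_width=3, max_key_bytes=3072):
    """Enumerate ordered column tuples, then prune redundant or oversized ones."""
    cols = list(dict.fromkeys(columns))  # de-duplicate, keep first-seen order
    kept = []
    for width in range(1, min(max_width, len(cols)) + 1):
        for combo in permutations(cols, width):
            # Prune: an existing index already starts with exactly these columns.
            if any(tuple(idx[:width]) == combo for idx in existing):
                continue
            # Prune: the combined key length exceeds the limit.
            if sum(col_bytes[c] for c in combo) > max_key_bytes:
                continue
            kept.append(combo)
    return kept


# Columns taken from the example query; widths and existing index are illustrative.
cands = candidate_indexes(
    columns=["name", "dt", "name"],
    existing=[("name",)],
    col_bytes={"name": 200, "dt": 5},
)
print(cands)
```

Column order matters for a composite index, which is why the sketch enumerates permutations rather than combinations.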
Scalability
MySQL allows at most 64 secondary indexes per table. Full permutation of the candidate columns quickly exceeds this limit, so we restrict candidates to indexes of at most three columns (which covers more than 95% of cases) and use a merge‑step algorithm to evaluate up to 4096 candidates.
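The article does not spell out the merge step, so the following is one plausible reading, based on the observation that 64 × 64 = 4096: evaluate candidates in batches of at most 64, keep each batch's winner, and repeat on the winners. The callable pick_best is a hypothetical stand-in for "create a table with these virtual indexes, run EXPLAIN, return the chosen index", and the synthetic cost function exists only to make the sketch runnable.

```python
from math import perm

MAX_INDEXES_PER_TABLE = 64  # per-table secondary index limit cited in the text


def count_candidates(n_columns, max_width=3):
    """Number of ordered column tuples of width 1..max_width."""
    return sum(perm(n_columns, k) for k in range(1, max_width + 1))


def best_candidate(candidates, pick_best, batch_size=MAX_INDEXES_PER_TABLE):
    """Reduce the candidates round by round until a single winner remains."""
    current = list(candidates)
    if not current:
        return None
    while len(current) > 1:
        current = [pick_best(current[i:i + batch_size])
                   for i in range(0, len(current), batch_size)]
    return current[0]


# Synthetic stand-in for the optimizer: lower cost wins, and calls are counted.
calls = 0

def pick_best(batch):
    global calls
    calls += 1
    return min(batch, key=lambda c: abs(c - 1234))

winner = best_candidate(range(4096), pick_best)
print(count_candidates(5), winner, calls)
```

With five candidate columns the three-column cap already yields more than 64 permutations, which is why a single round is not enough in general.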
Quality Assurance
Testing on 30 k slow‑query templates (≈246 GB) shows high coverage but occasional false recommendations due to sample bias, engine bugs, or optimizer limitations. To mitigate:
Effectiveness validation on a sample database.
Post‑deployment effect tracking via EXPLAIN and Flink‑based runtime metrics, with alerts on regressions.
Offline simulation environment that replays incidents on a cloned dataset.
Automated regression test suite built from production slow‑query logs and manually crafted edge cases.
Operational Governance
Slow queries are classified as historical, newly‑emerged, or potential. Recommendations are injected into weekly reports, real‑time dashboards, or pre‑deployment checks, and linked to DBA tickets.
Future Work
Plans include scaling the pipeline for billions of daily queries, incorporating index‑maintenance cost (disk space, write amplification), and moving from per‑SQL to workload‑wide global optimization similar to Alibaba Cloud DAS.
Meituan Technology Team
Over 10,000 engineers powering China’s leading lifestyle services e‑commerce platform. Supporting hundreds of millions of consumers, millions of merchants across 2,000+ industries. This is the public channel for the tech teams behind Meituan, Dianping, Meituan Waimai, Meituan Select, and related services.
