Understanding MySQL SQL Parsing and Optimization Techniques
The article explains how to extend MySQL’s built‑in lexer and Bison‑based parser to expose table names, query features, and optimization advice via a simple language‑agnostic service, illustrating core data structures, useless‑condition elimination, feature generation for slow‑query analysis, and practical learning tips.
Database systems are core components that must be protected. Mistakes in online operations can cause severe failures, so many companies enforce development standards, DBA reviews, and approval workflows. To reduce manual effort, technical solutions based on MySQL source code have been created for SQL auditing, optimization suggestions, and more.
This article explores how to extend MySQL’s parser to provide richer features such as comprehensive optimization, multi‑dimensional slow‑query analysis, and fault‑analysis assistance. The key technology is SQL parsing.
Current Situation and Scenarios
SQL parsing is a complex technology usually mastered only by database vendors, though some companies offer parsing APIs. Middleware such as Druid (Java), MaxScale (C), and Kingshard (Go) performs partial parsing to support read/write splitting and sharding. Few products apply parsing to database operations and maintenance; notable examples are:
Meituan‑Dianping’s SQLAdvisor – provides index‑optimization suggestions based on lexical analysis.
Qunar’s Inception – audits SQL according to built‑in rules.
Alibaba’s Cloud DBA – offers optimization advice and rewriting.
Beyond these, many potential use‑cases remain, such as table‑level slow‑query reports, SQL feature generation, high‑risk operation confirmation, and SQL legality checking.
Desired Features
Expose a simple SQL parsing interface that returns table names, features, and optimization advice. Provide language‑agnostic access (e.g., HTTP service).
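One way to picture that interface is a small result type serialized as JSON, which any language can consume over HTTP. The sketch below is hypothetical: ParseResult, to_json, and the field names are illustrative choices rather than part of any existing service, and the HTTP layer itself is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical payload returned by the parsing service for one statement.
struct ParseResult {
    std::vector<std::string> tables;  // e.g. "db.table"
    std::string feature;              // statement with literals replaced by "?"
    std::vector<std::string> advice;  // optimization suggestions, one per entry
};

// Escapes a string for use inside a JSON string literal.
std::string json_escape(const std::string& s) {
    static const char hex[] = "0123456789abcdef";
    std::string out;
    for (char c : s) {
        const unsigned char u = static_cast<unsigned char>(c);
        if (c == '"') out += "\\\"";
        else if (c == '\\') out += "\\\\";
        else if (c == '\n') out += "\\n";
        else if (u < 0x20) {  // other control characters become \u00XX
            out += "\\u00";
            out += hex[u >> 4];
            out += hex[u & 0xF];
        } else {
            out += c;
        }
    }
    return out;
}

// Serializes a list of strings as a JSON array.
std::string json_array(const std::vector<std::string>& items) {
    std::string out = "[";
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (i > 0) out += ",";
        out += "\"" + json_escape(items[i]) + "\"";
    }
    return out + "]";
}

// Serializes the whole result as one JSON object.
std::string to_json(const ParseResult& r) {
    return "{\"tables\":" + json_array(r.tables) +
           ",\"feature\":\"" + json_escape(r.feature) + "\"" +
           ",\"advice\":" + json_array(r.advice) + "}";
}
```

Keeping the payload to plain strings and arrays is what makes the service language-agnostic: any client with a JSON parser can read it.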
Parsing Principles
SQL parsing and optimization belong to the compiler domain: the process consists of lexical analysis, syntax/semantic analysis, optimization, and execution‑code generation. The following diagram (omitted) shows MySQL’s pipeline.
Lexical Analysis
Lexical analysis converts the input into tokens such as keywords, identifiers, operators, and literals. MySQL implements its own lexer (see sql/lex.h and sql/sql_lex.cc) rather than using a generator such as Flex. A fragment of the keyword table in sql/lex.h is shown below:
{ "&&", SYM(AND_AND_SYM)},
{ "<", SYM(LT)},
{ "<=", SYM(LE)},
{ "<>", SYM(NE)},
{ "!=", SYM(NE)},
{ "=", SYM(EQ)},
{ ">", SYM(GT_SYM)},
{ ">=", SYM(GE)},
{ "<<", SYM(SHIFT_LEFT)},
{ ">>", SYM(SHIFT_RIGHT)},
{ "<=>", SYM(EQUAL_SYM)},
{ "ACCESSIBLE", SYM(ACCESSIBLE_SYM)},
{ "ACTION", SYM(ACTION)},
{ "ADD", SYM(ADD)},
{ "AFTER", SYM(AFTER_SYM)},
{ "AGAINST", SYM(AGAINST)},
{ "AGGREGATE", SYM(AGGREGATE_SYM)},
{ "ALL", SYM(ALL)}
The core lexer entry point is MYSQLlex(), which calls lex_one_token() in sql/sql_lex.cc.
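To illustrate what a hand-written lexer does, here is a minimal sketch: it scans characters, groups them into identifiers, numbers, quoted strings, and operators, and looks words up in a keyword table. It is a toy under stated assumptions, not MySQL's code: Kind, Token, kKeywords, and tokenize are hypothetical names, and the real lex_one_token() is, roughly, a character-class state machine with many more cases (comments, backquoted identifiers, character sets, and so on).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Token categories for this toy lexer; MySQL's real lexer returns integer
// token codes such as the SYM(...) values in the table above.
enum class Kind { Keyword, Ident, Number, String, Operator, Punct };

struct Token {
    Kind kind;
    std::string text;  // keywords are upper-cased; string literals unquoted
};

// Hypothetical, tiny keyword set (cf. the keyword table in sql/lex.h).
const std::set<std::string> kKeywords = {"SELECT", "FROM", "WHERE", "AND", "OR"};

// Operators ordered longest first, so "<=>" is never split into "<=" and ">".
const char* const kOperators[] = {"<=>", "<=", ">=", "<>", "!=", "<<", ">>",
                                  "&&", "<", ">", "="};

std::vector<Token> tokenize(const std::string& sql) {
    std::vector<Token> out;
    std::size_t i = 0;
    const std::size_t n = sql.size();
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(sql[i]);
        if (std::isspace(c)) { ++i; continue; }
        if (std::isalpha(c) || c == '_') {            // keyword or identifier
            std::size_t j = i;
            while (j < n && (std::isalnum(static_cast<unsigned char>(sql[j])) || sql[j] == '_')) ++j;
            std::string word = sql.substr(i, j - i), upper = word;
            for (char& ch : upper) ch = static_cast<char>(std::toupper(static_cast<unsigned char>(ch)));
            if (kKeywords.count(upper)) out.push_back({Kind::Keyword, upper});
            else out.push_back({Kind::Ident, word});
            i = j;
        } else if (std::isdigit(c)) {                 // integer or decimal literal
            std::size_t j = i;
            while (j < n && (std::isdigit(static_cast<unsigned char>(sql[j])) || sql[j] == '.')) ++j;
            out.push_back({Kind::Number, sql.substr(i, j - i)});
            i = j;
        } else if (c == '\'' || c == '"') {           // quoted string
            const char q = static_cast<char>(c);
            std::string value;
            std::size_t j = i + 1;
            while (j < n) {
                if (sql[j] == q) {
                    if (j + 1 < n && sql[j + 1] == q) { value += q; j += 2; continue; }  // '' escape
                    break;
                }
                value += sql[j++];
            }
            out.push_back({Kind::String, value});
            i = (j < n) ? j + 1 : n;                  // skip the closing quote
        } else {                                      // operator or punctuation
            std::size_t len = 0;
            for (const char* op : kOperators) {
                const std::string s(op);
                if (sql.compare(i, s.size(), s) == 0) { len = s.size(); break; }
            }
            if (len > 0) {
                out.push_back({Kind::Operator, sql.substr(i, len)});
            } else {
                len = 1;
                out.push_back({Kind::Punct, sql.substr(i, 1)});
            }
            i += len;
        }
    }
    return out;
}
```

Trying the longest operator first is what keeps `<=>` (EQUAL_SYM in the table above) a single token, the same maximal-munch rule any SQL lexer has to follow.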
Syntax Analysis
Syntax analysis builds an abstract syntax tree (AST). MySQL uses Bison for this purpose. The grammar resides mainly in sql/sql_yacc.yy (≈17 K lines in MySQL 5.6). A simplified excerpt:
select_init:
    SELECT_SYM select_init2
  | '(' select_paren ')' union_opt
  ;
select_init2:
    select_part2 { /* set parsing context */ } union_clause
  ;
select_part2:
    { /* allocate SELECT_LEX */ }
    select_options select_item_list /* parse columns */
    { Select->parsing_place = NO_MATTER; }
    select_into select_lock_type
  ;
where_clause:
    /* empty */ { Select->where = 0; }
  | WHERE { Select->parsing_place = IN_WHERE; } expr { /* attach expression */ }
  ;
expr:
    expr AND expr %prec AND_SYM { /* build Item_cond_and */ }
  | expr OR expr %prec OR_SYM { /* build Item_cond_or */ }
  | ...
  ;
The parser embeds C++ code to construct objects such as TABLE_LIST, Item, and SELECT_LEX. These structures store table names, column lists, and condition trees.
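MySQL's grammar is compiled by Bison, but the tree its actions build can be illustrated with a hand-written recursive-descent parser. The sketch below is a hypothetical stand-in (Node, Parser, and to_sexpr are illustrative names): AND binds tighter than OR, and a chain such as `a AND b AND c` becomes one node with three children, roughly the way Item_cond_and keeps a flat argument list. For brevity it assumes tokens are already separated by spaces.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <memory>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical AST node. "AND"/"OR" nodes play the role of Item_cond_and /
// Item_cond_or; comparison nodes stand in for Item_func_eq, Item_func_gt,
// and so on; leaves are column names or literals.
struct Node {
    std::string op;    // "AND", "OR", "=", ">", ...; empty for a leaf
    std::string leaf;  // column name or literal when op is empty
    std::vector<std::unique_ptr<Node>> args;
};

class Parser {
public:
    // Assumes whitespace-separated tokens, e.g. "a = 1 AND ( b = 2 OR c = 3 )".
    explicit Parser(const std::string& text) {
        std::istringstream in(text);
        for (std::string tok; in >> tok;) toks_.push_back(tok);
    }
    std::unique_ptr<Node> parse() {
        auto root = parse_or();
        if (pos_ != toks_.size()) throw std::runtime_error("unexpected token: " + toks_[pos_]);
        return root;
    }

private:
    using Rule = std::unique_ptr<Node> (Parser::*)();
    std::vector<std::string> toks_;
    std::size_t pos_ = 0;

    bool peek(const std::string& t) const { return pos_ < toks_.size() && toks_[pos_] == t; }
    bool accept(const std::string& t) {
        if (!peek(t)) return false;
        ++pos_;
        return true;
    }
    // Parses "x op x op x" into ONE node with N children.
    std::unique_ptr<Node> chain(const std::string& op, Rule next) {
        auto first = (this->*next)();
        if (!peek(op)) return first;
        auto node = std::make_unique<Node>();
        node->op = op;
        node->args.push_back(std::move(first));
        while (accept(op)) node->args.push_back((this->*next)());
        return node;
    }
    // OR has the lowest precedence; AND binds tighter, as in SQL.
    std::unique_ptr<Node> parse_or() { return chain("OR", &Parser::parse_and); }
    std::unique_ptr<Node> parse_and() { return chain("AND", &Parser::parse_cmp); }
    std::unique_ptr<Node> parse_cmp() {
        auto left = parse_primary();
        for (const char* op : {"=", "<>", "<", "<=", ">", ">="}) {
            if (!accept(op)) continue;
            auto node = std::make_unique<Node>();
            node->op = op;
            node->args.push_back(std::move(left));
            node->args.push_back(parse_primary());
            return node;
        }
        return left;
    }
    std::unique_ptr<Node> parse_primary() {
        if (accept("(")) {
            auto inner = parse_or();
            if (!accept(")")) throw std::runtime_error("expected )");
            return inner;
        }
        if (pos_ >= toks_.size()) throw std::runtime_error("unexpected end of input");
        auto leaf = std::make_unique<Node>();
        leaf->leaf = toks_[pos_++];
        return leaf;
    }
};

// Prints the tree as an S-expression, e.g. (AND (= a 1) (> b 2)).
std::string to_sexpr(const Node& n) {
    if (n.op.empty()) return n.leaf;
    std::string s = "(" + n.op;
    for (const auto& a : n.args) s += " " + to_sexpr(*a);
    return s + ")";
}
```

For example, `a = 1 AND b > 2 OR c = 3` parses to `(OR (AND (= a 1) (> b 2)) (= c 3))`.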
Core Data Structures
The central structure is SELECT_LEX (defined in sql/sql_lex.h). It links to item_list (columns), table_list (tables), and where (condition tree). The following diagram (omitted) illustrates the relationships.
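The relationships can be approximated with plain C++ types. The sketch below is hypothetical (TableRef, Cond, SelectLexSketch, and the helper functions are illustrative, not MySQL declarations; in the real source these are a List<Item>, a list of TABLE_LIST objects, and an Item condition tree). It shows how, once the parser has filled in such a structure, the table names for the service and the columns referenced in the WHERE tree fall out of simple traversals.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified stand-in for a TABLE_LIST entry.
struct TableRef {
    std::string db, table_name, alias;
};

// Hypothetical condition tree node (stand-in for an Item tree).
struct Cond {
    std::string op;          // "AND", "OR", "=", ">", ...; empty for a leaf
    std::string leaf;        // column name or literal when op is empty
    bool is_column = false;  // true when the leaf names a column
    std::vector<Cond> args;  // children (vector of incomplete type: C++17)
};

// Hypothetical stand-in for SELECT_LEX.
struct SelectLexSketch {
    std::vector<std::string> item_list;  // select list (columns/expressions)
    std::vector<TableRef> table_list;    // tables in FROM
    Cond where;                          // empty op and leaf: no WHERE clause
};

// Qualified table names, as the desired interface would return them.
std::vector<std::string> table_names(const SelectLexSketch& s) {
    std::vector<std::string> out;
    for (const TableRef& t : s.table_list)
        out.push_back(t.db.empty() ? t.table_name : t.db + "." + t.table_name);
    return out;
}

// Columns referenced in the WHERE tree: the kind of input index advice needs.
void collect_columns(const Cond& c, std::vector<std::string>& out) {
    if (c.op.empty()) {
        if (c.is_column) out.push_back(c.leaf);
        return;
    }
    for (const Cond& a : c.args) collect_columns(a, out);
}

// Small builders to keep examples readable.
Cond col(std::string name) { return Cond{"", std::move(name), true, {}}; }
Cond lit(std::string value) { return Cond{"", std::move(value), false, {}}; }
Cond make_op(std::string op, std::vector<Cond> args) {
    return Cond{std::move(op), "", false, std::move(args)};
}
```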
Applications of SQL Parsing
Useless Condition Removal
Optimizer logic can eliminate tautological predicates (e.g., 1=1) and detect contradictory ones (e.g., 1=2). The implementation lives in sql/sql_optimizer.cc (function remove_eq_conds). The original article illustrates four typical cases with diagrams (omitted here).
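The general idea can be sketched on a toy condition tree. This is not remove_eq_conds, which works on MySQL's Item classes and covers many more situations; the types and functions below (Truth, Cond, simplify, and the helpers) are hypothetical, and only comparisons between two non-negative integer literals count as constant. The rules: a TRUE term inside AND and a FALSE term inside OR are dropped, while a FALSE term makes an AND false and a TRUE term makes an OR true.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <utility>
#include <vector>

enum class Truth { Unknown, True, False };

// Toy condition tree: "AND"/"OR" nodes, binary comparisons, and leaves
// (column names or non-negative integer literals). Hypothetical types.
struct Cond {
    std::string op;    // "AND", "OR", "=", "<>", "<", ">"; empty for a leaf
    std::string leaf;  // column name or integer literal when op is empty
    std::vector<Cond> args;
};

Cond leaf(std::string s) { return Cond{"", std::move(s), {}}; }
Cond node(std::string op, std::vector<Cond> args) { return Cond{std::move(op), "", std::move(args)}; }

bool is_int_literal(const Cond& c) {
    if (!c.op.empty() || c.leaf.empty()) return false;
    for (char ch : c.leaf)
        if (!std::isdigit(static_cast<unsigned char>(ch))) return false;
    return true;
}

// A comparison between two literals (e.g. 1=1, 1=2) has a fixed truth value.
Truth eval_constant(const Cond& c) {
    if (c.args.size() != 2 || !is_int_literal(c.args[0]) || !is_int_literal(c.args[1]))
        return Truth::Unknown;
    const long long a = std::stoll(c.args[0].leaf), b = std::stoll(c.args[1].leaf);
    if (c.op == "=") return a == b ? Truth::True : Truth::False;
    if (c.op == "<>") return a != b ? Truth::True : Truth::False;
    if (c.op == "<") return a < b ? Truth::True : Truth::False;
    if (c.op == ">") return a > b ? Truth::True : Truth::False;
    return Truth::Unknown;
}

// Simplifies `c` in place. Returns True/False when the whole condition is
// constant (`c` is then partially moved-from and should be discarded);
// otherwise returns Unknown and leaves only the useful predicates in `c`.
Truth simplify(Cond& c) {
    if (c.op != "AND" && c.op != "OR") return eval_constant(c);
    const bool is_and = (c.op == "AND");
    std::vector<Cond> kept;
    for (Cond& arg : c.args) {
        const Truth t = simplify(arg);
        if (t == Truth::Unknown) { kept.push_back(std::move(arg)); continue; }
        if (is_and && t == Truth::False) return Truth::False;  // x AND FALSE -> FALSE
        if (!is_and && t == Truth::True) return Truth::True;   // x OR TRUE -> TRUE
        // Otherwise the term is TRUE inside AND or FALSE inside OR: drop it.
    }
    if (kept.empty()) return is_and ? Truth::True : Truth::False;
    if (kept.size() == 1) c = std::move(kept[0]);  // collapse a one-child AND/OR
    else c.args = std::move(kept);
    return Truth::Unknown;
}
```

With these rules, `1=1 AND a=1` simplifies to `a=1`, `1=2 AND a=1` is always false, `1=1 OR a=1` is always true, and `1=2 OR a=1` simplifies to `a=1`. In MySQL, an always-false WHERE typically surfaces as "Impossible WHERE" in EXPLAIN output.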
SQL Feature Generation
For slow‑query analysis and query classification, SQL statements are transformed into a feature representation in which literals are replaced by placeholders (?). Example feature:
select username, ismale from userinfo where age > ? and level > ?
The process consists of two steps: (a) generate a token array during lexical analysis, and (b) map the token array to a feature string. Because it relies on real lexical analysis, this approach is more reliable than regex‑based fingerprinting tools such as pt‑query‑digest, which can mishandle some statements.
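The two steps can be sketched as follows. For self-containment this uses a small hand-rolled tokenizer instead of MySQL's lexer, so it only approximates the approach described above: it replaces numeric and quoted-string literals with ? and keeps every other token verbatim. Kind, Token, tokenize, and to_feature are hypothetical names.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

enum class Kind { Word, Number, String, Symbol };

struct Token {
    Kind kind;
    std::string text;
};

// Step (a): split the statement into a token array.
std::vector<Token> tokenize(const std::string& sql) {
    std::vector<Token> out;
    const std::size_t n = sql.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(sql[i]);
        if (std::isspace(c)) { ++i; continue; }
        std::size_t j = i;
        Kind kind = Kind::Symbol;
        if (std::isalpha(c) || c == '_') {             // keyword or identifier
            kind = Kind::Word;
            while (j < n && (std::isalnum(static_cast<unsigned char>(sql[j])) || sql[j] == '_')) ++j;
        } else if (c == '`') {                         // `quoted identifier`
            kind = Kind::Word;
            const std::size_t close = sql.find('`', i + 1);
            j = (close == std::string::npos) ? n : close + 1;
        } else if (std::isdigit(c)) {                  // numeric literal
            kind = Kind::Number;
            while (j < n && (std::isalnum(static_cast<unsigned char>(sql[j])) || sql[j] == '.')) ++j;
        } else if (c == '\'' || c == '"') {            // string literal
            kind = Kind::String;
            ++j;                                       // skip opening quote
            while (j < n) {
                if (sql[j] == '\\' && j + 1 < n) { j += 2; continue; }            // backslash escape
                if (sql[j] == sql[i]) {
                    if (j + 1 < n && sql[j + 1] == sql[i]) { j += 2; continue; }  // doubled quote
                    ++j;                               // closing quote
                    break;
                }
                ++j;
            }
        } else {                                       // operator or punctuation
            static const char* const two_char_ops[] = {"<=", ">=", "<>", "!="};
            j = i + 1;
            for (const char* op : two_char_ops)
                if (sql.compare(i, 2, op) == 0) { j = i + 2; break; }
        }
        out.push_back({kind, sql.substr(i, j - i)});
        i = j;
    }
    return out;
}

// Step (b): map the token array to a feature string. Literals become "?";
// other tokens are kept verbatim, separated by single spaces.
std::string to_feature(const std::vector<Token>& tokens) {
    std::string out;
    for (const Token& t : tokens) {
        const bool literal = (t.kind == Kind::Number || t.kind == Kind::String);
        const std::string text = literal ? "?" : t.text;
        const bool glue = out.empty() || text == "," || text == ")" || out.back() == '(';
        if (!glue) out += ' ';
        out += text;
    }
    return out;
}
```

Because statements that differ only in their literals map to the same string, the feature can serve as a grouping key for slow-query statistics. A fuller version might also normalize keyword case or collapse IN lists of different lengths.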
Learning Recommendations
To deepen understanding of MySQL parsers and optimizers, consider the following:
Read books on database query optimizers (e.g., “The Art of Database Query Optimizer”).
Study a specific MySQL version’s source code (e.g., 5.6.23) because the parser evolves over time.
Use GDB to step through the parser and verify hypotheses.
Write small programs that invoke the parser to solidify knowledge.
Author Bios
Guangyou – senior MySQL DBA at Meituan‑Dianping, USTC graduate, focuses on MySQL and related tools.
Jinlong – joined Meituan‑Dianping in 2014 and works on database operations, high availability, and platform construction.
Xingfan – DBA at Meituan‑Dianping, experienced in MySQL operations and automation scripts.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Meituan Technology Team
Over 10,000 engineers powering China’s leading lifestyle services e‑commerce platform. Supporting hundreds of millions of consumers, millions of merchants across 2,000+ industries. This is the public channel for the tech teams behind Meituan, Dianping, Meituan Waimai, Meituan Select, and related services.
