Unlocking MySQL: How SQL Parsing Works and Boosts DBA Efficiency
This article explains why protecting database systems is critical, reviews existing SQL‑analysis tools, and dives deep into MySQL's lexical and syntax parsing techniques—including Bison‑generated parsers, core data structures, and practical applications such as useless‑condition removal and SQL feature generation—to help DBAs automate and optimize their workflows.
Current Landscape and Scenarios
Database systems are core components that must be protected; accidental online operations can cause severe failures. Traditional safeguards—coding standards, DBA reviews, and approval workflows—are costly at scale, prompting a shift toward automated, technology‑driven solutions.
Existing open‑source tools like Meituan’s SQLAdvisor , Qunar’s Inception , and Alibaba’s Cloud DBA provide index recommendations or rule‑based audits, but many potential use‑cases remain untapped, such as table‑level slow‑query reports, SQL fingerprint generation, high‑risk operation confirmation, and SQL legality checks.
Desired features for a universal SQL parsing service include a simple input‑SQL‑to‑output (tables, fingerprints, optimization hints) interface and language‑agnostic access, e.g., via HTTP.
SQL Parsing Principles
SQL parsing belongs to the compiler domain and follows the classic stages: lexical analysis, syntax/semantic analysis, optimization, and code generation. In MySQL, these stages are illustrated in the diagram below.
Lexical Analysis
Lexical analysis tokenizes the input, distinguishing keywords (e.g., SELECT, FROM) from identifiers. MySQL implements its lexer by hand for performance; the relevant code resides in sql/lex.h and sql/sql_lex.cc. A fragment of the keyword table is shown below.
{ "&&", SYM(AND_AND_SYM) },
{ "<", SYM(LT) },
{ "<=", SYM(LE) },
{ "<>", SYM(NE) },
{ "!=", SYM(NE) },
{ "=", SYM(EQ) },
{ ">", SYM(GT_SYM) },
{ ">=", SYM(GE) },
{ "<<", SYM(SHIFT_LEFT) },
{ ">>", SYM(SHIFT_RIGHT) },
{ "<=>", SYM(EQUAL_SYM) },
{ "ACCESSIBLE", SYM(ACCESSIBLE_SYM) },
{ "ACTION", SYM(ACTION) },
{ "ADD", SYM(ADD) },
{ "AFTER", SYM(AFTER_SYM) },
{ "AGAINST", SYM(AGAINST) },
{ "AGGREGATE", SYM(AGGREGATE_SYM) },
{ "ALL", SYM(ALL) }Syntax Analysis
Syntax analysis builds an abstract syntax tree (AST). MySQL uses Bison for this purpose. The parser source is sql/sql_yacc.yy (≈17 K lines in MySQL 5.6). A simplified Bison rule for a SELECT statement is shown:
select_init:
SELECT_SYM select_init2
| '(' select_paren ')' union_opt ;
select_init2:
select_part2 { /* error handling */ } union_clause ;
select_part2:
select_options select_item_list { /* set parsing place */ } select_into select_lock_type ;
where_clause:
/* empty */ { Select->where = 0; }
| WHERE expr { /* store expression */ } ;The generated AST stores column names in item_list, table names in table_list, and conditions in where. The following diagram visualizes the tree for a sample query.
Core Data Structures
The central structure is SELECT_LEX (defined in sql/sql_lex.h), which links to item_list, table_list, and where. The diagram below outlines the relationships.
SQL Parsing Applications
Useless Condition Removal – Optimizer logic that eliminates tautological predicates (e.g., 1=1) is implemented in sql/sql_optimizer.cc (function remove_eq_conds). The article illustrates four cases with diagrams.
SQL Fingerprint Generation – Converting a query to a token‑based fingerprint (replacing literals with ?) enables reliable grouping. The process involves generating a token array (lexical stage) and then mapping tokens to a normalized form, as shown in the flow diagram.
Learning Recommendations
To master MySQL parsing and optimization, readers should:
Read foundational books on database optimizers (e.g., “The Art of Database Query Optimizers”).
Study a stable MySQL version source tree (e.g., 5.6.23) because parsing code evolves.
Use GDB to step through the lexer and parser, validating hypotheses.
Write small programs that invoke the parser or manipulate the AST to solidify understanding.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
