Databases 15 min read

Unlocking MySQL: How SQL Parsing Works and Boosts DBA Efficiency

This article explains why protecting database systems is critical, reviews existing SQL‑analysis tools, and dives deep into MySQL's lexical and syntax parsing techniques—including Bison‑generated parsers, core data structures, and practical applications such as useless‑condition removal and SQL feature generation—to help DBAs automate and optimize their workflows.

ITPUB
ITPUB
ITPUB
Unlocking MySQL: How SQL Parsing Works and Boosts DBA Efficiency

Current Landscape and Scenarios

Database systems are core components that must be protected; accidental online operations can cause severe failures. Traditional safeguards—coding standards, DBA reviews, and approval workflows—are costly at scale, prompting a shift toward automated, technology‑driven solutions.

Existing open‑source tools like Meituan’s SQLAdvisor , Qunar’s Inception , and Alibaba’s Cloud DBA provide index recommendations or rule‑based audits, but many potential use‑cases remain untapped, such as table‑level slow‑query reports, SQL fingerprint generation, high‑risk operation confirmation, and SQL legality checks.

Desired features for a universal SQL parsing service include a simple input‑SQL‑to‑output (tables, fingerprints, optimization hints) interface and language‑agnostic access, e.g., via HTTP.

SQL Parsing Principles

SQL parsing belongs to the compiler domain and follows the classic stages: lexical analysis, syntax/semantic analysis, optimization, and code generation. In MySQL, these stages are illustrated in the diagram below.

SQL parsing principle diagram
SQL parsing principle diagram

Lexical Analysis

Lexical analysis tokenizes the input, distinguishing keywords (e.g., SELECT, FROM) from identifiers. MySQL implements its lexer by hand for performance; the relevant code resides in sql/lex.h and sql/sql_lex.cc. A fragment of the keyword table is shown below.

{ "&&", SYM(AND_AND_SYM) },
{ "<", SYM(LT) },
{ "<=", SYM(LE) },
{ "<>", SYM(NE) },
{ "!=", SYM(NE) },
{ "=", SYM(EQ) },
{ ">", SYM(GT_SYM) },
{ ">=", SYM(GE) },
{ "<<", SYM(SHIFT_LEFT) },
{ ">>", SYM(SHIFT_RIGHT) },
{ "<=>", SYM(EQUAL_SYM) },
{ "ACCESSIBLE", SYM(ACCESSIBLE_SYM) },
{ "ACTION", SYM(ACTION) },
{ "ADD", SYM(ADD) },
{ "AFTER", SYM(AFTER_SYM) },
{ "AGAINST", SYM(AGAINST) },
{ "AGGREGATE", SYM(AGGREGATE_SYM) },
{ "ALL", SYM(ALL) }

Syntax Analysis

Syntax analysis builds an abstract syntax tree (AST). MySQL uses Bison for this purpose. The parser source is sql/sql_yacc.yy (≈17 K lines in MySQL 5.6). A simplified Bison rule for a SELECT statement is shown:

select_init:
    SELECT_SYM select_init2
  | '(' select_paren ')' union_opt ;

select_init2:
    select_part2 { /* error handling */ } union_clause ;

select_part2:
    select_options select_item_list { /* set parsing place */ } select_into select_lock_type ;

where_clause:
    /* empty */ { Select->where = 0; }
  | WHERE expr { /* store expression */ } ;

The generated AST stores column names in item_list, table names in table_list, and conditions in where. The following diagram visualizes the tree for a sample query.

AST for sample SELECT
AST for sample SELECT

Core Data Structures

The central structure is SELECT_LEX (defined in sql/sql_lex.h), which links to item_list, table_list, and where. The diagram below outlines the relationships.

SELECT_LEX structure
SELECT_LEX structure

SQL Parsing Applications

Useless Condition Removal – Optimizer logic that eliminates tautological predicates (e.g., 1=1) is implemented in sql/sql_optimizer.cc (function remove_eq_conds). The article illustrates four cases with diagrams.

Useless condition case a
Useless condition case a
Useless condition case b
Useless condition case b

SQL Fingerprint Generation – Converting a query to a token‑based fingerprint (replacing literals with ?) enables reliable grouping. The process involves generating a token array (lexical stage) and then mapping tokens to a normalized form, as shown in the flow diagram.

Fingerprint generation flow
Fingerprint generation flow

Learning Recommendations

To master MySQL parsing and optimization, readers should:

Read foundational books on database optimizers (e.g., “The Art of Database Query Optimizers”).

Study a stable MySQL version source tree (e.g., 5.6.23) because parsing code evolves.

Use GDB to step through the lexer and parser, validating hypotheses.

Write small programs that invoke the parser or manipulate the AST to solidify understanding.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

query optimizationmysqlSQL parsingsyntax treeDBA toolsBisonlexical analysis
ITPUB
Written by

ITPUB

Official ITPUB account sharing technical insights, community news, and exciting events.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.