Inside Cobar’s SQL Parser: AST Structure, Traversal, and Real‑World Uses
This article explains how Cobar’s SQL parser converts SQL statements into an abstract syntax tree (AST), details the parser’s implementation options, demonstrates AST node definitions and visitor‑based traversal in Java, and explores practical applications such as sharding, feature extraction, and dangerous‑SQL interception.
Background
Cobar is an open‑source database middleware from Alibaba. For an overview of Cobar, see the earlier article “Design and Implementation of Cobar SQL Auditing”.
Cobar
Cobar provides a SQL parser that turns SQL into an abstract syntax tree (AST) for further processing.
SQL
SQL is a domain‑specific language for relational databases. When a database executes SQL, it first performs lexical, syntactic, and semantic analysis to produce an AST, which the optimizer then transforms into an execution plan.
SQL Parser
A component that parses SQL into an AST is called a SQL Parser. There are two common ways to build one:
Generate automatically with tools – simple to implement but may have poor performance and be hard to extend.
Write manually – offers better performance and clearer, extensible code but requires strong knowledge of compiler theory.
Cobar implements its own SQL Parser, positioned in the architecture diagram before the SQL Router, indicating that the AST is used for sharding routing.
The parser has evolved through three versions, each improving performance:
Version 1: Generated by JavaCC, low performance, hard to optimize.
Version 2: Hand‑written following an ANTLR‑like structure, with many intermediate objects.
Version 3: Hand‑written LL(2) recognizer.
Cobar AST
The parser converts SQL into an AST. For example, the SQL statement:
select id,type from goods as g where type in (select type from type_config where status = 0)
produces the AST shown below:
The root node represents the SELECT statement, with child nodes for tables, columns, and expressions.
Key AST node definitions include an ASTNode interface with an accept method for visitor traversal:
public interface ASTNode {
void accept(SQLASTVisitor visitor);
}Important concrete node types are:
SQLStatement – e.g., DMLSelectStatement for SELECT, UPDATE, INSERT.
Expression – e.g., InExpression, ComparisonEqualsExpression, LiteralNumber, Identifier.
TableReference – e.g., TableReferences, TableRefFactor.
For instance, the implementation of ComparisonEqualsExpression includes left/right expressions, the operator "=", and the evaluation logic.
AST Operations
Traversing the AST relies on the SQLASTVisitor interface, which defines a visit method for each node type. Implementations such as MySQLOutputASTVisitor can reconstruct the original SQL:
SQLStatement stmt = SQLParserDelegate.parse(sql);
StringBuilder s = new StringBuilder();
stmt.accept(new MySQLOutputASTVisitor(s));
System.out.println(s.toString());The above code prints:
SELECT id, type FROM goods AS G WHERE type IN (SELECT type FROM type_config WHERE status = 0)
The accept method in each AST node simply forwards the node to the visitor:
public void accept(SQLASTVisitor visitor) {
visitor.visit(this);
}Developers can create custom visitors to perform specific analyses.
AST Applications
Sharding
Cobar uses the AST to extract table names, column names, and predicate values for database‑level sharding.
SQL Feature Generation
ASTs enable normalization of SQL statements, e.g., converting
select id, name, age from user as u where age >= 20
and
select id, name, age from user as u where age >= 30
into
select id, name, age from user as u where age >= ?
which is useful for slow‑query analysis, statistics, and rate‑limiting.
Dangerous SQL Interception
By evaluating expressions in the AST, Cobar can detect UPDATE or DELETE statements lacking a WHERE clause or with a universally true condition and block them.
Conclusion
This article covered the origin, structure, traversal, and practical uses of SQL ASTs in Cobar. Readers are encouraged to explore Cobar’s unit tests for hands‑on demonstrations.
Search and follow the WeChat public account "捉虫大师" for backend technology sharing, architecture design, performance optimization, source‑code reading, troubleshooting, and practical experiences.
Xiao Lou's Tech Notes
Backend technology sharing, architecture design, performance optimization, source code reading, troubleshooting, and pitfall practices
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
