Databases 8 min read

Inside Cobar’s SQL Parser: AST Structure, Traversal, and Real‑World Uses

This article explains how Cobar’s SQL parser converts SQL statements into an abstract syntax tree (AST), details the parser’s implementation options, demonstrates AST node definitions and visitor‑based traversal in Java, and explores practical applications such as sharding, feature extraction, and dangerous‑SQL interception.

Xiao Lou's Tech Notes
Xiao Lou's Tech Notes
Xiao Lou's Tech Notes
Inside Cobar’s SQL Parser: AST Structure, Traversal, and Real‑World Uses

Background

Cobar is an open‑source database middleware from Alibaba. For an overview of Cobar, see the earlier article “Design and Implementation of Cobar SQL Auditing”.

Cobar

Cobar provides a SQL parser that turns SQL into an abstract syntax tree (AST) for further processing.

SQL

SQL is a domain‑specific language for relational databases. When a database executes SQL, it first performs lexical, syntactic, and semantic analysis to produce an AST, which the optimizer then transforms into an execution plan.

SQL Parser

A component that parses SQL into an AST is called a SQL Parser. There are two common ways to build one:

Generate automatically with tools – simple to implement but may have poor performance and be hard to extend.

Write manually – offers better performance and clearer, extensible code but requires strong knowledge of compiler theory.

Cobar implements its own SQL Parser, positioned in the architecture diagram before the SQL Router, indicating that the AST is used for sharding routing.

The parser has evolved through three versions, each improving performance:

Version 1: Generated by JavaCC, low performance, hard to optimize.

Version 2: Hand‑written following an ANTLR‑like structure, with many intermediate objects.

Version 3: Hand‑written LL(2) recognizer.

Cobar AST

The parser converts SQL into an AST. For example, the SQL statement:

select id,type from goods as g where type in (select type from type_config where status = 0)

produces the AST shown below:

The root node represents the SELECT statement, with child nodes for tables, columns, and expressions.

Key AST node definitions include an ASTNode interface with an accept method for visitor traversal:

public interface ASTNode {
    void accept(SQLASTVisitor visitor);
}

Important concrete node types are:

SQLStatement – e.g., DMLSelectStatement for SELECT, UPDATE, INSERT.

Expression – e.g., InExpression, ComparisonEqualsExpression, LiteralNumber, Identifier.

TableReference – e.g., TableReferences, TableRefFactor.

For instance, the implementation of ComparisonEqualsExpression includes left/right expressions, the operator "=", and the evaluation logic.

AST Operations

Traversing the AST relies on the SQLASTVisitor interface, which defines a visit method for each node type. Implementations such as MySQLOutputASTVisitor can reconstruct the original SQL:

SQLStatement stmt = SQLParserDelegate.parse(sql);
StringBuilder s = new StringBuilder();
stmt.accept(new MySQLOutputASTVisitor(s));
System.out.println(s.toString());

The above code prints:

SELECT id, type FROM goods AS G WHERE type IN (SELECT type FROM type_config WHERE status = 0)

The accept method in each AST node simply forwards the node to the visitor:

public void accept(SQLASTVisitor visitor) {
    visitor.visit(this);
}

Developers can create custom visitors to perform specific analyses.

AST Applications

Sharding

Cobar uses the AST to extract table names, column names, and predicate values for database‑level sharding.

SQL Feature Generation

ASTs enable normalization of SQL statements, e.g., converting

select id, name, age from user as u where age >= 20

and

select id, name, age from user as u where age >= 30

into

select id, name, age from user as u where age >= ?

which is useful for slow‑query analysis, statistics, and rate‑limiting.

Dangerous SQL Interception

By evaluating expressions in the AST, Cobar can detect UPDATE or DELETE statements lacking a WHERE clause or with a universally true condition and block them.

Conclusion

This article covered the origin, structure, traversal, and practical uses of SQL ASTs in Cobar. Readers are encouraged to explore Cobar’s unit tests for hands‑on demonstrations.

Search and follow the WeChat public account "捉虫大师" for backend technology sharing, architecture design, performance optimization, source‑code reading, troubleshooting, and practical experiences.
JavaASTDatabase MiddlewareSQL ParserCobar
Xiao Lou's Tech Notes
Written by

Xiao Lou's Tech Notes

Backend technology sharing, architecture design, performance optimization, source code reading, troubleshooting, and pitfall practices

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.