Generating SQL Fingerprints with TiDB SQL Parser
This article explains how to use the TiDB SQL parser to create custom SQL fingerprints by traversing the abstract syntax tree, replacing literal values with placeholders, and restoring the modified tree back to SQL, offering a more maintainable alternative to string‑based approaches.
Author: Sun Jian, R&D engineer at Aikesheng, responsible for high‑availability components and SQL audit development. Original content.
This article introduces a method to generate custom SQL fingerprints using TiDB's SQL parser, differing from the pt-fingerprint tool.
What is an SQL Fingerprint
An SQL fingerprint replaces literal values in a statement with a fixed symbol, useful for sanitization or classification. For example:
select * from t1 where id = 100;becomes:
select * from t1 where id = ?;Implementation of pt‑fingerprint
pt‑fingerprint works by applying regular‑expression matching to replace values, resulting in over 2,000 lines of complex string‑parsing code that does not consider SQL semantics.
Implementation Based on TiDB SQL Parser
The TiDB parser converts SQL text into an abstract syntax tree (AST) and can also convert the AST back to SQL, allowing modifications directly on the tree.
1. Parse SQL into an AST
The parsed AST roughly looks like this (ellipsis indicates omitted parts):
&ast.SelectStmt {
Fields:
... &ast.WildCard
From:
... &ast.TableName
... "t1"
Where: &ast.BinaryOperationExpr
L: &ast.ColumnNameExpr
... "id"
R: &ast.ValueExpr
... 100
}2. Modify the AST Nodes
TiDB uses a Visitor design pattern. By implementing a Visitor , we can traverse the tree and replace every ast.ValueExpr node with a placeholder "?".
// Visitor visits a Node.
type Visitor interface {
Enter(n Node) (node Node, skipChildren bool)
Leave(n Node) (node Node, ok bool)
}
// Define a FingerprintVisitor that implements Visitor
type FingerprintVisitor struct{}
func (f *FingerprintVisitor) Enter(n ast.Node) (node ast.Node, skipChildren bool) {
if v, ok := n.(*driver.ValueExpr); ok {
v.Type.Charset = ""
v.SetValue([]byte("?"))
}
return n, false
}
func (f *FingerprintVisitor) Leave(n ast.Node) (node ast.Node, ok bool) {
return n, true
}3. Restore the Modified AST to SQL
Since TiDB v3, the AST node provides a Restore(ctx *RestoreCtx) error method to convert the tree back to SQL text.
Complete Example
package main
import (
"bytes"
"fmt"
"github.com/pingcap/parser"
"github.com/pingcap/parser/ast"
"github.com/pingcap/parser/format"
driver "github.com/pingcap/tidb/types/parser_driver"
)
// Define a FingerprintVisitor that implements Visitor
type FingerprintVisitor struct{}
func (f *FingerprintVisitor) Enter(n ast.Node) (node ast.Node, skipChildren bool) {
if v, ok := n.(*driver.ValueExpr); ok {
v.Type.Charset = ""
v.SetValue([]byte("?"))
}
return n, false
}
func (f *FingerprintVisitor) Leave(n ast.Node) (node ast.Node, ok bool) {
return n, true
}
func main() {
sql := "select * from t1 where id = 100;"
p := parser.New()
stmt, err := p.ParseOneStmt(sql, "", "")
if err != nil {
return
}
stmt.Accept(&FingerprintVisitor{})
buf := new(bytes.Buffer)
restoreCtx := format.NewRestoreCtx(format.RestoreKeyWordUppercase|format.RestoreNameBackQuotes, buf)
err = stmt.Restore(restoreCtx)
if err != nil {
return
}
fmt.Println(buf.String()) // SELECT * FROM `t1` WHERE `id`=?
}Conclusion
Using the TiDB SQL parser enables fast and accurate SQL fingerprint generation while reducing the complexity of string‑based parsing.
It requires an understanding of TiDB's AST structure.
Aikesheng Open Source Community
The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.