Databases 6 min read

Generating SQL Fingerprints with TiDB SQL Parser

This article explains how to use the TiDB SQL parser to create custom SQL fingerprints by traversing the abstract syntax tree, replacing literal values with placeholders, and restoring the modified tree back to SQL, offering a more maintainable alternative to string‑based approaches.

Aikesheng Open Source Community
Aikesheng Open Source Community
Aikesheng Open Source Community
Generating SQL Fingerprints with TiDB SQL Parser

Author: Sun Jian, R&D engineer at Aikesheng, responsible for high‑availability components and SQL audit development. Original content.

This article introduces a method to generate custom SQL fingerprints using TiDB's SQL parser, differing from the pt-fingerprint tool.

What is an SQL Fingerprint

An SQL fingerprint replaces literal values in a statement with a fixed symbol, useful for sanitization or classification. For example:

select * from t1 where id = 100;

becomes:

select * from t1 where id = ?;

Implementation of pt‑fingerprint

pt‑fingerprint works by applying regular‑expression matching to replace values, resulting in over 2,000 lines of complex string‑parsing code that does not consider SQL semantics.

Implementation Based on TiDB SQL Parser

The TiDB parser converts SQL text into an abstract syntax tree (AST) and can also convert the AST back to SQL, allowing modifications directly on the tree.

1. Parse SQL into an AST

The parsed AST roughly looks like this (ellipsis indicates omitted parts):

&ast.SelectStmt {
    Fields:
        ... &ast.WildCard
    From:
        ... &ast.TableName
            ... "t1"
    Where: &ast.BinaryOperationExpr
        L: &ast.ColumnNameExpr
            ... "id"
        R: &ast.ValueExpr
            ... 100
}

2. Modify the AST Nodes

TiDB uses a Visitor design pattern. By implementing a Visitor , we can traverse the tree and replace every ast.ValueExpr node with a placeholder "?".

// Visitor visits a Node.
type Visitor interface {
    Enter(n Node) (node Node, skipChildren bool)
    Leave(n Node) (node Node, ok bool)
}

// Define a FingerprintVisitor that implements Visitor
type FingerprintVisitor struct{}

func (f *FingerprintVisitor) Enter(n ast.Node) (node ast.Node, skipChildren bool) {
    if v, ok := n.(*driver.ValueExpr); ok {
        v.Type.Charset = ""
        v.SetValue([]byte("?"))
    }
    return n, false
}

func (f *FingerprintVisitor) Leave(n ast.Node) (node ast.Node, ok bool) {
    return n, true
}

3. Restore the Modified AST to SQL

Since TiDB v3, the AST node provides a Restore(ctx *RestoreCtx) error method to convert the tree back to SQL text.

Complete Example

package main

import (
    "bytes"
    "fmt"

    "github.com/pingcap/parser"
    "github.com/pingcap/parser/ast"
    "github.com/pingcap/parser/format"
    driver "github.com/pingcap/tidb/types/parser_driver"
)

// Define a FingerprintVisitor that implements Visitor
type FingerprintVisitor struct{}

func (f *FingerprintVisitor) Enter(n ast.Node) (node ast.Node, skipChildren bool) {
    if v, ok := n.(*driver.ValueExpr); ok {
        v.Type.Charset = ""
        v.SetValue([]byte("?"))
    }
    return n, false
}

func (f *FingerprintVisitor) Leave(n ast.Node) (node ast.Node, ok bool) {
    return n, true
}

func main() {
    sql := "select * from t1 where id = 100;"
    p := parser.New()
    stmt, err := p.ParseOneStmt(sql, "", "")
    if err != nil {
        return
    }
    stmt.Accept(&FingerprintVisitor{})

    buf := new(bytes.Buffer)
    restoreCtx := format.NewRestoreCtx(format.RestoreKeyWordUppercase|format.RestoreNameBackQuotes, buf)
    err = stmt.Restore(restoreCtx)
    if err != nil {
        return
    }
    fmt.Println(buf.String()) // SELECT * FROM `t1` WHERE `id`=?
}

Conclusion

Using the TiDB SQL parser enables fast and accurate SQL fingerprint generation while reducing the complexity of string‑based parsing.

It requires an understanding of TiDB's AST structure.

GoTiDBSQL parserSQL fingerprintVisitor pattern
Aikesheng Open Source Community
Written by

Aikesheng Open Source Community

The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.