Backend Development 19 min read

How to Build a Go Trie for Real‑Time Sensitive Word Filtering

This article demonstrates how to implement a sensitive‑word detection system in Go using a prefix‑tree (Trie), covering brute‑force, regex, and optimized rune‑based methods, plus special‑character filtering, pinyin support, and complete source code examples.

MaGe Linux Operations

Oct 29, 2022

How to Build a Go Trie for Real‑Time Sensitive Word Filtering

Introduction

Everyone knows that game chat and article publishing often require sensitive‑word detection, where offending words are masked or the content is blocked. This tutorial shows how to implement such detection in Go using a prefix‑tree (Trie).

Sensitive Word Detection

Various methods exist, such as brute‑force, regular expressions, and prefix‑trees. First, we define a list of sensitive words:

sensitiveWords := []string{"傻逼", "傻叉", "垃圾", "妈的", "sb"}

When a user inputs a sentence containing these words, we need to locate and mask them.

Brute‑Force Replacement

sensitiveWords := []string{"傻逼", "傻叉", "垃圾", "妈的", "sb"}
text := "什么垃圾打野，傻逼一样，叫你来开龙不来，sb"
for _, word := range sensitiveWords {
    text = strings.Replace(text, word, "*", -1)
}
println("text -> ", text)

The result is 什么*打野，*一样，叫你来开龙不来，*, but this approach has O(N²) time complexity and masks each character with a single asterisk, which is insufficient for multibyte Chinese characters.

Improved Brute‑Force with Rune

sensitiveWords := []string{"傻逼", "傻叉", "垃圾", "妈的", "sb"}
text := "什么垃圾打野，傻逼一样，叫你来开龙不来，sb"
for _, word := range sensitiveWords {
    replaceChar := ""
    for i := 0; i < len([]rune(word)); i++ {
        replaceChar += "*"
    }
    text = strings.Replace(text, word, replaceChar, -1)
}
println("text -> ", text)

Because Go uses UTF‑8, each Chinese character occupies three bytes, so we construct a replacement string with the same number of asterisks as the rune length, yielding

什么******打野，******一样，叫你来开龙不来，**

Go Rune Type

rune is an alias for int32 and represents a Unicode code point.

fmt.Println("a -> ", rune('a'))
fmt.Println("A -> ", rune('A'))
fmt.Println("晖 -> ", rune('晖'))
fmt.Println("霞 -> ", rune('霞'))
fmt.Println("晖霞 -> ", []rune("晖霞"))

We convert sensitive words to []rune to count characters accurately.

Implementing the Trie in Go

Trie Structure

Trie (prefix tree) is an N‑ary tree where each node represents a character prefix.

type TrieNode struct {
    childMap map[rune]*TrieNode // children
    Data     string            // full word at leaf
    End      bool              // marks end of a word
}

type SensitiveTrie struct {
    replaceChar rune
    root        *TrieNode
}

Adding Sensitive Words

func (tn *TrieNode) AddChild(c rune) *TrieNode {
    if tn.childMap == nil {
        tn.childMap = make(map[rune]*TrieNode)
    }
    if node, ok := tn.childMap[c]; ok {
        return node
    }
    tn.childMap[c] = &TrieNode{End: false}
    return tn.childMap[c]
}

func (st *SensitiveTrie) AddWord(word string) {
    node := st.root
    for _, r := range []rune(word) {
        node = node.AddChild(r)
    }
    node.End = true
    node.Data = word
}

By converting each word to a rune slice, we insert characters one by one, marking the final node with End=true and storing the full word.

Matching and Replacing

func (tn *TrieNode) FindChild(c rune) *TrieNode {
    if tn.childMap == nil {
        return nil
    }
    return tn.childMap[c]
}

func (st *SensitiveTrie) replaceRune(chars []rune, begin, end int) {
    for i := begin; i < end; i++ {
        chars[i] = st.replaceChar
    }
}

func (st *SensitiveTrie) Match(text string) (sensitiveWords []string, replaceText string) {
    if st.root == nil {
        return nil, text
    }
    chars := []rune(text)
    copyChars := make([]rune, len(chars))
    copy(copyChars, chars)
    for i := 0; i < len(chars); i++ {
        node := st.root.FindChild(chars[i])
        if node == nil {
            continue
        }
        j := i + 1
        for ; j < len(chars) && node != nil; j++ {
            if node.End {
                st.replaceRune(copyChars, i, j)
            }
            node = node.FindChild(chars[j])
        }
        if j == len(chars) && node != nil && node.End {
            st.replaceRune(copyChars, i, len(chars))
        }
    }
    // collect matched words (omitted for brevity)
    replaceText = string(copyChars)
    return
}

The algorithm walks the text rune by rune, follows the trie, and replaces matched segments with the configured asterisk.

Regular Expression Matching

sensitiveWords := []string{"傻逼", "傻叉", "垃圾", "妈的", "sb"}
text := "什么垃圾打野，傻逼一样，叫你来开龙不来，sb"
regStr := strings.Join(sensitiveWords, "|")
wordReg := regexp.MustCompile(regStr)
text = wordReg.ReplaceAllString(text, "*")
println("text ->", text)

This builds a single regex pattern like 傻逼|傻叉|垃圾|妈的|sb and replaces all matches with a single asterisk.

Special‑Character Filtering

func (st *SensitiveTrie) FilterSpecialChar(text string) string {
    text = strings.ToLower(text)
    text = strings.ReplaceAll(text, " ", "")
    re := regexp.MustCompile("[^\u4e00-\u9fa5a-zA-Z0-9]")
    return re.ReplaceAllString(text, "")
}

This keeps only Chinese characters, letters, and digits before matching.

Pinyin Support

// Convert Chinese words to pinyin using an external library
func HansCovertPinyin(words []string) []string {
    var out []string
    for _, w := range words {
        if !regexp.MustCompile("[\u4e00-\u9fa5]").MatchString(w) {
            continue
        }
        p := pinyin.New(w)
        s, _ := p.Convert()
        out = append(out, s)
    }
    return out
}

After obtaining pinyin strings (e.g., sha bi for 傻逼), we add them to the trie so that phonetic inputs are also filtered.

Full Demo

func main() {
    sensitiveWords := []string{"傻逼", "傻叉", "垃圾", "妈的", "sb"}
    matchContents := []string{
        "你是一个大傻逼，大傻叉",
        "你是傻☺叉",
        "shabi东西",
        "他made东西",
        "什么垃圾打野，傻逼一样，叫你来开龙不来，SB",
        "正常的内容☺",
    }
    fmt.Println("
--------- Prefix‑Tree Matching ---------")
    trieDemo(sensitiveWords, matchContents)
}

The output shows successful masking of Chinese, English, and pinyin variants, confirming the effectiveness of the trie‑based approach.

Conclusion

Adding a word to the trie creates a node for each character; if the node already exists, we reuse it. After inserting all characters, we mark the final node as End=true and store the full word. This structure enables fast, memory‑efficient sensitive‑word detection.

Source Code

Repository: gitee.com/huiDBK/sensitive-words-match

References

[1] https://gitee.com/huiDBK/sensitive-words-match

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Go regular expressions sensitive-word detection Trie Text Filtering pinyin rune

Written by

MaGe Linux Operations

Founded in 2009, MaGe Education is a top Chinese high‑end IT training brand. Its graduates earn 12K+ RMB salaries, and the school has trained tens of thousands of students. It offers high‑pay courses in Linux cloud operations, Python full‑stack, automation, data analysis, AI, and Go high‑concurrency architecture. Thanks to quality courses and a solid reputation, it has talent partnerships with numerous internet firms.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.