Fundamentals 9 min read

Implementing Sensitive Word Filtering with Trie Trees

This article explains how to use a trie (prefix tree) to efficiently filter sensitive words in a text, covering the basic concepts, construction steps, traversal algorithm, complexity analysis, and a Java implementation using HashMap.

Java Captain

May 13, 2019

Implementing Sensitive Word Filtering with Trie Trees

Interview Scenario

During an interview, Xiao Qiu was asked about sensitive word filtering algorithms. The interviewer gave an example string "abcdefghi" and three sensitive words "de", "bca", and "bcf", and asked how to implement the filter.

Xiao Qiu initially suggested simple brute‑force string matching with O(n*m) complexity, then mentioned the KMP algorithm with O(n+m) time.

Trie Tree Overview

The interviewer introduced the trie (also called a prefix tree or dictionary tree). A trie shares common prefixes among strings, saving space. The root node stores no data; each branch represents a character, and a complete path corresponds to a full word.

For example, the strings "abc" and "abd" share the prefix "ab". Inserting "abf" extends the shared branch, while inserting "bc" creates a separate branch because it has no common prefix with the others.

When a node corresponds to the end of a word (e.g., "ab" is a prefix of "abc", "abd", and "abf"), the node can be marked to indicate a complete word.

Applying Trie to Sensitive Word Filtering

To filter the three sensitive words, we first build a trie containing "de", "bca", and "bcf".

We then traverse the target string using three pointers: p1 points to the current trie node, while p2 and p3 mark the start and end of a potential match in the text.

Step‑by‑step, we move through the characters "a", "b", "c", "d", "e", …, checking whether the current character exists as a child of the node pointed to by p1. When a complete word is found (e.g., the node for "e" completes "de"), we replace the substring between p2 and p3 with asterisks.

The process repeats until the end of the text, ensuring all occurrences of the sensitive words are masked.

Complexity Analysis

If each sensitive word has length m and the input text has length n, the search phase runs in O(n·m) time. Building the trie for t sensitive words costs O(t·m). In practice, the trie is built once and reused many times, so the construction cost is often negligible.

Implementation Details

In Java, a common approach is to represent each trie node with a HashMap<Character, Node>, allowing O(1) lookup of child nodes and dynamic expansion of the branching factor.

Conclusion

The article introduced the trie data structure, demonstrated its use for efficient sensitive‑word filtering, discussed time complexity, and suggested a Java implementation using HashMap. Readers are encouraged to implement the algorithm and handle additional cases such as special characters.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Java Data Structure Trie string-matching Sensitive Word Filtering

Written by

Java Captain

Focused on Java technologies: SSM, the Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading; occasionally covers DevOps tools like Jenkins, Nexus, Docker, ELK; shares practical tech insights and is dedicated to full‑stack Java development.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.