Implementing Sensitive Word Filtering with Trie Trees
This article explains how to use a trie (prefix tree) to efficiently filter sensitive words in a text, covering the basic concepts, construction steps, traversal algorithm, complexity analysis, and a Java implementation using HashMap.
Interview Scenario
During an interview, Xiao Qiu was asked about sensitive word filtering algorithms. The interviewer gave an example string "abcdefghi" and three sensitive words "de", "bca", and "bcf", and asked how to implement the filter.
Xiao Qiu initially suggested simple brute‑force string matching with O(n*m) complexity, then mentioned the KMP algorithm with O(n+m) time.
Trie Tree Overview
The interviewer introduced the trie (also called a prefix tree or dictionary tree). A trie shares common prefixes among strings, saving space. The root node stores no data; each branch represents a character, and a complete path corresponds to a full word.
For example, the strings "abc" and "abd" share the prefix "ab". Inserting "abf" extends the shared branch, while inserting "bc" creates a separate branch because it has no common prefix with the others.
When a node corresponds to the end of a word (e.g., "ab" is a prefix of "abc", "abd", and "abf"), the node can be marked to indicate a complete word.
Applying Trie to Sensitive Word Filtering
To filter the three sensitive words, we first build a trie containing "de", "bca", and "bcf".
We then traverse the target string using three pointers: p1 points to the current trie node, while p2 and p3 mark the start and end of a potential match in the text.
Step‑by‑step, we move through the characters "a", "b", "c", "d", "e", …, checking whether the current character exists as a child of the node pointed to by p1. When a complete word is found (e.g., the node for "e" completes "de"), we replace the substring between p2 and p3 with asterisks.
The process repeats until the end of the text, ensuring all occurrences of the sensitive words are masked.
Complexity Analysis
If each sensitive word has length m and the input text has length n , the search phase runs in O(n·m) time. Building the trie for t sensitive words costs O(t·m). In practice, the trie is built once and reused many times, so the construction cost is often negligible.
Implementation Details
In Java, a common approach is to represent each trie node with a HashMap<Character, Node> , allowing O(1) lookup of child nodes and dynamic expansion of the branching factor.
Conclusion
The article introduced the trie data structure, demonstrated its use for efficient sensitive‑word filtering, discussed time complexity, and suggested a Java implementation using HashMap. Readers are encouraged to implement the algorithm and handle additional cases such as special characters.
Java Captain
Focused on Java technologies: SSM, the Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading; occasionally covers DevOps tools like Jenkins, Nexus, Docker, ELK; shares practical tech insights and is dedicated to full‑stack Java development.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.