Performance Comparison of String Replacement Algorithms in Java
The article analyzes various Java string‑replacement techniques—including simple String.replace, compiled regular expressions, Aho‑Corasick automaton, and custom Trie implementations—by presenting their designs, object sizes, and benchmark results to guide developers in choosing the most efficient solution for large keyword sets.
Background : In many e‑commerce scenarios a product name must have a large set of keywords removed, which makes naive String.replace loops CPU‑intensive and unsuitable for production.
Solution Overview : Four replacement strategies are implemented and compared: a straightforward String.replace loop, a pre‑compiled regular‑expression approach, an Aho‑Corasick (AC automaton) library, and a hand‑written Trie‑based replacer.
Interface Definition :
public interface Replacer {
String replaceKeywords(String text);
}2.1 String.replace Implementation :
public class StrReplacer implements Replacer {
private final List
keyWordList;
public StrReplacer(String keyWords) {
this.keyWordList = Lists.newArrayList(keyWords.split(";"));
keyWordList.sort((a, b) -> Integer.compare(b.length(), a.length()));
}
@Override
public String replaceKeywords(String text) {
String newTxt = text;
for (String s : keyWordList) {
newTxt = newTxt.replace(s, "");
}
return newTxt;
}
}2.2 Regular‑Expression Implementation (uses compiled Pattern for each keyword):
public String replace(CharSequence target, CharSequence replacement) {
return Pattern.compile(target.toString(), Pattern.LITERAL)
.matcher(this)
.replaceAll(Matcher.quoteReplacement(replacement.toString()));
}2.3 Aho‑Corasick (AC automaton) Implementation (uses the org.ahocorasick library):
public class AhoCorasickReplacer implements Replacer {
private final Trie trie;
public AhoCorasickReplacer(String keyWords) {
Trie.TrieBuilder builder = Trie.builder().ignoreOverlaps().onlyWholeWords();
for (String s : keyWords.split(";")) {
builder.addKeyword(s);
}
this.trie = builder.build();
}
@Override
public String replaceKeywords(String text) {
if (text == null || text.isEmpty()) return text;
StringBuilder result = new StringBuilder();
Collection
emits = trie.parseText(text);
int lastEnd = 0;
for (Emit emit : emits) {
int start = emit.getStart();
int end = emit.getEnd();
if (start > lastEnd) result.append(text, lastEnd, start);
lastEnd = end + 1;
}
if (lastEnd <= text.length() - 1) result.append(text.substring(lastEnd));
return result.toString();
}
}2.4 Custom Trie Implementation (only performs replacement):
public class TrieKeywordReplacer implements Replacer {
private final Trie trie;
@Override
public String replaceKeywords(String text) {
return trie.replaceKeywords(text, "");
}
public TrieKeywordReplacer(String keyWords) {
Trie trie = new Trie();
for (String s : keyWords.split(";")) {
trie.insert(s);
}
this.trie = trie;
}
static class TrieNode {
Map
children = new HashMap<>();
boolean isEndOfWord = false;
}
static class Trie {
private TrieNode root = new TrieNode();
void insert(String word) {
TrieNode node = root;
for (char c : word.toCharArray()) {
node.children.computeIfAbsent(c, k -> new TrieNode());
node = node.children.get(c);
}
node.isEndOfWord = true;
}
String replaceKeywords(String text, String replacement) {
StringBuilder result = new StringBuilder();
int i = 0;
while (i < text.length()) {
TrieNode node = root;
int j = i;
TrieNode endNode = null;
int endIndex = -1;
while (j < text.length() && node.children.containsKey(text.charAt(j))) {
node = node.children.get(text.charAt(j));
if (node.isEndOfWord) { endNode = node; endIndex = j; }
j++;
}
if (endNode != null) {
result.append(replacement);
i = endIndex + 1;
} else {
result.append(text.charAt(i));
i++;
}
}
return result.toString();
}
}
}Object Size Comparison (measured with ObjectSizeCalculator ):
StrReplacer – 12,560 bytes
PatternReplacer – 21,592 bytes
TrieKeywordReplacer – 184,944 bytes
AhoCorasickReplacer – 253,896 bytes
Performance Benchmarks (single‑thread and two‑thread scenarios, 10,000 iterations, JDK 1.8, 400 keywords):
Implementation
Single‑Thread (ns)
Two‑Thread (ns)
StrReplacer
21,843
23,444
PatternReplacer
28,846
39,984
TrieKeywordReplacer
532
680
AhoCorasickReplacer
727
1,157
The custom Trie implementation shows the best raw replacement speed because it does only the minimal work, followed by Aho‑Corasick which also provides precise match locations and counts. Pre‑compiled regex is faster than naive String.replace but slower than the automaton‑based solutions.
Conclusion : For large keyword sets, using a ready‑made Aho‑Corasick library offers the best balance of performance and functionality, while a hand‑crafted Trie can be the fastest if only simple replacement is required. A preliminary Bloom‑filter check for common substrings (e.g., the character "补") can further reduce unnecessary processing.
Sample usage with a pre‑check:
public String replaceKeywords(String skuName) {
Replacer replacer = new AhoCorasickReplacer(keyWords);
if (skuName.contains("补")) {
return replacer.replaceKeywords(skuName);
} else {
return skuName;
}
}JD Tech Talk
Official JD Tech public account delivering best practices and technology innovation.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.