Fundamentals 13 min read

Performance Comparison of String Replacement Algorithms in Java

The article analyzes various Java string‑replacement techniques—including simple String.replace, compiled regular expressions, Aho‑Corasick automaton, and custom Trie implementations—by presenting their designs, object sizes, and benchmark results to guide developers in choosing the most efficient solution for large keyword sets.

JD Tech Talk
JD Tech Talk
JD Tech Talk
Performance Comparison of String Replacement Algorithms in Java

Background : In many e‑commerce scenarios a product name must have a large set of keywords removed, which makes naive String.replace loops CPU‑intensive and unsuitable for production.

Solution Overview : Four replacement strategies are implemented and compared: a straightforward String.replace loop, a pre‑compiled regular‑expression approach, an Aho‑Corasick (AC automaton) library, and a hand‑written Trie‑based replacer.

Interface Definition :

public interface Replacer {
    String replaceKeywords(String text);
}

2.1 String.replace Implementation :

public class StrReplacer implements Replacer {
    private final List
keyWordList;
    public StrReplacer(String keyWords) {
        this.keyWordList = Lists.newArrayList(keyWords.split(";"));
        keyWordList.sort((a, b) -> Integer.compare(b.length(), a.length()));
    }
    @Override
    public String replaceKeywords(String text) {
        String newTxt = text;
        for (String s : keyWordList) {
            newTxt = newTxt.replace(s, "");
        }
        return newTxt;
    }
}

2.2 Regular‑Expression Implementation (uses compiled Pattern for each keyword):

public String replace(CharSequence target, CharSequence replacement) {
    return Pattern.compile(target.toString(), Pattern.LITERAL)
                  .matcher(this)
                  .replaceAll(Matcher.quoteReplacement(replacement.toString()));
}

2.3 Aho‑Corasick (AC automaton) Implementation (uses the org.ahocorasick library):

public class AhoCorasickReplacer implements Replacer {
    private final Trie trie;
    public AhoCorasickReplacer(String keyWords) {
        Trie.TrieBuilder builder = Trie.builder().ignoreOverlaps().onlyWholeWords();
        for (String s : keyWords.split(";")) {
            builder.addKeyword(s);
        }
        this.trie = builder.build();
    }
    @Override
    public String replaceKeywords(String text) {
        if (text == null || text.isEmpty()) return text;
        StringBuilder result = new StringBuilder();
        Collection
emits = trie.parseText(text);
        int lastEnd = 0;
        for (Emit emit : emits) {
            int start = emit.getStart();
            int end = emit.getEnd();
            if (start > lastEnd) result.append(text, lastEnd, start);
            lastEnd = end + 1;
        }
        if (lastEnd <= text.length() - 1) result.append(text.substring(lastEnd));
        return result.toString();
    }
}

2.4 Custom Trie Implementation (only performs replacement):

public class TrieKeywordReplacer implements Replacer {
    private final Trie trie;
    @Override
    public String replaceKeywords(String text) {
        return trie.replaceKeywords(text, "");
    }
    public TrieKeywordReplacer(String keyWords) {
        Trie trie = new Trie();
        for (String s : keyWords.split(";")) {
            trie.insert(s);
        }
        this.trie = trie;
    }
    static class TrieNode {
        Map
children = new HashMap<>();
        boolean isEndOfWord = false;
    }
    static class Trie {
        private TrieNode root = new TrieNode();
        void insert(String word) {
            TrieNode node = root;
            for (char c : word.toCharArray()) {
                node.children.computeIfAbsent(c, k -> new TrieNode());
                node = node.children.get(c);
            }
            node.isEndOfWord = true;
        }
        String replaceKeywords(String text, String replacement) {
            StringBuilder result = new StringBuilder();
            int i = 0;
            while (i < text.length()) {
                TrieNode node = root;
                int j = i;
                TrieNode endNode = null;
                int endIndex = -1;
                while (j < text.length() && node.children.containsKey(text.charAt(j))) {
                    node = node.children.get(text.charAt(j));
                    if (node.isEndOfWord) { endNode = node; endIndex = j; }
                    j++;
                }
                if (endNode != null) {
                    result.append(replacement);
                    i = endIndex + 1;
                } else {
                    result.append(text.charAt(i));
                    i++;
                }
            }
            return result.toString();
        }
    }
}

Object Size Comparison (measured with ObjectSizeCalculator ):

StrReplacer – 12,560 bytes

PatternReplacer – 21,592 bytes

TrieKeywordReplacer – 184,944 bytes

AhoCorasickReplacer – 253,896 bytes

Performance Benchmarks (single‑thread and two‑thread scenarios, 10,000 iterations, JDK 1.8, 400 keywords):

Implementation

Single‑Thread (ns)

Two‑Thread (ns)

StrReplacer

21,843

23,444

PatternReplacer

28,846

39,984

TrieKeywordReplacer

532

680

AhoCorasickReplacer

727

1,157

The custom Trie implementation shows the best raw replacement speed because it does only the minimal work, followed by Aho‑Corasick which also provides precise match locations and counts. Pre‑compiled regex is faster than naive String.replace but slower than the automaton‑based solutions.

Conclusion : For large keyword sets, using a ready‑made Aho‑Corasick library offers the best balance of performance and functionality, while a hand‑crafted Trie can be the fastest if only simple replacement is required. A preliminary Bloom‑filter check for common substrings (e.g., the character "补") can further reduce unnecessary processing.

Sample usage with a pre‑check:

public String replaceKeywords(String skuName) {
    Replacer replacer = new AhoCorasickReplacer(keyWords);
    if (skuName.contains("补")) {
        return replacer.replaceKeywords(skuName);
    } else {
        return skuName;
    }
}
JavaPerformanceAho-CorasickTriestring replacement
JD Tech Talk
Written by

JD Tech Talk

Official JD Tech public account delivering best practices and technology innovation.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.