Fundamentals 6 min read

Optimized String Matching: Boyer‑Moore Algorithm with Bad‑Character and Good‑Suffix Rules

This article explains the Boyer‑Moore string‑matching algorithm, detailing how the bad‑character and good‑suffix heuristics dramatically reduce comparisons, and provides a complete Java implementation with step‑by‑step illustrations of each matching round and the underlying shift calculations.

Sohu Tech Products
Sohu Tech Products
Sohu Tech Products
Optimized String Matching: Boyer‑Moore Algorithm with Bad‑Character and Good‑Suffix Rules

String matching is one of the most widely used functions in computer science, and many clever algorithms have been invented to improve its performance.

The article first recalls the brute‑force (BF) algorithm, which compares the pattern and the text character by character from left to right, illustrating each comparison round with diagrams.

It then introduces the Boyer‑Moore (BM) algorithm, which scans the pattern from right to left and uses two heuristics—bad‑character and good‑suffix—to decide how far to shift the pattern after a mismatch.

Bad‑character rule : when a mismatch occurs, the pattern is shifted so that the mismatched character in the text aligns with the rightmost occurrence of the same character in the pattern; if the character does not appear in the pattern, the shift equals the pattern length plus one.

The following Java method implements the bad‑character lookup:

private static int findCharacter(String pattern, char badCharacter, int index) {
    for (int i = index - 1; i >= 0; i--) {
        if (pattern.charAt(i) == badCharacter) {
            return i;
        }
    }
    // pattern does not contain the character, return -1
    return -1;
}

The main Boyer‑Moore search routine uses this helper to compute the shift:

public static int boyerMoore(String str, String pattern) {
    int strLength = str.length();
    int patternLength = pattern.length();
    int start = 0;
    while (start <= strLength - patternLength) {
        int i;
        for (i = patternLength - 1; i >= 0; i--) {
            if (str.charAt(start + i) != pattern.charAt(i)) {
                break;
            }
        }
        if (i < 0) {
            return start; // match found
        }
        int charIndex = findCharacter(pattern, str.charAt(start + i), i);
        int bcOffset = (charIndex >= 0) ? i - charIndex : i + 1;
        start += bcOffset;
    }
    return -1; // no match
}

public static void main(String[] args) {
    String str = "GTTATAGCTGGTAGCGGCGAA";
    String pattern = "GTAGCGGCG";
    int index = boyerMoore(str, pattern);
    System.out.println("首次出现位置:" + index);
}

Good‑suffix rule : after a mismatch, if the suffix of the pattern that matched the text also appears elsewhere in the pattern, the pattern can be shifted to align this other occurrence with the matched suffix, potentially moving it farther than the bad‑character rule allows.

The article illustrates the good‑suffix heuristic with several examples, showing how the pattern can be shifted multiple positions when a matching suffix like "GCG" exists earlier in the pattern.

By combining both heuristics, the Boyer‑Moore algorithm achieves sub‑linear average‑case performance, making it far more efficient than the brute‑force approach for large texts.

Javaalgorithmstring matchingbad character ruleBoyer-Mooregood suffix rule
Sohu Tech Products
Written by

Sohu Tech Products

A knowledge-sharing platform for Sohu's technology products. As a leading Chinese internet brand with media, video, search, and gaming services and over 700 million users, Sohu continuously drives tech innovation and practice. We’ll share practical insights and tech news here.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.