Fundamentals 6 min read

Optimized String Matching: Boyer‑Moore Algorithm with Bad‑Character and Good‑Suffix Rules

This article explains the Boyer‑Moore string‑matching algorithm, detailing how the bad‑character and good‑suffix heuristics dramatically reduce comparisons, and provides a complete Java implementation with step‑by‑step illustrations of each matching round and the underlying shift calculations.

Sohu Tech Products

Apr 15, 2020

Optimized String Matching: Boyer‑Moore Algorithm with Bad‑Character and Good‑Suffix Rules

String matching is one of the most widely used functions in computer science, and many clever algorithms have been invented to improve its performance.

The article first recalls the brute‑force (BF) algorithm, which compares the pattern and the text character by character from left to right, illustrating each comparison round with diagrams.

It then introduces the Boyer‑Moore (BM) algorithm, which scans the pattern from right to left and uses two heuristics—bad‑character and good‑suffix—to decide how far to shift the pattern after a mismatch.

Bad‑character rule : when a mismatch occurs, the pattern is shifted so that the mismatched character in the text aligns with the rightmost occurrence of the same character in the pattern; if the character does not appear in the pattern, the shift equals the pattern length plus one.

The following Java method implements the bad‑character lookup:

private static int findCharacter(String pattern, char badCharacter, int index) {
    for (int i = index - 1; i >= 0; i--) {
        if (pattern.charAt(i) == badCharacter) {
            return i;
        }
    }
    // pattern does not contain the character, return -1
    return -1;
}

The main Boyer‑Moore search routine uses this helper to compute the shift:

public static int boyerMoore(String str, String pattern) {
    int strLength = str.length();
    int patternLength = pattern.length();
    int start = 0;
    while (start <= strLength - patternLength) {
        int i;
        for (i = patternLength - 1; i >= 0; i--) {
            if (str.charAt(start + i) != pattern.charAt(i)) {
                break;
            }
        }
        if (i < 0) {
            return start; // match found
        }
        int charIndex = findCharacter(pattern, str.charAt(start + i), i);
        int bcOffset = (charIndex >= 0) ? i - charIndex : i + 1;
        start += bcOffset;
    }
    return -1; // no match
}

public static void main(String[] args) {
    String str = "GTTATAGCTGGTAGCGGCGAA";
    String pattern = "GTAGCGGCG";
    int index = boyerMoore(str, pattern);
    System.out.println("首次出现位置：" + index);
}

Good‑suffix rule : after a mismatch, if the suffix of the pattern that matched the text also appears elsewhere in the pattern, the pattern can be shifted to align this other occurrence with the matched suffix, potentially moving it farther than the bad‑character rule allows.

The article illustrates the good‑suffix heuristic with several examples, showing how the pattern can be shifted multiple positions when a matching suffix like "GCG" exists earlier in the pattern.

By combining both heuristics, the Boyer‑Moore algorithm achieves sub‑linear average‑case performance, making it far more efficient than the brute‑force approach for large texts.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

java algorithm string-matching bad character rule Boyer-Moore good suffix rule

Written by

Sohu Tech Products

A knowledge-sharing platform for Sohu's technology products. As a leading Chinese internet brand with media, video, search, and gaming services and over 700 million users, Sohu continuously drives tech innovation and practice. We’ll share practical insights and tech news here.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.