KMP String Matching Algorithm: Theory, Implementation, and Comparison with Other Search Methods
This article explains the KMP string‑matching algorithm, how it builds the prefix (next) table to avoid the O(M·N) worst‑case of naive search, provides JavaScript implementations, compares it with Boyer‑Moore and V8's indexOf strategy, and includes detailed code examples.
When discussing common string‑matching algorithms, the naive (brute‑force) method and the more sophisticated KMP algorithm are usually mentioned.
For example, to find the pattern ABCDABE in the text ABCDABCDEF , a brute‑force approach compares characters sequentially, moving both i and j pointers forward when they match.
The worst‑case time complexity of this naive method is O(M·N), which becomes unacceptable because binary data often causes mismatches only near the end of the pattern.
KMP solves this by keeping the i pointer fixed on the master string M and moving the pattern pointer j to a position k derived from previously matched prefixes, eliminating unnecessary re‑comparisons.
The key property is that when a mismatch occurs, the next position k satisfies P[0..k‑1] == P[j‑k..j‑1] . This leads to the construction of a next (prefix) array that tells where to jump in the pattern.
Using the prefix table, KMP can achieve linear time O(M+N). The article shows how to compute the next array:
function compute_prefix(p, next) {
next[0] = -1; // initialization
let j = 0;
let k = -1;
while (j < p.length - 1) {
if (k == -1 || p[j] == p[k]) {
next[++j] = ++k; // update next
} else {
k = next[k];
}
}
}Comments explain why next[0] = -1 and how the algorithm handles the case when p[j] == p[k] by moving k forward.
An improved version avoids the “next‑value” pitfall when P[j] == P[next[j]] :
function compute_prefix(p, next) {
next[0] = -1; let j = 0; let k = -1;
while (j < p.length - 1) {
if (k == -1 || p[j] == p[k]) {
if (p[++j] == p[++k]) {
next[j] = next[k];
} else {
next[j] = k;
}
} else {
k = next[k];
}
}
}The complete KMP search uses this table:
function KMP(M, P) {
let i = 0, j = 0;
let next = [];
compute_prefix(P, next);
while (i < M.length && j < P.length) {
if (j == -1 || M[i] == P[j]) {
i++; j++;
} else {
j = next[j];
}
}
if (j == P.length) return i - j; else return -1;
}Although KMP is efficient, the Boyer‑Moore (BM) algorithm often outperforms it, especially for longer patterns, achieving 3‑5× speedups in practice (e.g., GNU grep, text editors).
The article also examines how V8 implements String.prototype.indexOf . In src/string-search.h , five search strategies are defined: SingleCharSearch , LinearSearch , InitialSearch , BoyerMooreHorspoolSearch , and BoyerMooreSearch . The engine selects the most suitable algorithm based on the pattern length.
StringSearch(Isolate* isolate, Vector
pattern)
: isolate_(isolate), pattern_(pattern), start_(Max(0, pattern.length() - kBMMaxShift)) {
if (sizeof(PatternChar) > sizeof(SubjectChar)) {
if (!IsOneByteString(pattern_)) { strategy_ = &FailSearch return; }
}
int pattern_length = pattern_.length();
if (pattern_length < kBMMinPatternLength) {
if (pattern_length == 1) { strategy_ = &SingleCharSearch return; }
strategy_ = &LinearSearch return;
}
strategy_ = &InitialSearch
}Thus, indexOf dynamically chooses the fastest matching algorithm, combining the strengths of several classic methods.
Overall, the article provides a thorough walkthrough of KMP, its prefix table construction, JavaScript implementation, and its place among other string‑search techniques.
Beike Product & Technology
As Beike's official product and technology account, we are committed to building a platform for sharing Beike's product and technology insights, targeting internet/O2O developers and product professionals. We share high-quality original articles, tech salon events, and recruitment information weekly. Welcome to follow us.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.