Advanced Matching Algorithms and Graph Data Structures: KMP, Rabin‑Karp, Boyer‑Moore, Trie, Double‑Array Trie, and AC Automaton
This article introduces common graph concepts and several advanced string‑matching algorithms—including Brute‑Force, Rabin‑Karp, KMP, Boyer‑Moore, AC automaton, Trie, and Double‑Array Trie—explaining their principles, implementations, complexity analyses, and typical application scenarios for search systems.
1. Introduction
The article builds on previous discussions of basic data structures and algorithms, focusing on graph theory and advanced string‑matching techniques that are essential for enterprise‑level search systems.
2. Graph Basics
A graph G(V, E) consists of a vertex set V and an edge set E . Edges may be directed or undirected, and can carry weights representing cost, distance, or time, which are useful for routing problems such as logistics.
2.1 Graph Representation
Two common representations are presented:
Adjacency Matrix – a two‑dimensional array A[u][v] where A[u][v]=true (or the edge weight) if an edge exists. While simple, its space complexity is Θ(|V|²) , making it inefficient for sparse graphs.
/**
* Description: 使用邻接矩阵的图表示法
* @author
pankun8
* @date 2021/11/11 15:41
*/
@Data
@NoArgsConstructor
public class Graph<T extends Node>{
private int n;
private T[] data;
private Boolean directed;
private int[][] matrix;
public Graph(T[] data , Boolean directed){
this.n = data.length;
this.data = data;
this.directed = directed;
matrix = new int[n][n];
}
public void addEdge(int v , int w , int value){
if((v >=0 && v < n) && (w >= 0 && w < n)){
if(hasEdge(v,w) == value){
return;
}
matrix[v][w] = value;
if(!this.directed){
matrix[w][v] = value;
}
n ++;
}
}
public int hasEdge(int v, int w){
if((v >=0 && v < n) && (w >= 0 && w < n)){
return matrix[v][w];
}
return 0;
}
public int stateTransfer(int index , int value){
int[] matrix = this.matrix[index];
for (int i = 0; i < matrix.length; i++) {
if(matrix[i] == value){
return i;
}
}
return Integer.MAX_VALUE;
}
}Adjacency List – a collection of linked lists where each vertex stores only its outgoing edges, offering O(|V|+|E|) space and better performance for sparse graphs.
3. String Matching Algorithms
3.1 Brute‑Force (BF)
Also known as naive matching, BF checks every possible alignment of the pattern P in the text S , resulting in a worst‑case time complexity of O(m·n) .
3.2 Rabin‑Karp (RK)
RK hashes the pattern once and then slides a window over the text, comparing hash values to achieve an average time complexity of O(n) while maintaining O(m) preprocessing.
3.3 Knuth‑Morris‑Pratt (KMP)
KMP builds a next array (also called failure function) that records the length of the longest proper prefix which is also a suffix, allowing the algorithm to skip re‑examining characters and run in O(m+n) time.
public static void genNext(Integer[] next , String p){
int j = 0 , k = -1;
char[] chars = p.toCharArray();
next[0] = -1;
while(j < p.length() - 1){
if(k == -1 || chars[j] == chars[k]){
j++;k++;
next[j] = k;
}else{
k = next[k];
}
}
}3.4 Boyer‑Moore (BM)
BM scans the pattern from right to left and uses two heuristics – the bad‑character rule and the good‑suffix rule – to jump ahead, often outperforming KMP by a factor of 3‑4.
3.5 Trie (Prefix Tree)
A Trie stores strings in a tree where each node represents a character; common prefixes share the same path, enabling fast prefix queries with typical search time O(m) .
3.6 Aho‑Corasick (AC) Automaton
AC combines a Trie with failure links (similar to KMP’s next ) to perform multi‑pattern matching in linear time O(n) .
3.7 Double‑Array Trie
The Double‑Array Trie compresses a Trie into two parallel arrays BASE and CHECK , preserving fast lookup while drastically reducing memory usage.
4. Summary
The article completes the series on data structures for search by covering non‑linear structures (graphs) and a suite of string‑matching algorithms, laying the groundwork for deeper exploration of search engine technologies.
JD Tech
Official JD technology sharing platform. All the cutting‑edge JD tech, innovative insights, and open‑source solutions you’re looking for, all in one place.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.