Fundamentals 10 min read

Master the KMP String Matching Algorithm: Theory, Next Array, and Python Implementation

This article introduces the KMP string‑matching algorithm, explains its practical scenarios, details the construction and purpose of the Next array, provides a step‑by‑step Python implementation, and analyzes its linear‑time complexity for efficient pattern searching.

Python Crawling & Data Mining
Python Crawling & Data Mining
Python Crawling & Data Mining
Master the KMP String Matching Algorithm: Theory, Next Array, and Python Implementation

Today marks the 29th article in the Algorithm and Data Structure series, focusing on a new string‑matching algorithm—KMP.

Application Scenarios

String matching is ubiquitous in computing, from searching keywords on a page to Git diffs and plagiarism detection. Simple cases can be handled by brute force, but large‑scale tasks like paper similarity checks require a more efficient approach, making KMP essential.

Next Array

The core of KMP is the Next array, which stores the longest proper prefix that is also a suffix for each position, allowing the algorithm to resume matching from a viable intermediate state instead of restarting from the beginning.

Consider the following example:

When a mismatch occurs after several matching characters, the failure position’s prefix (e.g., "ABC") can align with the pattern’s prefix, saving many unnecessary comparisons.

The Next array records these intermediate states, indicating where to jump when a mismatch happens.

Algorithm Principle

A pointer tracks the current position in the pattern; on mismatch it jumps to the index stored in the Next array, and on match it advances. With the Next array, the implementation becomes straightforward:

def kmp(var_str, template_str):
    # var_str is the text (A string)
    # template_str is the pattern (B string)
    var_str = '$' + var_str
    template_str = '$' + template_str
    next = generate_next(template_str)
    n, m = len(var_str), len(template_str)
    head = 0
    for i in range(1, n):
        while head > 0 and template_str[head+1] != var_str[i]:
            head = next[head]
        if template_str[head+1] == var_str[i]:
            head += 1
        if head == m - 1:
            return True
    return False

The while loop only decreases the head pointer, ensuring each character is processed at most twice, resulting in linear O(n) time.

Compute Next

To build the Next array, we iterate from position 2 onward, using the previously computed Next[i‑1] to find the longest proper prefix that matches the suffix ending at i.

def generate_next(var_str):
    n = len(var_str)
    next = [0 for _ in range(n)]
    for i in range(2, n):
        head = next[i-1]
        while head > 0 and var_str[head+1] != var_str[i]:
            head = next[head]
        if var_str[head+1] == var_str[i]:
            head += 1
        next[i] = head
    return next

Summary

KMP offers a clever linear‑time solution for string matching by leveraging the Next array to avoid redundant comparisons, making it ideal for large‑scale pattern searches such as plagiarism detection or fast text queries.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

algorithmPythonKMPstring-matchingNext array
Python Crawling & Data Mining
Written by

Python Crawling & Data Mining

Life's short, I code in Python. This channel shares Python web crawling, data mining, analysis, processing, visualization, automated testing, DevOps, big data, AI, cloud computing, machine learning tools, resources, news, technical articles, tutorial videos and learning materials. Join us!

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.