Understand the working of KMP Algorithm

Understand the working of KMP Algorithm


Introduction

Ever needed to find a word while using a text editor? This situation is probably common, and you have probably experienced it frequently. Of course, you don't automatically scan the entire article; instead, you use the find function on your computer to have it do it for you.

Algorithms for "string matching" or "pattern matching" are applied in this situation. The KMP algorithm falls into the category of “pattern matching” algorithms.

We need to determine whether string a matches all or part of string b when we have two strings, a representing the pattern and b representing the string in which we want to find the pattern. In essence, we must determine whether string an is a component of string b.

From left to right, this algorithm compares each character individually. However, it uses a preprocessed table called the "Prefix Table" to avoid character comparison during matching whenever a mismatch occurs.

Working of KMP

String matching algorithms are another name for the pattern search algorithms. When searching a string within another string, these algorithms come in very handy. Write a programme with the function PatternSearch(char pat[, char str[]) that prints all instances of pat[] in str[ given a text string str[0..n-1] and a pattern pat[0..m-1]. Considering n > m

Examples:

Input: str[] = "BOARD INFINITY

pat[] = "INF"

Output: Pattern found at index 6.

Components of the KMP algorithm:

a. Prefix.

b. Suffix.

c. LPS table : Used to detect whether the Longest Proper Prefix is also a suffix.

Steps

LPS ← array [size = pattern length]
LPS[0] ← 0  {LPS value of the first element is always 0}
len ← 0  {length of previous longest proper prefix that is also a suffix}
i ← 1
m ← length of pattern
while i < m do
    if pattern[i] = pattern[len]:
        len ← len + 1
        LPS[i] ← len
        i ← i + 1
    else  {pattern[i] is not equal to pattern[len]}
        if len is not equal to 0:
            len ← LPS[len - 1]
        else  {if len is 0}
            LPS[i] ← 0
            i ← i + 1
return LPS

Code

void KMP(char* pat, char* str) {
    int M = strlen(pat);
    int N = strlen(str);
 
    int lps[M];
    computeLPS(pat, M, lps);
 
    int i = 0;
    int j = 0;
    while (i < N) {
        if (pat[j] == str[i]) {
            j++;
            i++;
        }
 
        if (j == M) {
            printf("Pattern exists from index %d ", i - j);
            j = lps[j - 1];
        }
        else if (i < N && pat[j] != str[i]) {
            if (j != 0)
                j = lps[j - 1];
            else
                i = i + 1;
        }
    }
}
 
void computeLPS(char* pat, int M, int* lps) {
        int len = 0;
 
    lps[0] = 0;  
      int i = 1;
    while (i < M) {
        if (pat[i] == pat[len]) {
            len++;
            lps[i] = len;
            i++;
        }
        else{
          if (len != 0) {
                len = lps[len - 1];
          }
            else             {
                lps[i] = 0;
                i++;
            }
        }
    }
}

write your code here: Coding Playground

Complexity Analysis

Since we have an outer and an inner loop suggests that time complexity would be constant time O(mn). However, if we count it in a different way, we can see that it is always less than that. The idea is that we perform one comparison T[i+j]==P[j] for each iteration of the inner loop. By keeping track of how many comparisons we make, we can determine how long the algorithm took overall.

Time complexity of the complete algorithm is O(m + n).