In this post, we’ll explore how to implement the Knuth-Morris-Pratt (KMP) string matching algorithm in Java. The KMP algorithm is an efficient way to search for a substring within a given string. It improves upon the brute-force approach by skipping over unnecessary comparisons after partial matches.
What is the Knuth-Morris-Pratt (KMP) Algorithm?
The KMP algorithm searches for a substring (pattern) in a larger string (text) efficiently by avoiding redundant comparisons. The key idea behind KMP is that when a mismatch happens, it uses previously gathered information about the pattern to shift the search window instead of starting from scratch.
The algorithm uses two main components:
- Failure Function (Prefix Table): It helps determine how much the pattern should shift when a mismatch occurs.
- Search Process: Once the failure function is computed, the algorithm searches through the text to find the pattern.
Java Code Implementation
import java.io.*;
class KnuthMorrisPratt {
// This function computes the failure function for the pattern
void computeFailureFunction(String pattern, int[] failure) {
int i = 1;
int j = 0;
failure[0] = 0;
int patternLength = pattern.length();
// Compute the failure function
while (i < patternLength) {
if (pattern.charAt(j) == pattern.charAt(i)) {
failure[i] = j + 1;
i++;
j++;
} else if (j > 0) {
j = failure[j - 1];
} else {
failure[i] = 0;
i++;
}
}
}
// This function uses the failure function to search for the pattern in the text
int KMP(String text, String pattern, int[] failure) {
computeFailureFunction(pattern, failure);
int textIndex = 0;
int patternIndex = 0;
int textLength = text.length();
int patternLength = pattern.length();
// Perform the search
while (textIndex < textLength) {
if (pattern.charAt(patternIndex) == text.charAt(textIndex)) {
if (patternIndex == patternLength - 1) {
return textIndex - patternLength + 1; // Match found, return the position
}
textIndex++;
patternIndex++;
} else if (patternIndex > 0) {
patternIndex = failure[patternIndex - 1]; // Shift pattern based on failure function
} else {
textIndex++;
}
}
return -1; // No match found
}
}
public class KMPStringSearch {
public static void main(String[] args) throws IOException {
// Input handling
InputStreamReader input = new InputStreamReader(System.in);
BufferedReader reader = new BufferedReader(input);
System.out.println("Enter the text string:");
String text = reader.readLine();
System.out.println("Enter the pattern string to be searched:");
String pattern = reader.readLine();
KnuthMorrisPratt kmpAlgorithm = new KnuthMorrisPratt();
int patternLength = pattern.length();
int[] failureFunction = new int[patternLength];
// Search for the pattern in the text
int matchPosition = kmpAlgorithm.KMP(text, pattern, failureFunction);
if (matchPosition != -1) {
System.out.println("Substring matched at position: " + matchPosition);
} else {
System.out.println("No substring match found.");
}
}
}
Explanation of the Code
- Failure Function Calculation:
The functioncomputeFailureFunction
is responsible for building the failure function (or prefix table). It compares each character of the pattern with the previous characters and stores the length of the longest proper prefix that is also a suffix for each index. This helps in skipping unnecessary comparisons during the actual search.- Example: For the pattern “ABAB”, the failure function would be
[0, 0, 1, 2]
. This tells us that when a mismatch occurs at index 3, we can skip to index 2 of the pattern (instead of starting over).
- Example: For the pattern “ABAB”, the failure function would be
- Search Process:
TheKMP
function uses the failure function to perform the actual search. It compares characters from the text and the pattern. If a mismatch occurs, it shifts the pattern based on the failure function (i.e., it skips over parts of the pattern that have already been matched). If the pattern matches completely, it returns the starting position of the match in the text.
Sample Output
Enter the text string:
piitnewpanvel
Enter the pattern string to be searched:
pan
Substring matched at position: 7
Step-by-Step Explanation of the Input/Output and Program flow
Input Strings:
- Text:
piitnewpanvel
- Pattern:
pan
We aim to locate the substring "pan"
within the text.
Failure Function for “pan”:
The program first computes the failure function (also known as the prefix table) for the pattern “pan”.
- At index 0 (
'p'
): No prefix-suffix match →0
- At index 1 (
'a'
): No prefix-suffix match →0
- At index 2 (
'n'
): No prefix-suffix match →0
Failure Table: [0, 0, 0]
This table helps decide how much to shift the pattern when a mismatch happens.
Search Process Overview:
- The algorithm starts comparing characters in the text and pattern from left to right.
- On mismatches, it uses the failure function to skip unnecessary checks.
- In this example, a match is found starting at text index 7 (0-based) where:
'p'
matches'p'
'a'
matches'a'
'n'
matches'n'
As a result, the program reports:
Substring matched at position: 7
can you send me the kmp program for column matching
Bad implementation. Bad function/variable namings. No comments. Please don’t refer this code.