String Algorithms — Complete Guide (KMP, Z, Rolling Hash, Suffix Array)
Advertisement
Why String Algorithms?
Naive pattern matching is O(nm). String algorithms exploit structure to achieve O(n+m) or O(n log n).
The 6 Core Patterns
Pattern 1 — KMP (Knuth-Morris-Pratt)
Build a failure function: for each position in the pattern, the length of the longest proper prefix that is also a suffix. Skip unnecessary comparisons during search.
Pattern 2 — Z-Algorithm
z[i] = length of the longest substring starting at s[i] that matches a prefix of s. Linear time, all prefix occurrences.
Pattern 3 — Rabin-Karp (Rolling Hash)
Hash each window. On mismatch, roll the hash in O(1). O(n+m) expected, useful for multiple pattern search.
Pattern 4 — Suffix Array
Sort all suffixes lexicographically. Enables O(m log n) substring search, LCP queries, and unique substring counting.
Pattern 5 — Manacher's Algorithm
Find the longest palindromic substring in O(n) by reusing previously computed palindrome centres.
Pattern 6 — String Hashing
Map strings to integers. Compare substrings in O(1) after O(n) preprocessing. Polynomial hash: h[i] = s[i] + p*h[i-1].
Complexity Reference
| Algorithm | Pre-process | Search | Space |
|---|---|---|---|
| Naive | O(1) | O(nm) | O(1) |
| KMP | O(m) | O(n) | O(m) |
| Z-algorithm | — | O(n+m) | O(n+m) |
| Rabin-Karp | O(m) | O(n) avg | O(1) |
| Suffix Array | O(n log n) | O(m log n) | O(n) |
| Manacher's | — | O(n) | O(n) |
Rolling Hash Template
MOD = 10**9+7
BASE = 31
def build_hash(s):
n = len(s)
h = [0]*(n+1)
p = [1]*(n+1)
for i in range(n):
h[i+1] = (h[i]*BASE + ord(s[i])-ord('a')+1) % MOD
p[i+1] = p[i]*BASE % MOD
def get(l, r):
return (h[r+1] - h[l]*p[r-l+1]) % MOD
return get
Advertisement