String Algorithms — Complete Guide (KMP, Z, Rolling Hash, Suffix Array)

Why String Algorithms?

Naive pattern matching is O(nm). String algorithms exploit structure to achieve O(n+m) or O(n log n).

The 6 Core Patterns

Pattern 1 — KMP (Knuth-Morris-Pratt)

Build a failure function: for each position in the pattern, the length of the longest proper prefix that is also a suffix. Skip unnecessary comparisons during search.

Pattern 2 — Z-Algorithm

z[i] = length of the longest substring starting at s[i] that matches a prefix of s. Linear time, all prefix occurrences.

Pattern 3 — Rabin-Karp (Rolling Hash)

Hash each window. On mismatch, roll the hash in O(1). O(n+m) expected, useful for multiple pattern search.

Pattern 4 — Suffix Array

Sort all suffixes lexicographically. Enables O(m log n) substring search, LCP queries, and unique substring counting.

Pattern 5 — Manacher's Algorithm

Find the longest palindromic substring in O(n) by reusing previously computed palindrome centres.

Pattern 6 — String Hashing

Map strings to integers. Compare substrings in O(1) after O(n) preprocessing. Polynomial hash: h[i] = s[i] + p*h[i-1].

Complexity Reference

Algorithm	Pre-process	Search	Space
Naive	O(1)	O(nm)	O(1)
KMP	O(m)	O(n)	O(m)
Z-algorithm	—	O(n+m)	O(n+m)
Rabin-Karp	O(m)	O(n) avg	O(1)
Suffix Array	O(n log n)	O(m log n)	O(n)
Manacher's	—	O(n)	O(n)

Rolling Hash Template

MOD = 10**9+7
BASE = 31

def build_hash(s):
    n = len(s)
    h = [0]*(n+1)
    p = [1]*(n+1)
    for i in range(n):
        h[i+1] = (h[i]*BASE + ord(s[i])-ord('a')+1) % MOD
        p[i+1] = p[i]*BASE % MOD
    def get(l, r):
        return (h[r+1] - h[l]*p[r-l+1]) % MOD
    return get