Aho-Corasick — Multi-Pattern String Matching
Advertisement
Aho-Corasick Overview
Build a trie of all patterns. Add failure links (like KMP failure function but on the trie). Then stream the text through the automaton — each character transition is O(1) amortised.
Solution
Python
from collections import deque
class AhoCorasick:
def __init__(self, patterns):
self.goto = [{}]
self.fail = [0]
self.output = [[]]
for i, p in enumerate(patterns):
self._insert(p, i)
self._build_fail()
def _insert(self, pattern, idx):
node = 0
for c in pattern:
if c not in self.goto[node]:
self.goto[node][c] = len(self.goto)
self.goto.append({})
self.fail.append(0)
self.output.append([])
node = self.goto[node][c]
self.output[node].append(idx)
def _build_fail(self):
q = deque()
for c, s in self.goto[0].items():
self.fail[s] = 0
q.append(s)
while q:
r = q.popleft()
for c, s in self.goto[r].items():
q.append(s)
state = self.fail[r]
while state and c not in self.goto[state]:
state = self.fail[state]
self.fail[s] = self.goto[state].get(c, 0)
if self.fail[s] == s: self.fail[s] = 0
self.output[s] += self.output[self.fail[s]]
def search(self, text):
state = 0
results = []
for i, c in enumerate(text):
while state and c not in self.goto[state]:
state = self.fail[state]
state = self.goto[state].get(c, 0)
for pattern_idx in self.output[state]:
results.append((i, pattern_idx))
return results
JavaScript
/* Aho-Corasick: trie with BFS failure links, text streaming */
Java
/* Java: integer-indexed trie nodes, BFS failure link construction */
C++
/* C++: vector<array<int,26>> trie + fail[] + output lists */
C
/* C: static 2D trie array, BFS failure links, linear search pass */
Complexity
| Phase | Time |
|---|---|
| Build trie | O(sum of pattern lengths) |
| Build failure links | O(sum of pattern lengths) |
| Search text | O(n + number of matches) |
| Total | O(n + m + z) |
Where n=text length, m=total pattern length, z=matches found.
Key advantage over KMP: matches ALL patterns simultaneously.
Advertisement