Longest Duplicate Substring [Hard] — Binary Search + Rolling Hash

Sanjeev SharmaSanjeev Sharma
2 min read

Advertisement

Problem Statement

Given a string s, return any longest substring that appears at least twice. Empty string if none.

Input: "banana"Output: "ana"

Intuition

Binary search on the answer length L. For each L, use Rabin-Karp rolling hash to check if any substring of length L repeats.


Solutions

Python

def longestDupSubstring(s: str) -> str:
    n = len(s)
    nums = [ord(c) - ord('a') for c in s]
    MOD = (1 << 61) - 1  # Mersenne prime
    BASE = 31

    def search(length):
        h = 0
        power = pow(BASE, length, MOD)
        seen = set()
        for i in range(length):
            h = (h * BASE + nums[i]) % MOD
        seen.add(h)
        for i in range(1, n - length + 1):
            h = (h * BASE - nums[i-1] * power + nums[i+length-1]) % MOD
            if h in seen:
                return i
            seen.add(h)
        return -1

    lo, hi, start = 1, n-1, -1
    while lo <= hi:
        mid = (lo + hi) // 2
        idx = search(mid)
        if idx != -1:
            start = idx
            best_len = mid
            lo = mid + 1
        else:
            hi = mid - 1

    return '' if start == -1 else s[start:start+best_len]

Complexity

  • Time: O(n log n) average
  • Space: O(n)

Note

Suffix arrays give O(n log n) worst-case guaranteed.

Advertisement

Sanjeev Sharma

Written by

Sanjeev Sharma

Full Stack Engineer · E-mopro