Repeated DNA Sequences — Rolling Hash
Advertisement
Problem
Find all 10-letter DNA substrings that appear more than once.
Approach — Rolling Hash with Bit Ops
Encode each nucleotide as 2 bits (A=0, C=1, G=2, T=3). Use 20-bit window, roll hash by shifting.
Time: O(n) | Space: O(n)
Solutions
Python — Set approach
class Solution:
def findRepeatedDnaSequences(self, s: str) -> list[str]:
seen=set(); result=set()
for i in range(len(s)-9):
sub=s[i:i+10]
if sub in seen: result.add(sub)
seen.add(sub)
return list(result)
Python — Rolling hash
class Solution:
def findRepeatedDnaSequences(self, s: str) -> list[str]:
enc={'A':0,'C':1,'G':2,'T':3}
seen=set(); result=set(); h=0; mask=(1<<20)-1
for i,c in enumerate(s):
h=((h<<2)|enc[c])&mask
if i>=9:
if h in seen: result.add(s[i-9:i+1])
seen.add(h)
return list(result)
Complexity
- Time: O(n) | Space: O(n)
Advertisement