Skip to content

Instantly share code, notes, and snippets.

@senvey
Last active August 29, 2015 14:06
Show Gist options
  • Save senvey/75d4188922a3f8061804 to your computer and use it in GitHub Desktop.
Save senvey/75d4188922a3f8061804 to your computer and use it in GitHub Desktop.
[pika] Clump Finding Problem
"""
Given integers L and t, a string Pattern forms an (L, t)-clump inside a (larger) string Genome if there is an interval of Genome of length L in which Pattern appears at least t times. For example, TGCA forms a (25,3)-clump in the following Genome: gatcagcataagggtcccTGCAaTGCAtgacaagccTGCAgttgttttac.
Clump Finding Problem
Find patterns forming clumps in a string.
Given: A string Genome, and integers k, L, and t.
Return: All distinct k-mers forming (L, t)-clumps in Genome.
"""
from collections import defaultdict
def search(inseq, k, L, t):
lookup = defaultdict(list)
result = set()
for cursor in range(len(inseq) - k + 1):
seg = inseq[cursor:cursor + k]
# remove prior positions of the same segment
# if they are more than L distance far
while lookup[seg] and cursor + k - lookup[seg][0] > L:
lookup[seg].pop(0)
lookup[seg].append(cursor)
if len(lookup[seg]) == t:
result.add(seg)
return result
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment