python - find sequence in non tab delimited file
Today I ran into a problem again.
I have a file that looks like this:
file a
>chr1
acgactgactgtcgatcgatcgatgctcgatgctcgacgatcgtgctcgatc
>chr2
gtgacgcacacgtgctagcgctgatcgatcgtagctcagtcag
>chr3
cagtcgtcgatcgtcgatcgtcg
and so on (basically a FASTA file).
In the other file I have nicely tab-delimited information about a read:
file b
chr2 0 * 2s3m5i2m1d3m * cactttttgtcta nm:i:6
Both files are huge.
I will describe what needs to be done, and the part I have a problem with:
If the field chr2 in file b matches the line >chr2 in file a, look for cactttttgtcta (from file b) in the sequence of file a, but only in the sequence of the >chr2 region. The next > line starts a different chromosome, and I don't want to search there.
To simplify, let's say: find cacacgtgctag in the sequence in file a.
I was trying to use a dictionary for file a, but it's not feasible.
Any suggestions?
Something like:

    for req in fileb:
        (tag, pattern) = parseb(req)
        tag_matched = False
        filea = open(file_a_name)  # file a is re-read for every line of file b
        for line in filea:
            if line.startswith('>'):
                # note: a prefix test, so 'chr1' would also match '>chr10'
                tag_matched = line[1:].startswith(tag)
            elif tag_matched and (line.find(pattern) > -1):
                do_whatever()
        filea.close()

should do the job if you can write the parseb function.
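
Here is a rough sketch of what parseb could look like, together with one possible way to avoid re-reading file a for every line of file b. It rests on a few assumptions that are not stated in the question: the columns of file b are separated by tabs, the tag is the first column and the sequence is the sixth column (as in the example line), and the patterns from file b fit in memory once grouped by tag. The names read_fasta and find_matches are made up for illustration. Note that the line-by-line loop above would miss a pattern that spans a line break inside one >chr region, so this version joins the wrapped lines of each region before searching.

```python
def parseb(req):
    """Return (tag, pattern) from one line of file b.

    Assumption: tab-separated columns, tag in column 1 and the read
    sequence in column 6, as in the example line from the question.
    """
    fields = req.rstrip("\n").split("\t")
    return fields[0], fields[5]


def read_fasta(handle):
    """Yield (tag, sequence) one region at a time, joining wrapped lines."""
    tag, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if tag is not None:
                yield tag, "".join(chunks)
            # Keep only the first word of the header line as the tag.
            parts = line[1:].split()
            tag = parts[0] if parts else ""
            chunks = []
        elif line and tag is not None:
            chunks.append(line)
    if tag is not None:
        yield tag, "".join(chunks)


def find_matches(handle_a, handle_b):
    """Yield (tag, pattern, position) for patterns found in their own region."""
    # Group the patterns of file b by tag (assumes they fit in memory).
    patterns = {}
    for req in handle_b:
        if req.strip():
            tag, pattern = parseb(req)
            patterns.setdefault(tag, []).append(pattern)
    # Single pass over file a; only one region is held in memory at a time.
    for tag, seq in read_fasta(handle_a):
        for pattern in patterns.get(tag, []):
            pos = seq.find(pattern)
            if pos != -1:
                yield tag, pattern, pos
```

It could be used by opening both files and looping over find_matches(filea, fileb). The tag is compared exactly here, so chr1 does not match chr10. If the patterns of file b do not fit in memory, this grouping step would not work as written and file b would have to be processed in pieces.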