Python - find a sequence in a non-tab-delimited file


Today I ran into a problem again.

I have a file that looks like this:

File A

    >chr1
    acgactgactgtcgatcgatcgatgctcgatgctcgacgatcgtgctcgatc
    >chr2
    gtgacgcacacgtgctagcgctgatcgatcgtagctcagtcag
    >chr3
    cagtcgtcgatcgtcgatcgtcg

and so on (basically a FASTA file).

In another file I have nicely tab-delimited information about reads:

File B

chr2 0 * 2s3m5i2m1d3m * cactttttgtcta nm:i:6 

Both files are huge.

I won't write out everything that needs to be done, only the part I have a problem with:

If the field chr2 in file B matches the line >chr2 in file A, search for cactttttgtcta (from file B) in the sequence of file A, but only in the sequence of the >chr2 region. The next >chr line starts a different chromosome, and I don't want to search there.

To simplify, let's say I search for the sequence cacacgtgctag in file A.

I tried using a dictionary for file A, but it's not feasible.

Any suggestions?

Something like:

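    for req in fileb:
        # parseb is yours to write: it returns (chromosome name, read sequence)
        tag, pattern = parseb(req)
        tag_matched = False
        filea = open(file_a_name)
        for line in filea:
            if line.startswith('>'):
                # header line: check whether this region belongs to the tag
                # (prefix match, so chr1 would also match chr10)
                tag_matched = line[1:].startswith(tag)
            elif tag_matched and line.find(pattern) > -1:
                do_whatever()
        filea.close()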

That should do the job, provided you can write the parseb function.
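For illustration, here is a minimal sketch of what such a parsing function might look like, together with a variant of the search that joins the sequence lines of one region before looking for the pattern. The names parse_b and pattern_in_record are hypothetical, and the sketch assumes file B is laid out as in the example line (chromosome name in column 1, read sequence in column 6) and that a single region of file A fits in memory.

```python
def parse_b(line):
    """Return (tag, pattern) from one tab-delimited line of file B.

    Assumption: column 1 is the chromosome name and column 6 is the
    read sequence, as in the example line of file B.
    """
    fields = line.rstrip("\n").split("\t")
    return fields[0], fields[5]


def pattern_in_record(fasta, tag, pattern):
    """Return True if pattern occurs in the FASTA record named tag.

    fasta is any iterable of lines (for example an open file).
    Sequence lines of the matching record are joined first, so a
    pattern wrapped across two lines is still found.
    """
    chunks = []
    in_record = False
    for line in fasta:
        line = line.strip()
        if line.startswith(">"):
            if in_record:
                break  # reached the next record, stop reading
            # compare the whole header name so chr1 does not match chr10
            header = line[1:].split()
            in_record = bool(header) and header[0] == tag
        elif in_record:
            chunks.append(line)
    return pattern in "".join(chunks)
```

Calling parse_b on each line of file B and passing the result to pattern_in_record with a freshly opened file A gives the same behavior as the loop, at the same cost of rereading file A once per read. If the reads in file B happen to be grouped by chromosome, that cost could probably be reduced by reusing the joined sequence for consecutive reads with the same tag.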

