fasta - Search and import multiple words from a txt file with Biopython


Well, I have a FASTA file with protein info in a .txt file, and I want to search for a "string" that comes after a pattern and import it / write it to a txt file. It looks like this:

>gi|1168222|sp|P46098.1| ....(text)...
>gi|74705987|sp|O95264.1| ....(text)...

and I want the accession numbers (acc), e.g. sp|**P46098**.1|, saved in a file as a column. There are different accs throughout the text; I want what comes after sp| and before the . or, if there is no ., before the next |.
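That rule (take what follows `sp|`, stopping at the first `.` or, if there is none, at the next `|`) maps onto a single regular expression. A minimal sketch in plain Python, assuming headers shaped like the examples above; `accession_from_header` is a hypothetical helper name, not a Biopython function, and the third header is made up to show the no-dot case:

```python
import re

# Capture everything after "sp|" up to (but not including) the first "." or "|".
ACC_RE = re.compile(r"sp\|([^|.]+)")

def accession_from_header(header):
    """Return the bare accession from a FASTA header, or None if there is no 'sp|' field."""
    m = ACC_RE.search(header)
    return m.group(1) if m else None

print(accession_from_header(">gi|1168222|sp|P46098.1| ....(text)..."))   # P46098
print(accession_from_header(">gi|74705987|sp|O95264.1| ....(text)..."))  # O95264
print(accession_from_header(">gi|123|sp|Q12345| made-up, no version"))   # Q12345
```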

Is there an easy way of doing this in Biopython?

Thanks

This answer uses Biopython to the extent it's possible to, and regular expressions for the rest (Biopython gives you the id, not the accession number alone):

    from Bio import SeqIO
    import re

    with open('output.txt', 'w') as outfile:  # open for writing
        for record in SeqIO.parse('input.txt', 'fasta'):  # parse FASTA
            m = re.search(r'sp\|([^|]*)\|', record.id)  # text between "sp|" and the next "|" in the id
            if m:
                outfile.write(m.group(1).split('.')[0] + '\n')  # take what's before the first dot, if any

Just a note for the uninitiated: 'w' overwrites an existing file, while 'a' appends to it instead.
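A small, self-contained illustration of the two modes; it works inside a temporary directory so nothing real is overwritten, and the file name `demo.txt` is arbitrary:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")

    with open(path, "w") as f:  # 'w' creates the file, or truncates it if it exists
        f.write("P46098\n")
    with open(path, "w") as f:  # opening with 'w' again discards the previous content
        f.write("O95264\n")
    with open(path) as f:
        after_w = f.read()

    with open(path, "a") as f:  # 'a' keeps existing content and writes at the end
        f.write("P46098\n")
    with open(path) as f:
        after_a = f.read()

print(repr(after_w))  # 'O95264\n'
print(repr(after_a))  # 'O95264\nP46098\n'
```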

Also note that using a regular expression match on the entire text (without using Biopython to parse out the FASTA ids first) would return the exact same result.
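The regex-only route can be sketched with just the standard library: read the whole file into one string and collect every match with `re.findall`. This is a minimal sketch, not the answer's own code; `extract_accessions` is a hypothetical helper, and the demo writes a throwaway copy of the question's headers (with placeholder sequence lines) into a temporary directory so no real `input.txt` is touched.

```python
import os
import re
import tempfile

# [^|.]+ stops at the first "." or "|", leaving the bare accession without its version.
ACC_RE = re.compile(r"sp\|([^|.]+)")

def extract_accessions(in_path, out_path):
    """Regex over the whole file (no FASTA parsing); writes one accession per line."""
    with open(in_path) as infile:
        accessions = ACC_RE.findall(infile.read())
    with open(out_path, "w") as outfile:  # 'w' overwrites out_path if it exists
        for acc in accessions:
            outfile.write(acc + "\n")
    return accessions

# Demo input: the question's headers; the sequence lines are placeholders.
sample = (
    ">gi|1168222|sp|P46098.1| ....(text)...\n"
    "ACDEFGHIKLMNPQRSTVWY\n"
    ">gi|74705987|sp|O95264.1| ....(text)...\n"
    "ACDEFGHIKLMNPQRSTVWY\n"
)
with tempfile.TemporaryDirectory() as tmp:
    in_path = os.path.join(tmp, "input.txt")
    out_path = os.path.join(tmp, "output.txt")
    with open(in_path, "w") as f:
        f.write(sample)
    result = extract_accessions(in_path, out_path)
    with open(out_path) as f:
        written = f.read()

print(result)  # ['P46098', 'O95264']
```

Because this scans the raw text rather than parsed ids, it agrees with the Biopython version only as long as `sp|` appears nowhere except in the header ids.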

