FASTA - Search and import multiple words from a txt file with Biopython


Well, I have a FASTA file with protein info in a .txt file, and I want to search for the "string" that comes after a pattern and write it to a txt file. It looks like this:

```
>gi|1168222|sp|P46098.1| ....(text)...
>gi|74705987|sp|O95264.1| ....(text)...
```

I want the accession numbers (acc), e.g. sp|**P46098**.1|, saved to a file as a single column. There are different accessions throughout the text, and I want what comes after `sp|` and before the `.`, or, if there is no `.`, before the next `|`.
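The rule described above (take what follows `sp|`, stopping at the first `.` or `|`) fits in a single regular expression. This is a minimal sketch, assuming the headers look like the two shown; `extract_accession` is a hypothetical helper name, and the unversioned `O95264` header is a made-up variant used only to show the "no dot" case.

```python
import re

# Capture the characters after "sp|" up to (not including) the first "." or "|".
ACC_RE = re.compile(r'sp\|([^.|]+)')

def extract_accession(header):
    """Return the accession from a FASTA header, or None if there is no sp| field."""
    m = ACC_RE.search(header)
    return m.group(1) if m else None

print(extract_accession('>gi|1168222|sp|P46098.1|'))  # version suffix is dropped
print(extract_accession('>gi|74705987|sp|O95264|'))   # hypothetical header without ".1"
```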

Is there an easy way of doing this in Biopython?

Thanks

This answer uses Biopython to the extent that it's possible, and regular expressions for the rest (Biopython gives you the ID, not the accession number alone):

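```python
from Bio import SeqIO
import re

with open('output.txt', 'w') as outfile:  # open for writing
    for record in SeqIO.parse('input.txt', 'fasta'):  # parse the FASTA file
        # record.id is the header up to the first space, e.g. "gi|1168222|sp|P46098.1|"
        m = re.search(r'sp\|([^|]+)', record.id)  # field after sp|, up to the next |
        if m:
            # keep only what's before the first dot, if there is one
            outfile.write(m.group(1).split('.')[0] + '\n')
```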

Just a note for the uninitiated: `'w'` overwrites an existing file, while `'a'` appends to it instead.

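Also note that running a regular expression over the entire text (without first using Biopython to parse out the FASTA IDs) returns the same accessions, as long as you collect every match (e.g. with `re.findall`) and `sp|` only appears in the headers.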
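A sketch of that whole-text approach follows. `re.findall` returns every match, whereas `re.search` stops at the first one. It assumes the file is small enough to read into memory and that `sp|` occurs only in header lines. `accessions_from_text` and `write_accessions` are hypothetical helper names, and `input.txt`/`output.txt` are the same placeholder file names used above.

```python
import re

# Same rule as before: after "sp|", up to the first "." or "|".
ACC_RE = re.compile(r'sp\|([^.|]+)')

def accessions_from_text(text):
    """Return every accession found anywhere in the text, in file order."""
    # With one capture group, findall returns a list of the captured strings.
    return ACC_RE.findall(text)

def write_accessions(in_path, out_path):
    """Read the whole file and write one accession per line (a single column)."""
    with open(in_path) as infile:
        accs = accessions_from_text(infile.read())
    with open(out_path, 'w') as outfile:
        for acc in accs:
            outfile.write(acc + '\n')
```

Usage: `write_accessions('input.txt', 'output.txt')`. Unlike the SeqIO version, this would also pick up any `sp|` that happens to appear inside sequence or description text, which is why the Biopython parse is the safer default.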