fasta - Search and import multiple words from txt file on Biopython -

Well, I have a FASTA file with protein information in a .txt file, and I want to search for the "string" that comes after a pattern and import it / write it to a txt file. It looks like this:

>gi|1168222|sp|p46098.1| ....(text)...
>gi|74705987|sp|o95264.1| ....(text)...

I want the accession numbers (acc), for example sp|**p46098**.1|, and I want to save them in a file as a single column. There are different accession numbers throughout the text; I want whatever comes after sp| and before the ., or, if there is no ., before the next |.

Is there an easy way of doing this in Biopython?

Thanks

This answer uses Biopython to the extent that it's possible to, and uses regular expressions for the rest (Biopython will get the ID for you, but not the accession number alone):

    from Bio import SeqIO
    import re

    with open('output.txt', 'w') as outfile:  # open for writing
        for i in SeqIO.parse('input.txt', 'fasta'):  # parse as FASTA
            m = re.search(r'sp\|(.*)\|', i.id)  # look for sp|...| in the ID
            if m:
                # take what's before the first dot, if any
                outfile.write(m.group(1).split('.')[0] + '\n')

Just a note for the uninitiated: 'w' overwrites an existing file, while 'a' appends to it instead.
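To make the difference between the two modes concrete, here is a minimal sketch. The helper names `write_lines` and `read_text` and the temporary file are made up for this illustration, and the accession strings are only placeholder values:

```python
import os
import tempfile


def write_lines(path, lines, mode):
    """Write each item of `lines` on its own line, opening `path` with `mode`."""
    with open(path, mode) as handle:
        for line in lines:
            handle.write(line + "\n")


def read_text(path):
    """Return the whole file as one string."""
    with open(path) as handle:
        return handle.read()


# Hypothetical scratch file so the demo does not touch real data.
demo_path = os.path.join(tempfile.mkdtemp(), "output.txt")

write_lines(demo_path, ["p46098"], "w")   # creates the file
write_lines(demo_path, ["o95264"], "a")   # adds a second line
after_append = read_text(demo_path)

write_lines(demo_path, ["q00000"], "w")   # discards both earlier lines
after_overwrite = read_text(demo_path)
```

If the script is meant to be rerun on the same input, 'w' is usually the safer choice, since 'a' would add the same accession numbers again on every run.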

Also note that using a regular expression to match on the entire text (without using Biopython to parse out the FASTA IDs first) should return the exact same result.
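A regex-only version could look roughly like the sketch below. It is not the code from the answer above: the pattern, the function name `extract_accessions` and the sample text (including the made-up sequence lines and the second header without a version suffix) are assumptions chosen for illustration.

```python
import re

# On each header line (starting with '>'), capture what follows 'sp|'
# up to, but not including, the next '.', '|' or whitespace.
ACC_PATTERN = re.compile(r"^>.*?sp\|([^.|\s]+)", re.MULTILINE)


def extract_accessions(fasta_text):
    """Return the accession found on every header line that contains 'sp|'."""
    return ACC_PATTERN.findall(fasta_text)


# Hypothetical sample input; the sequence lines are placeholders.
sample = (
    ">gi|1168222|sp|p46098.1| example description\n"
    "MLLWVQ\n"
    ">gi|74705987|sp|o95264| another example\n"
    "MAPLSP\n"
)

accessions = extract_accessions(sample)

# One accession per line, ready to be written to a file as a column.
column_text = "\n".join(accessions) + "\n"
```

Because the character class stops at either a dot or a pipe, the same pattern covers both the "before the ." and the "before the next |" cases without a separate split step.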
