fasta - Search and import multiple words from a txt file with Biopython
Well, I have a FASTA file with protein info, saved as .txt, and I want to search for the "string" that comes after a pattern and import it / write it to a txt file. It looks like this:
>gi|1168222|sp|p46098.1| ....(text)...
>gi|74705987|sp|o95264.1| ....(text)...
and I want the accession numbers (acc), e.g. sp|**p46098**.1|, saved to a file in a single column. There are different accs throughout the text, and I want what comes after sp| and before the . or, if it doesn't have a ., before the next |.
Is there an easy way of doing this in Biopython?
Thanks
This answer uses Biopython to the extent that it's possible to, and uses regular expressions for the rest (Biopython will get the ID for you, but not the accession number alone):

    from Bio import SeqIO
    import re

    with open('output.txt', 'w') as outfile:  # open for writing
        for i in SeqIO.parse('input.txt', 'fasta'):  # parse as FASTA
            m = re.search(r'sp\|(.*)\|', i.id)  # look for sp|.*| in the ID
            if m:
                # take what's before the first dot, if any
                outfile.write(m.group(1).split('.')[0] + '\n')

Just a note for the uninitiated: 'w' overwrites an existing file, while 'a' appends to it instead.
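To see the difference between the two modes, here is a minimal sketch. The helper names `write_lines` and `read_lines` and the temporary file path are made up for illustration; only the built-in `open()` modes come from the note above.

```python
import os
import tempfile


def write_lines(path, lines, mode):
    """Write each item of `lines` on its own line, opening `path` with `mode`."""
    with open(path, mode) as handle:
        for line in lines:
            handle.write(line + "\n")


def read_lines(path):
    """Return the lines of `path` without their trailing newlines."""
    with open(path) as handle:
        return handle.read().splitlines()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "output.txt")  # hypothetical file, deleted afterwards

    write_lines(path, ["p46098"], "w")  # creates the file
    write_lines(path, ["o95264"], "a")  # adds to the end of it
    after_append = read_lines(path)

    write_lines(path, ["o95264"], "w")  # discards the previous contents
    after_overwrite = read_lines(path)

print(after_append)
print(after_overwrite)
```

So if the script is run repeatedly and the results should accumulate, 'a' is the mode to use; for a fresh column on every run, keep 'w'.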
Also note that using a regular expression to match on the entire text (without using Biopython to parse out the FASTA IDs first) would return the exact same result.
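A rough sketch of that regex-only route, with some assumptions: the pattern `sp\|([^.|]+)` is a variation on the one above (it stops at the first `.` or `|` instead of splitting afterwards), `extract_accessions` is a made-up helper name, and the sample text, including its sequences, descriptions, and the second header without a version suffix, is invented for illustration.

```python
import re

# Match "sp|", then capture everything up to the next "." or "|".
ACCESSION = re.compile(r"sp\|([^.|]+)")


def extract_accessions(text):
    """Return every accession that follows 'sp|' in `text`, in order."""
    return ACCESSION.findall(text)


# Hypothetical input; in practice this would be the contents of the .txt file.
sample = (
    ">gi|1168222|sp|p46098.1| first record\n"
    "MLLWVQQALL\n"
    ">gi|74705987|sp|o95264| second record, no version suffix\n"
    "MLSSVMAPLW\n"
)

accessions = extract_accessions(sample)
print("\n".join(accessions))  # one accession per line, i.e. a single column
```

Because this scans the whole text rather than only the parsed IDs, it would also pick up any `sp|` that happens to appear outside a header line, so the Biopython version is the safer choice when the file contents are not well known.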