I have created a fasta file and a taxonomy file in the mothur format for the latest release of PR2 (based on gb200). Is there any way these could be posted on the mothur wiki for other people to use? How should I proceed?
I tried to use the PR2 reference database and downloaded the files for mothur from the website. However, the taxonomy file seems to have a problem: when I try to open it, it contains undecipherable characters. Could you help me, please?
A. Volant
postgraduate at Hydroscience Montpellier, France
It looks like there’s some encoding error in the mothur/QIIME taxonomy files. The gb203_pr2.tlf file looks fine though, so in the short term your best bet would be to reformat it into a mothur-compatible version.
Here’s a quick Python script that will do the trick (pass the .tlf file as the first argument and redirect the output to a new file):
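import sys
# Read the tab-delimited .tlf file named on the command line.
for line in open(sys.argv[1], 'r'):
    line = line.strip().split('\t')
    seq = line.pop(0)  # first column is the sequence ID
    # Skip the next column, then keep the text after '{' in every column after it.
    taxList = [x.split('{')[1] for x in line[1:]]
    taxList = [x.replace('}', '') for x in taxList]
    tax = ';'.join(taxList)
    # mothur taxonomy line: ID, tab, semicolon-terminated lineage.
    print('%(seq)s\t%(tax)s;' % {'seq': seq, 'tax': tax})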
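If some columns in your copy of the .tlf file have no '{' in them, the list comprehension above will stop with an IndexError. Below is a slightly more defensive sketch of the same conversion, written as a function so it can be tried on a single line first. The function names tlf_line_to_mothur and convert_file are made up for illustration, and the choice to silently skip brace-less columns is an assumption, not something the PR2 format guarantees.

```python
def tlf_line_to_mothur(line):
    """Convert one tab-separated .tlf line to 'seqID<TAB>tax1;tax2;...;'."""
    fields = line.strip().split('\t')
    seq = fields[0]
    names = []
    # Same column handling as the script above: skip the column right after
    # the ID, then take the text between '{' and '}' in each later column.
    for field in fields[2:]:
        if '{' not in field:
            continue  # assumption: a column without braces holds no taxon name
        names.append(field.split('{', 1)[1].replace('}', ''))
    return '%s\t%s;' % (seq, ';'.join(names))


def convert_file(path):
    """Yield one mothur-style taxonomy line per non-empty input line."""
    with open(path, 'r') as handle:
        for line in handle:
            if line.strip():
                yield tlf_line_to_mothur(line)
```

For a made-up input line such as "SEQ1<TAB>skip<TAB>a{Eukaryota}<TAB>b{Alveolata}", the function returns "SEQ1<TAB>Eukaryota;Alveolata;". To convert a whole file, loop over convert_file(path) and write each line to the output file.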