I’m looking at mothur’s new picrust option for make.biom (very cool), but quickly ran into an issue: it doesn’t seem to handle sequences that are unclassified at a certain taxonomic level.
[ERROR]: k__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhizobiales;unclassified;unclassified;unclassified; is not in taxonomy tree, please correct.
Short version of how I got there – full mothur pipeline, including building OTUs, then generate the shared file, classify OTUs with greengenes, and run
Of course the above taxonomy item doesn’t exist in gg_13_5_99.gg.tax; no line in that file uses the term "unclassified". And "k__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhizobiales;" is ambiguous, since it matches 4665 OTUs in the tax file.
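In case it helps anyone reproduce the count above, here is a rough Python sketch. It assumes the tax file has one entry per line in the form ID, whitespace, lineage. The function names are made up for illustration and are not part of mothur.

```python
def strip_unclassified(lineage):
    """Drop trailing 'unclassified' levels from a lineage string."""
    levels = [lvl for lvl in lineage.strip().split(";") if lvl]
    while levels and levels[-1] == "unclassified":
        levels.pop()
    return ";".join(levels) + ";" if levels else ""


def count_prefix_matches(tax_lines, prefix):
    """Count reference entries whose lineage starts with `prefix`."""
    count = 0
    for line in tax_lines:
        parts = line.strip().split(None, 1)  # ID, then lineage
        if len(parts) == 2 and parts[1].startswith(prefix):
            count += 1
    return count


# Usage (file name taken from this thread):
# with open("gg_13_5_99.gg.tax") as handle:
#     prefix = strip_unclassified(
#         "k__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;"
#         "o__Rhizobiales;unclassified;unclassified;unclassified;")
#     print(count_prefix_matches(handle, prefix))
```

A count above 1 means the truncated lineage cannot be mapped to a single reference entry.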
I’m having the exact same problem here with: Bacteria(100);"Bacteroidetes"(100);"Bacteroidia"(100);"Bacteroidales"(100);Bacteroidaceae(100);Bacteroides(100)
It doesn’t look like the taxonomy file used to create the *.cons.taxonomy file came from classifying against the greengenes reference files. The greengenes reference taxonomies all look like:
list.seqs(fasta=final.fasta) - list sequences in final fasta
get.seqs(fasta=originalFastaGivenToClassifySeqs, accnos=current) - select final sequences from original fasta file to classify
from the SOP, something like: #classify.seqs(fasta=GQY1XT001.shhh.trim.unique.good.filter.unique.precluster.pick.fasta, name=GQY1XT001.shhh.trim.unique.good.filter.unique.precluster.pick.names, group=GQY1XT001.shhh.good.pick.groups, template=trainset9_032012.pds.fasta, taxonomy=trainset9_032012.pds.tax, cutoff=80, processors=2)
classify.seqs(fasta=current, taxonomy=gg_13_5_99.gg.tax, reference=gg_13_5_99.fasta) - reclassify to greengenes
classify.otu(taxonomy=current, name=final.names, list=final.an.list) - find consensus taxonomy
make.biom(shared=final.an.shared, constaxonomy=final.an.0.03.cons.taxonomy, reftaxonomy=gg_13_5_99.gg.tax, picrust=97_otu_map.txt) - make biom file for picrust
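Before running make.biom, it may be worth checking which consensus lineages are absent from the reference taxonomy. The sketch below is only an approximation of that idea, not mothur's actual lookup. It assumes the cons.taxonomy file is tab-separated with an "OTU, Size, Taxonomy" header and that the reference file is "ID, tab, lineage". Both function names are hypothetical.

```python
import re

# Matches confidence scores such as "(100)" appended to each level.
CONFIDENCE = re.compile(r"\(\d+\)")


def tree_nodes(ref_lines):
    """Collect every lineage prefix (tree node) in the reference taxonomy."""
    nodes = set()
    for line in ref_lines:
        parts = line.strip().split("\t")
        if len(parts) < 2:
            continue
        levels = [lvl for lvl in parts[1].split(";") if lvl]
        for depth in range(1, len(levels) + 1):
            nodes.add(";".join(levels[:depth]) + ";")
    return nodes


def missing_from_reference(cons_lines, ref_lines):
    """Return (OTU label, lineage) pairs that are not nodes in the reference."""
    nodes = tree_nodes(ref_lines)
    missing = []
    for line in cons_lines:
        parts = line.rstrip("\r\n").split("\t")
        if len(parts) < 3 or parts[0] == "OTU":  # skip header and short lines
            continue
        lineage = CONFIDENCE.sub("", parts[2].strip())
        if lineage not in nodes:
            missing.append((parts[0], lineage))
    return missing
```

Lineages padded with "unclassified", and RDP-style lineages such as Bacteria;"Bacteroidetes";..., would both show up in the returned list when checked against a greengenes reference.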
[ERROR]: could not find OTUId for unknown(100);unclassified(100);unclassified(100);unclassified(100);unclassified(100);unclassified(100);unclassified(100);. Its reference sequences are unknown .
Is there still a problem with "unclassified", or did I make a mistake?
Hello,
I think I was doing it as you said, but I still have some problems. My command:
classify.seqs(fasta=total.final.fasta,template=99_otus.fasta,taxonomy=99_otu_taxonomy,name=total.final.names,group=total.final.groups,cutoff=60,processors=2)
And errors:
[ERROR]: g__; is missing the final ‘;’, ignoring.
[ERROR]: p__Firmicutes; is already in your taxonomy file, names must be unique.
and then
‘4484376’ is in your template file and is not in your taxonomy file. Please correct.
I checked and it is in my taxonomy file, but mothur doesn't seem to recognize it.
I saw that sebastianfangliu already had this problem, but as I understood it, he created his own tax and fasta files, whereas I am using the greengenes database, so I don't think there could be errors like stray whitespace.
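One guess, and it is only a guess, is that the taxonomy file is not in the layout mothur expects, for example with spaces inside the lineage or no trailing ';'. A small Python sketch to flag such lines follows. It assumes each line should be "ID, tab, lineage ending in ';'" with no spaces in the lineage. The function name is made up for illustration.

```python
def check_taxonomy_lines(tax_lines):
    """Return (line number, problem) pairs for lines that break the assumed layout."""
    problems = []
    for number, line in enumerate(tax_lines, 1):
        raw = line.rstrip("\r\n")
        if not raw.strip():
            continue  # ignore blank lines
        fields = raw.split("\t")
        if len(fields) != 2:
            problems.append((number, "expected exactly one tab"))
            continue
        seq_id, lineage = fields
        if seq_id != seq_id.strip():
            problems.append((number, "whitespace around ID"))
        if " " in lineage:
            problems.append((number, "space inside lineage"))
        if not lineage.endswith(";"):
            problems.append((number, "lineage lacks final ';'"))
    return problems


# Usage (file name taken from the command above):
# with open("99_otu_taxonomy") as handle:
#     for number, problem in check_taxonomy_lines(handle)[:20]:
#         print(number, problem)
```

If nearly every line is flagged, the file may need converting to mothur's layout, or the mothur-formatted greengenes release could be used instead.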
I have converted my file to biom format for picrust with the following command (using mothur version 1.33.3):
mothur "#make.biom(shared=Final.shhh.trim.unique.good.filter.unique.precluster.pick.pick.subsample.an.shared, label=0.03, reftaxonomy=gg_13_8_99.gg.tax, constaxonomy=Final.shhh.trim.unique.good.filter.unique.precluster.pick.pick.subsample.an.0.03.cons.taxonomy, picrust=97_otu_map.txt)"
I got two output files
Final.shhh.trim.unique.good.filter.unique.precluster.pick.pick.subsample.an.0.03.biom_shared and Final.shhh.trim.unique.good.filter.unique.precluster.pick.pick.subsample.an.0.03.biom
When I give the second file to picrust as input, I get an error, and the file contains "unclassified" entries:
{"id":"100001", "metadata":{"taxonomy":["k__Bacteria", "unclassified", "unclassified", "unclassified", "unclassified", "unclassified", "unclassified"]}}
Please help in debugging the error.
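To see how many rows are affected, something like the Python sketch below could list every row whose taxonomy contains "unclassified". It assumes the .biom file is JSON with a top-level "rows" list shaped like the entry quoted above. The function name is hypothetical.

```python
import json


def unclassified_rows(biom_text):
    """Return (row id, index of first 'unclassified' level) for affected rows."""
    table = json.loads(biom_text)
    flagged = []
    for row in table.get("rows", []):
        metadata = row.get("metadata") or {}
        taxonomy = metadata.get("taxonomy") or []
        if "unclassified" in taxonomy:
            flagged.append((row["id"], taxonomy.index("unclassified")))
    return flagged


# Usage: replace the path with your own .biom file.
# with open("my_table.biom") as handle:
#     for row_id, level in unclassified_rows(handle.read()):
#         print(row_id, "first unclassified level:", level)
```

The level index shows how deep each OTU was classified before the padding starts (index 1 means only the kingdom was assigned).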