Make.biom picrust option doesn’t handle unclassified?

I’m looking at mothur’s new picrust option for make.biom (very cool), but quickly ran into an issue: it doesn’t seem to handle sequences that are unclassified at a certain taxonomic level.

[ERROR]: k__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhizobiales;unclassified;unclassified;unclassified; is not in taxonomy tree, please correct.

Short version of how I got there: I ran the full mothur pipeline, including building OTUs, then generated the shared file, classified the OTUs with greengenes, and ran

make.biom(shared=test.shared, label=0.03, constaxonomy=test.0.03.cons.taxonomy, reftaxonomy=/data/mothur_classification_refs/gg_13_5_99.gg.tax, picrust=/data/mothur_classification_refs/97_otu_map.txt)

Of course the above taxonomy item doesn’t exist in gg_13_5_99.gg.tax, since there are no lines that use the term unclassified. And “k__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhizobiales;” is ambiguous: it matches 4665 OTUs in the tax file.
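For anyone who wants to check the ambiguity on their own reference file, here is a minimal Python sketch that counts the reference entries whose taxonomy starts with a given lineage prefix. It assumes the reference taxonomy is a two-column, tab-separated file (ID, then the semicolon-delimited taxonomy); the function name and the three example lines are made up for illustration and are not real greengenes data.

```python
import io


def count_prefix_matches(tax_lines, prefix):
    """Count reference entries whose taxonomy string starts with prefix."""
    count = 0
    for line in tax_lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        if parts[1].startswith(prefix):
            count += 1
    return count


# Made-up stand-in for a reference taxonomy file (assumption: ID<TAB>taxonomy).
example = io.StringIO(
    "1\tk__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhizobiales;f__A;g__;s__;\n"
    "2\tk__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhizobiales;f__B;g__;s__;\n"
    "3\tk__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__;g__;s__;\n"
)
prefix = "k__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhizobiales;"
print(count_prefix_matches(example, prefix))
```

To run it on a real file, pass an open file handle (for example for gg_13_5_99.gg.tax) instead of the in-memory example.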

Is this a bug or just a limitation?

I’m having the exact same problem here with: Bacteria(100);“Bacteroidetes”(100);“Bacteroidia”(100);"Bacteroidales(100);Bacteroidaceae(100);Bacteroides(100)

I’m having the same problem with my data and I don’t know how to solve it.

I cut all the unclassified names out of the *.cons.taxonomy file and the make.biom command works!
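A minimal Python sketch of that kind of workaround, in case it helps others. It assumes the *.cons.taxonomy file has a header line followed by three tab-separated columns (OTU, Size, Taxonomy) and that unclassified ranks appear as "unclassified" with or without a confidence score; the function names are hypothetical. Whether trimming ranks like this is appropriate for downstream picrust analysis is not something this sketch can answer.

```python
def strip_unclassified(taxonomy):
    """Drop 'unclassified' ranks (with or without a confidence score)."""
    ranks = [r for r in taxonomy.split(";") if r]
    kept = [r for r in ranks if r.split("(")[0] != "unclassified"]
    # If every rank was unclassified, leave the string untouched.
    return ";".join(kept) + ";" if kept else taxonomy


def clean_cons_taxonomy(lines):
    """Return cons.taxonomy lines with unclassified ranks removed."""
    out = []
    for i, line in enumerate(lines):
        fields = line.rstrip("\n").split("\t")
        if i == 0 or len(fields) < 3:
            out.append(line.rstrip("\n"))  # header or unexpected line: keep as-is
            continue
        fields[2] = strip_unclassified(fields[2])
        out.append("\t".join(fields))
    return out


# Made-up example lines (assumption: OTU<TAB>Size<TAB>Taxonomy).
example = [
    "OTU\tSize\tTaxonomy",
    "Otu001\t10\tk__Bacteria(100);p__Proteobacteria(100);unclassified(100);",
]
for cleaned in clean_cons_taxonomy(example):
    print(cleaned)
```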

Same old problem. It happens even with a full Bacteroides taxonomy (the case I currently have), so I suspect it might be a bug.

Thanks for reporting this. I will work on a fix right away. Could one of you send me your input files so I can confirm the fix? mothur.bugs@gmail.com.

Sent files.

Thanks for sending your files. Version 1.33.1 is up on the wiki for download, http://www.mothur.org/wiki/Download_mothur.

Hi Sarah,

Now I think I have a problem with your make.biom example data:

mothur > make.biom(shared=final.tx.1.subsample.1.pick.shared,label=1,reftaxonomy=gg_13_5_99.gg.tax,constaxonomy=final.tx.1.cons.taxonomy,picrust=97_otu_map.txt)

1
[ERROR]: Bacteria;“Bacteroidetes”;“Bacteroidia”;“Bacteroidales”;“Porphyromonadaceae”; is not in taxonomy tree, please correct.

I suspect this is the same error message I get with my experimental data.

Daniel

It looks like the taxonomy file used to create the *.cons.taxonomy file was not generated with the greengenes reference files. The greengenes reference taxonomies all look like:

k__Bacteria;p__Bacteroidetes;c__Bacteroidia;o__Bacteroidales;f__;g__;s__;

Could you have run classify.seqs with the silva reference files?
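A quick way to tell the two styles apart is to check for the rank prefixes. The Python sketch below is only an illustration: it assumes greengenes-style ranks always start with one of k__, p__, c__, o__, f__, g__ or s__, that confidence scores look like "(100)", and the function name is hypothetical.

```python
import re

GG_PREFIXES = ("k__", "p__", "c__", "o__", "f__", "g__", "s__")


def looks_like_greengenes(taxonomy):
    """True if every named rank carries a greengenes-style prefix."""
    ranks = [r for r in taxonomy.split(";") if r]
    # Remove trailing confidence scores such as "(100)".
    names = [re.sub(r"\(\d+\)$", "", r) for r in ranks]
    named = [n for n in names if n != "unclassified"]
    return bool(named) and all(n.startswith(GG_PREFIXES) for n in named)


print(looks_like_greengenes("k__Bacteria;p__Bacteroidetes;c__Bacteroidia;o__Bacteroidales;f__;g__;s__;"))
print(looks_like_greengenes('Bacteria;"Bacteroidetes";"Bacteroidia";"Bacteroidales";"Porphyromonadaceae";'))
```

The first call uses the greengenes-style string shown above; the second uses the lineage from the error message.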

You can run something like:

list.seqs(fasta=final.fasta) - list the sequences in the final fasta file
get.seqs(fasta=originalFastaGivenToClassifySeqs, accnos=current) - select the final sequences from the original fasta file to classify

from the SOP, something like:
#classify.seqs(fasta=GQY1XT001.shhh.trim.unique.good.filter.unique.precluster.pick.fasta, name=GQY1XT001.shhh.trim.unique.good.filter.unique.precluster.pick.names, group=GQY1XT001.shhh.good.pick.groups, template=trainset9_032012.pds.fasta, taxonomy=trainset9_032012.pds.tax, cutoff=80, processors=2)

classify.seqs(fasta=current, taxonomy=gg_13_5_99.gg.tax, reference=gg_13_5_99.fasta) - reclassify to greengenes
classify.otu(taxonomy=current, name=final.names, list=final.an.list) - find consensus taxonomy
make.biom(shared=final.an.shared, constaxonomy=final.an.0.03.cons.taxonomy, reftaxonomy=gg_13_5_99.gg.tax, picrust=97_otu_map.txt) - make biom file for picrust

Thank you Sarah, for pointing out my silly mistake :smiley:

Hi,

I was trying to format a BIOM file for picrust and got a similar problem with the new version, mothur v.1.33.3.

I ran:

classify.seqs(fasta=final.fasta, count=final.count, taxonomy=gg_13_5_99.gg.tax, reference=gg_13_5_99.fasta, processors=8)
classify.otu(taxonomy=final.gg.wang.taxonomy, count=final.count, list=final.list, label=0.03)
make.biom(shared=final.shared, label=0.03, reftaxonomy=gg_13_5_99.gg.tax, constaxonomy=final.0.03.cons.taxonomy, picrust=99_otu_map.txt)

and I get the following error:

[ERROR]: could not find OTUId for unknown(100);unclassified(100);unclassified(100);unclassified(100);unclassified(100);unclassified(100);unclassified(100);. Its reference sequences are unknown .

Is there still a problem with “unclassified” or did I make a mistake?

Thanks

Guillaume

The problem is probably the “unknown” part. After running classify.seqs you should run remove.lineage and include unknown as one of your taxa.
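To see how many sequences are affected before removing them, a small Python sketch like the one below can list the sequences whose lineage starts with unknown. It assumes the classify.seqs taxonomy output has two tab-separated columns (sequence name, then taxonomy); the function name and example lines are made up.

```python
def unknown_sequences(tax_lines):
    """Return names of sequences whose first rank is 'unknown'."""
    names = []
    for line in tax_lines:
        name, sep, taxonomy = line.rstrip("\n").partition("\t")
        if not sep:
            continue  # skip lines without a tab separator
        # First rank, with any confidence score such as "(100)" removed.
        first_rank = taxonomy.split(";")[0].split("(")[0]
        if first_rank == "unknown":
            names.append(name)
    return names


# Made-up example lines (assumption: name<TAB>taxonomy).
example = [
    "seqA\tunknown(100);unclassified(100);unclassified(100);\n",
    "seqB\tk__Bacteria(100);p__Firmicutes(95);\n",
]
print(unknown_sequences(example))
```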

pat

Thank you, Patrick, for your answer. This solved the problem.

Guillaume

Hello,
I think I was doing it as you said, but I still have some problems. My command:
classify.seqs(fasta=total.final.fasta,template=99_otus.fasta,taxonomy=99_otu_taxonomy,name=total.final.names,group=total.final.groups,cutoff=60,processors=2)
And errors:
[ERROR]: g__; is missing the final ‘;’, ignoring.
[ERROR]: p__Firmicutes; is already in your taxonomy file, names must be unique.
and then
‘4484376’ is in your template file and is not in your taxonomy file. Please correct.
I checked and it is in my taxonomy file, but it seems like it doesn’t recognize it.
I saw that sebastianfangliu already had this problem, but as I understood it, he created his own tax and fasta files, whereas I am using the greengenes database, so I don’t think there could be errors like whitespace.

What am I doing wrong?
Thank you,
Ana

I suspect you have spaces in your taxonomy file that are throwing things off. Have you tried using our greengenes reference taxonomy?

http://www.mothur.org/wiki/Greengenes-formatted_databases
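If you do want to keep your own file, here is a minimal Python sketch of the kind of cleanup that removes stray spaces. It assumes each line is an ID, a tab, then a semicolon-delimited taxonomy, and that the fix is to strip whitespace around each rank and end the string with ';'. The function name and example line are made up, and this is only an illustration, not a replacement for the mothur-formatted reference files.

```python
def normalize_tax_line(line):
    """Strip spaces around ranks and make sure the taxonomy ends with ';'."""
    name, sep, taxonomy = line.rstrip("\n").partition("\t")
    if not sep:
        return line.rstrip("\n")  # no tab found: leave the line unchanged
    ranks = [r.strip() for r in taxonomy.split(";")]
    ranks = [r for r in ranks if r]  # drop empty pieces left by the split
    return name.strip() + "\t" + ";".join(ranks) + ";"


# Made-up example line with spaces after the semicolons and no final ';'.
example = "seq1\tk__Bacteria; p__Firmicutes; c__Bacilli; o__; f__; g__; s__\n"
print(normalize_tax_line(example))
```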

Hello,

There were spaces. That surprises me, since I downloaded the database from the mothur website. The important thing is that it works now!

Thank you very much!
Ana

Ana,

pretty sure you didn’t download those from the mothur website as those file names are a bit different from ours. glad it’s working now!

Pat

Hello Mothur Group,

I have converted my file to biom format for picrust with the following command (using mothur version 1.33.3):
mothur "#make.biom(shared=Final.shhh.trim.unique.good.filter.unique.precluster.pick.pick.subsample.an.shared, label=0.03, reftaxonomy=gg_13_8_99.gg.tax, constaxonomy=Final.shhh.trim.unique.good.filter.unique.precluster.pick.pick.subsample.an.0.03.cons.taxonomy, picrust=97_otu_map.txt)"

I got two output files
Final.shhh.trim.unique.good.filter.unique.precluster.pick.pick.subsample.an.0.03.biom_shared and Final.shhh.trim.unique.good.filter.unique.precluster.pick.pick.subsample.an.0.03.biom

When I give the second file to picrust as input I get an error, and there are “unclassified” entries in this file:
{"id":"100001", "metadata":{"taxonomy":["k__Bacteria", "unclassified", "unclassified", "unclassified", "unclassified", "unclassified", "unclassified"]}}
Please help me debug the error.
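To find every affected row, a short Python sketch along these lines could be used. It assumes the .biom file is JSON with a top-level "rows" list whose entries look like the one quoted above; the function name and the tiny example table are made up.

```python
import json


def rows_with_unclassified(biom_text):
    """Return the ids of rows whose taxonomy list contains 'unclassified'."""
    table = json.loads(biom_text)
    hits = []
    for row in table.get("rows", []):
        metadata = row.get("metadata") or {}
        taxonomy = metadata.get("taxonomy") or []
        if "unclassified" in taxonomy:
            hits.append(row["id"])
    return hits


# Made-up, heavily trimmed stand-in for a biom file (only the "rows" part).
example = json.dumps({
    "rows": [
        {"id": "100001", "metadata": {"taxonomy": ["k__Bacteria", "unclassified"]}},
        {"id": "100002", "metadata": {"taxonomy": ["k__Bacteria", "p__Firmicutes"]}},
    ]
})
print(rows_with_unclassified(example))
```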

Best,
Rich