Make.biom picrust option doesn’t handle unclassified?

adamc83 · February 19, 2014, 9:41pm

I’m looking at mothur’s new picrust option to make.biom (very cool), but quickly ran into an issue – it doesn’t seem to handle sequences that are unclassified at a certain taxonomic level:

[ERROR]: k__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhizobiales;unclassified;unclassified;unclassified; is not in taxonomy tree, please correct.

Short version of how I got there – full mothur pipeline, including building OTUs, then generate the shared file, classify OTUs with greengenes, and run

make.biom(shared=test.shared, label=0.03, constaxonomy=test.0.03.cons.taxonomy, reftaxonomy=/data/mothur_classification_refs/gg_13_5_99.gg.tax, picrust=/data/mothur_classification_refs/97_otu_map.txt)

Of course the above taxonomy item doesn’t exist in gg_13_5_99.gg.tax – there are no lines that use the term unclassified. And “k__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhizobiales;” is ambiguous – it matches 4665 OTUs in the tax file.

Is this a bug or just a limitation?
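
The ambiguity described above can be checked with a small script. This is only a sketch: it assumes the reference taxonomy file has one entry per line, with the sequence ID and the taxonomy string separated by whitespace, and the function name `count_prefix_matches` is made up for illustration.

```python
def count_prefix_matches(tax_lines, prefix):
    """Count reference entries whose taxonomy string starts with `prefix`.

    Assumes each line looks like "<id><whitespace><taxonomy>".
    """
    count = 0
    for line in tax_lines:
        parts = line.strip().split(None, 1)
        # Skip blank or malformed lines that have no taxonomy column.
        if len(parts) == 2 and parts[1].startswith(prefix):
            count += 1
    return count
```

Passing an open handle on gg_13_5_99.gg.tax together with the prefix “k__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhizobiales;” should report how many reference entries share that prefix, while a prefix that includes “unclassified;” would be expected to match none.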

danieln · February 20, 2014, 10:03am

I’m having the exact same problem here with: Bacteria(100);“Bacteroidetes”(100);“Bacteroidia”(100);“Bacteroidales”(100);Bacteroidaceae(100);Bacteroides(100)

Manu · February 20, 2014, 12:27pm

I’m having the same problem with my data and I don’t know how to solve it.

Manu · February 21, 2014, 4:05pm

I cut all the unclassified names out of the cons.taxonomy file and the make.biom command works!
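
For anyone wanting to try the same workaround, here is a rough sketch of trimming trailing “unclassified” ranks from a single taxonomy string. It is a guess at what was done by hand, not mothur’s own behaviour; the helper name `strip_unclassified` is hypothetical, and it assumes ranks are separated by “;” and may carry a confidence score such as “(100)”.

```python
def strip_unclassified(taxonomy):
    """Drop trailing 'unclassified' ranks from a ';'-separated taxonomy string."""
    ranks = [rank for rank in taxonomy.strip().split(";") if rank]
    # Remove ranks from the end while they are 'unclassified' or 'unclassified(NN)'.
    while ranks and ranks[-1].startswith("unclassified"):
        ranks.pop()
    # Keep mothur's trailing ';' convention when anything is left.
    return ";".join(ranks) + ";" if ranks else ""
```

It would be applied to the taxonomy column of each row in the cons.taxonomy file, leaving the other columns untouched. Keep a copy of the original file, since the trimmed strings no longer have one entry per rank.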

danieln · February 22, 2014, 9:22am

Same old problem. Even with a full Bacteroides taxonomy (the case I currently have), I suspect this might be a bug.

westcott · February 24, 2014, 7:28pm

Thanks for reporting this. I will work on a fix right away. Could one of you send me your input files so I can confirm the fix? mothur.bugs@gmail.com.

adamc83 · February 24, 2014, 9:22pm

Sent files.

westcott · February 25, 2014, 5:58pm

Thanks for sending your files. Version 1.33.1 is up on the wiki for download, http://www.mothur.org/wiki/Download_mothur.

danieln · February 26, 2014, 6:16am

Hi Sarah,

Now I think I have a problem with your make.biom example data:

mothur > make.biom(shared=final.tx.1.subsample.1.pick.shared,label=1,reftaxonomy=gg_13_5_99.gg.tax,constaxonomy=final.tx.1.cons.taxonomy,picrust=97_otu_map.txt)

1
[ERROR]: Bacteria;“Bacteroidetes”;“Bacteroidia”;“Bacteroidales”;“Porphyromonadaceae”; is not in taxonomy tree, please correct.

I suspect this is the same error message I get with my experimental data.

Daniel

westcott · February 26, 2014, 4:30pm

It doesn’t look like the taxonomy file that was used to create the *.cons.taxonomy file was classified using the green genes reference files. The green genes reference taxonomies all look like:

k__Bacteria;p__Bacteroidetes;c__Bacteroidia;o__Bacteroidales;f__;g__;s__;

Could you have run classify.seqs with the silva reference files?
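
A quick way to tell the two styles apart is to look for the rank prefixes (k__, p__, c__, and so on) shown in the greengenes example above. The sketch below assumes that convention and ignores “unclassified” and “unknown” ranks; the function name `looks_like_greengenes` is made up for illustration.

```python
import re

# Rank prefixes used in the greengenes-style strings shown above.
GG_PREFIX = re.compile(r"^[kpcofgs]__")

def looks_like_greengenes(taxonomy):
    """Return True if every named rank carries a greengenes-style prefix."""
    ranks = [rank for rank in taxonomy.strip().split(";") if rank]
    named = [r for r in ranks if not r.startswith(("unclassified", "unknown"))]
    return bool(named) and all(GG_PREFIX.match(r) for r in named)
```

Running it over the taxonomy column of a *.cons.taxonomy file should return False for strings such as Bacteria;“Bacteroidetes”;“Bacteroidia”;, which suggests the sequences were classified against a different reference.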

westcott · February 26, 2014, 4:38pm

You can run something like:

list.seqs(fasta=final.fasta) - list the sequences in the final fasta
get.seqs(fasta=originalFastaGivenToClassifySeqs, accnos=current) - select the final sequences from the original fasta file to classify

from the SOP, something like:
#classify.seqs(fasta=GQY1XT001.shhh.trim.unique.good.filter.unique.precluster.pick.fasta, name=GQY1XT001.shhh.trim.unique.good.filter.unique.precluster.pick.names, group=GQY1XT001.shhh.good.pick.groups, template=trainset9_032012.pds.fasta, taxonomy=trainset9_032012.pds.tax, cutoff=80, processors=2)

classify.seqs(fasta=current, taxonomy=gg_13_5_99.gg.tax, reference=gg_13_5_99.fasta) - reclassify to greengenes
classify.otu(taxonomy=current, name=final.names, list=final.an.list) - find consensus taxonomy
make.biom(shared=final.an.shared, constaxonomy=final.an.0.03.cons.taxonomy, reftaxonomy=gg_13_5_99.gg.tax, picrust=97_otu_map.txt) - make biom file for picrust

danieln · February 27, 2014, 6:51am

Thank you, Sarah, for pointing out my silly mistake.

g.minard · July 1, 2014, 1:04pm

Hi,

I was trying to format a BIOM file for picrust and got a similar problem with the new version, mothur v.1.33.3.

I ran:

classify.seqs(fasta=final.fasta, count=final.count, taxonomy=gg_13_5_99.gg.tax, reference=gg_13_5_99.fasta, processors=8)
classify.otu(taxonomy=final.gg.wang.taxonomy, count=final.count, list=final.list, label=0.03)
make.biom(shared=final.shared, label=0.03, reftaxonomy=gg_13_5_99.gg.tax, constaxonomy=final.0.03.cons.taxonomy, picrust=99_otu_map.txt)

and I get the following error:

[ERROR]: could not find OTUId for unknown(100);unclassified(100);unclassified(100);unclassified(100);unclassified(100);unclassified(100);unclassified(100);. Its reference sequences are unknown .

Is there still a problem with “unclassified” or did I make a mistake?

Thanks

Guillaume

pschloss · July 1, 2014, 6:04pm

The problem is probably the “unknown” part. After running classify.seqs you should run remove.lineage and include unknown as one of your taxa.

pat
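
To see how many sequences are affected before removing them, something like the following sketch could list the IDs whose first taxon is “unknown”. It assumes the classify.seqs taxonomy output has the sequence ID and the taxonomy string separated by whitespace, with optional confidence scores in parentheses; `ids_with_first_taxon` is a hypothetical helper, not part of mothur.

```python
def ids_with_first_taxon(tax_lines, taxon="unknown"):
    """Return sequence IDs whose top-level taxon equals `taxon`."""
    ids = []
    for line in tax_lines:
        parts = line.strip().split(None, 1)
        if len(parts) != 2:
            continue  # blank or malformed line
        first_rank = parts[1].split(";", 1)[0]
        # Drop a confidence score such as "(100)" and any surrounding quotes.
        name = first_rank.split("(", 1)[0].strip('"')
        if name == taxon:
            ids.append(parts[0])
    return ids
```

If the returned list is not empty, those are the sequences the remove.lineage step suggested above would be expected to drop.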

g.minard · August 4, 2014, 2:57pm

Thank you, Patrick, for your answer. This solved the problem.

Guillaume

djukovic_ana · September 2, 2014, 3:43pm

Hello,
I think I was doing it as you said, but I still have some problems. My command:
classify.seqs(fasta=total.final.fasta,template=99_otus.fasta,taxonomy=99_otu_taxonomy,name=total.final.names,group=total.final.groups,cutoff=60,processors=2)
And errors:
[ERROR]: g__; is missing the final ‘;’, ignoring.
[ERROR]: p__Firmicutes; is already in your taxonomy file, names must be unique.
and then
‘4484376’ is in your template file and is not in your taxonomy file. Please correct.
I checked and it is in my taxonomy file, but mothur doesn’t seem to recognize it.
I saw that sebastianfangliu already had this problem, but as I understood it, he created his own tax and fasta files, whereas I am using the greengenes database, so I don’t think there could be errors like whitespace, etc.

What am I doing wrong?
Thank you,
Ana

pschloss · September 10, 2014, 5:48pm

I suspect you have spaces in your taxonomy file that are throwing things off. Have you tried using our greengenes reference taxonomy?

http://www.mothur.org/wiki/Greengenes-formatted_databases
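
Stray spaces are hard to spot by eye, so a small check can help. This sketch assumes a tab-separated taxonomy file (sequence ID, tab, taxonomy string ending in “;”); the function name and the problem descriptions are illustrative only.

```python
def find_suspect_lines(tax_lines):
    """Return (line_number, problem) pairs for lines that look malformed."""
    problems = []
    for number, line in enumerate(tax_lines, start=1):
        text = line.rstrip("\r\n")
        if not text:
            continue  # ignore blank lines
        seq_id, sep, taxonomy = text.partition("\t")
        if not sep:
            problems.append((number, "no tab between ID and taxonomy"))
        elif any(ch.isspace() for ch in taxonomy):
            problems.append((number, "whitespace inside taxonomy"))
        elif not taxonomy.endswith(";"):
            problems.append((number, "taxonomy does not end with ';'"))
    return problems
```

An empty result does not prove the file is fine, but any reported line is worth inspecting before rerunning classify.seqs.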

djukovic_ana · September 11, 2014, 8:18am

Hello,

There were spaces. That surprises me, since I downloaded the database from the mothur website. The important thing is that it works now!

Thank you very much!
Ana

pschloss · September 11, 2014, 12:19pm

Ana,

pretty sure you didn’t download those from the mothur website as those file names are a bit different from ours. glad it’s working now!

Pat

Rich · September 11, 2014, 2:59pm

Hello Mothur Group,

I have converted my file to biom format for picrust with the following command (using mothur v.1.33.3):
mothur "#make.biom(shared=Final.shhh.trim.unique.good.filter.unique.precluster.pick.pick.subsample.an.shared, label=0.03, reftaxonomy=gg_13_8_99.gg.tax, constaxonomy=Final.shhh.trim.unique.good.filter.unique.precluster.pick.pick.subsample.an.0.03.cons.taxonomy, picrust=97_otu_map.txt)"

I got two output files
Final.shhh.trim.unique.good.filter.unique.precluster.pick.pick.subsample.an.0.03.biom_shared and Final.shhh.trim.unique.good.filter.unique.precluster.pick.pick.subsample.an.0.03.biom

When I give the second file as input to picrust I get an error, and there are “unclassified” entries in this file:
{"id":"100001", "metadata":{"taxonomy":["k__Bacteria", "unclassified", "unclassified", "unclassified", "unclassified", "unclassified", "unclassified"]}}
Please help in debugging the error.

Best,
Rich
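
To find which rows carry “unclassified” ranks, the .biom file can be scanned directly. The sketch below assumes the file is JSON with a top-level "rows" list whose entries look like the row quoted above; it only lists the affected row IDs and does not establish whether they are what picrust is objecting to. The function name is hypothetical.

```python
import json

def rows_with_unclassified(biom_text):
    """Return IDs of rows whose taxonomy metadata contains 'unclassified'."""
    table = json.loads(biom_text)
    flagged = []
    for row in table.get("rows", []):
        metadata = row.get("metadata") or {}
        taxonomy = metadata.get("taxonomy") or []
        if any(rank.startswith("unclassified") for rank in taxonomy):
            flagged.append(row["id"])
    return flagged
```

Reading the .biom file into a string and passing it to this function gives the list of row IDs to inspect.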
