Final taxonomy file: many OTUs with zero representation in samples?

I’ve finished analyzing a data set following the MiSeq protocol. After classifying OTUs, my resulting taxonomy file lists many OTUs as unclassified, each with a value of zero in the Size column. Is this normal?

Here’s a sample:

OTU Size Taxonomy
Otu000001 85288 Bacteria(100);Firmicutes(100);Bacilli(100);Lactobacillales(100);Streptococcaceae(100);Streptococcus(100);
Otu000002 24659 Bacteria(100);Fusobacteria(100);Fusobacteriia(100);Fusobacteriales(100);Leptotrichiaceae(100);Sneathia(100);
Otu000003 17555 Bacteria(100);Actinobacteria(100);Actinobacteria(100);Micrococcales(100);Micrococcaceae(100);Rothia(100);
Otu000004 17096 Bacteria(100);Firmicutes(100);Bacilli(100);Lactobacillales(100);Streptococcaceae(100);Streptococcus(100);
Otu000005 13940 Bacteria(100);Firmicutes(100);Bacilli(100);Lactobacillales(100);Streptococcaceae(100);Streptococcus(100);
Otu000006 0 unclassified(100);
Otu000007 0 unclassified(100);
Otu000008 9423 Bacteria(100);Tenericutes(100);Mollicutes(100);Mycoplasmatales(100);Mycoplasmataceae(100);Ureaplasma(100);
Otu000009 0 unclassified(100);
Otu000010 0 unclassified(100);
Otu000011 0 unclassified(100);
Otu000012 8255 Bacteria(100);Firmicutes(100);Negativicutes(100);Selenomonadales(100);Veillonellaceae(100);Veillonella(100);
Otu000013 7820 Bacteria(100);Fusobacteria(100);Fusobacteriia(100);Fusobacteriales(100);Fusobacteriaceae(100);Fusobacterium(100);
Otu000014 0 unclassified(100);
Otu000015 0 unclassified(100);
Otu000016 0 unclassified(100);
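To see how widespread the problem is, the empty OTUs can be counted directly from the taxonomy file. The following is only a minimal sketch: it assumes the file is whitespace-delimited with the three columns shown above (OTU, Size, Taxonomy), and the function name `summarize_otu_sizes` is made up for illustration, not part of mothur.

```python
def summarize_otu_sizes(lines):
    """Count OTUs with and without reads in a taxonomy table like the one above."""
    empty, populated, total_reads = [], 0, 0
    for line in lines:
        fields = line.split()
        # Skip blank lines and the "OTU Size Taxonomy" header row.
        if len(fields) < 3 or not fields[1].isdigit():
            continue
        size = int(fields[1])
        if size == 0:
            empty.append(fields[0])  # OTU listed but holding no reads
        else:
            populated += 1
            total_reads += size
    return {"empty": empty, "populated": populated, "total_reads": total_reads}


# Small illustrative input in the same layout as the sample above.
sample = """OTU Size Taxonomy
Otu000005 13940 Bacteria(100);Firmicutes(100);
Otu000006 0 unclassified(100);
Otu000008 9423 Bacteria(100);Tenericutes(100);
"""
summary = summarize_otu_sizes(sample.splitlines())
print(summary)
```

For a real file, pass `open(path)` instead of the sample lines. If `total_reads` matches the number of sequences you expect, the zero-size OTUs are extra entries rather than missing data.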

Hmmm… that seems odd. Could you send your files to mothur.bugs@gmail.com?

Can do - which files in particular, or just all of them?

The log file with the commands you ran, and the inputs to the classify.otu command would be great. Thanks!

I have the same issue. All the OTUs that are unclassified at the first level have zero representation. I was also expecting far fewer OTUs, given that there are ~68,000 unique seqs but >500,000 OTUs. Of the listed OTUs, only 10621 have any representation, and their sizes sum to my total number of seqs. So basically all the unclassified OTUs listed are not real, and I’m not sure why they were listed in the first place.

Any ideas?

Thanks

This is likely a file mismatch problem. If you send your input files to mothur.bugs@gmail.com, I can take a look for you.

I’m not sure if you all worked out what your problem was. I had the same issue of lots of zero OTUs in the output of my classify.otu command and found out that it was because chimeras were still in my count file…

If you don’t use reference=self (i.e. using the count_table in the chimera.uchime step) and instead use reference=silva.gold or some other reference file, then you need to update your count table with remove.seqs after that step.

Doing that fixed the downstream problem for me…
Hope that helps

This is a file mismatch error…

When using a reference with chimera.uchime, you need to remove the chimeras from the count table or names file using remove.seqs. Failure to do so creates a file mismatch. Summary.seqs and a few other commands, including classify.otu, are intentionally “forgiving” of such mismatches. The classify.otu issue stems from the list file. Not removing the chimeras from the count table or names file allows them to be added to the list file, because mothur creates the “unique” list from the names in the count table or names file.
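Until mothur reports this itself, the mismatch can be checked by hand by comparing the names in the count table against the names in the chimera-filtered fasta file. This is a rough sketch rather than anything mothur provides: it assumes a tab- or space-delimited count table whose first column is the sequence name and whose header row starts with `Representative_Sequence`, and the function names and sequence names are hypothetical.

```python
def fasta_names(lines):
    """Collect sequence names from FASTA header lines (text after '>')."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            header = line[1:].split()
            if header:
                names.add(header[0])
    return names


def count_table_names(lines):
    """Collect sequence names from the first column of a count table."""
    names = set()
    for line in lines:
        fields = line.split()
        # Skip blank lines, comment lines, and the assumed header row.
        if not fields or fields[0].startswith("#") or fields[0] == "Representative_Sequence":
            continue
        names.add(fields[0])
    return names


def names_missing_from_fasta(fasta_lines, count_lines):
    """Names still in the count table but absent from the fasta file."""
    return sorted(count_table_names(count_lines) - fasta_names(fasta_lines))


# Hypothetical data: seq3 was removed from the fasta as a chimera,
# but the count table was never updated.
fasta = [">seq1", "ACGT", ">seq2", "ACGA"]
counts = [
    "Representative_Sequence\ttotal\tsampleA",
    "seq1\t10\t10",
    "seq2\t4\t4",
    "seq3\t1\t1",
]
leftover = names_missing_from_fasta(fasta, counts)
print(leftover)
```

Any names reported here are candidates for the zero-size OTUs described above. Running remove.seqs on the count table with the chimera accnos file should bring the two files back in sync.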

We will add an error message / warning in an upcoming release.