OTUs: *_unclassified etc.


  1. After following all the steps in the SOP and looking at my output, there are several OTUs labelled as ‘unclassified’ at the genus level (e.g. ‘Clostridiales_unclassified’). There is more than one OTU labelled ‘Clostridiales_unclassified’. How should these be dealt with?

  2. I also get OTUs like ‘Ruminococcaceae(98)’. What does the 98 mean, especially as there is another category, ‘Ruminococcaceae’, also present?

Is there any document/FAQ that I could read that would help me?

  1. I don’t think you really need to ‘deal’ with this as a problem. The fact that they’re separate OTUs means that, under your definition of an OTU, they’re two separate lineages. The fact that they can’t be accurately classified below the order level simply reflects that those lineages haven’t been observed before (or, if they have, they haven’t been incorporated into SILVA).

  2. The 98% is the bootstrap support for this assignment. Classify.seqs classifies each sequence and uses bootstrapping to assess how robust the assignment is at each taxonomic level.
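As an illustration of what the numbers in parentheses carry, here is a minimal Python sketch that splits a semicolon-delimited taxonomy string into (taxon, confidence) pairs and finds the deepest level that meets a confidence cutoff. It assumes the `Taxon(confidence);` format shown in the question; the function names, the example string and the cutoff of 80 are hypothetical, chosen for illustration only, and this is not mothur’s own code.

```python
import re

# One "Taxon(confidence)" entry, e.g. "Ruminococcaceae(98)".
ENTRY = re.compile(r"^(?P<name>[^()]+)\((?P<conf>\d+(?:\.\d+)?)\)$")


def parse_taxonomy(tax_string):
    """Split a semicolon-delimited taxonomy string into (taxon, confidence)
    pairs. Entries without a number in parentheses get confidence None."""
    pairs = []
    for entry in tax_string.strip().rstrip(";").split(";"):
        match = ENTRY.match(entry)
        if match:
            pairs.append((match.group("name"), float(match.group("conf"))))
        else:
            pairs.append((entry, None))
    return pairs


def deepest_confident(pairs, cutoff=80):
    """Return the deepest taxon whose confidence is >= cutoff, walking from
    domain downwards and stopping at the first level that falls short."""
    deepest = None
    for name, conf in pairs:
        if conf is None or conf < cutoff:
            break
        deepest = name
    return deepest


# Hypothetical example string (not from the poster's data).
tax = ("Bacteria(100);Firmicutes(100);Clostridia(100);"
       "Clostridiales(100);Ruminococcaceae(98);Ruminococcus(62);")
pairs = parse_taxonomy(tax)
print(pairs[-2])                     # ('Ruminococcaceae', 98.0)
print(deepest_confident(pairs, 80))  # Ruminococcaceae
```

With a hypothetical cutoff of 80, the 62% genus call is not trusted, so the assignment stops at the family level.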

After rarefaction, I keep OTUs above a certain number of reads that are also present in more than 5% of samples. I seem to be getting a lot of ‘unclassifieds’, and many OTUs seem to be the same. Does this look normal? I have put my OTU taxonomy table at:
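The filtering step described in the question could be sketched roughly as follows. This is a hypothetical illustration, not a mothur command: it assumes “above a certain number of reads” means total reads summed across all samples, and the function name, the `min_reads` value and the toy counts are made up.

```python
def filter_otus(counts, min_reads, min_prevalence=0.05):
    """Keep OTUs whose total read count exceeds `min_reads` and that are
    present (count > 0) in more than `min_prevalence` of samples.

    counts: dict mapping OTU id -> list of per-sample read counts,
            one entry per sample, all lists the same length.
    """
    kept = {}
    for otu, per_sample in counts.items():
        total = sum(per_sample)
        prevalence = sum(1 for c in per_sample if c > 0) / len(per_sample)
        if total > min_reads and prevalence > min_prevalence:
            kept[otu] = per_sample
    return kept


# Hypothetical rarefied counts across 20 samples.
counts = {
    "Otu001": [10] * 20,          # common and abundant: kept
    "Otu002": [50] + [0] * 19,    # in 1/20 = 5% of samples, not more: dropped
    "Otu003": [1, 1] + [0] * 18,  # in 10% of samples but only 2 reads: dropped
}
print(sorted(filter_otus(counts, min_reads=10)))  # ['Otu001']
```

Note that with 20 samples, “more than 5%” means an OTU must appear in at least two samples.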


Both are to be expected. First, the unclassified OTUs are not a big deal: we know there’s a large amount of microbial diversity that isn’t captured in sequence databases, so it’s impossible for a database to accurately classify every sequence in a sample.

It’s also not surprising that you have multiple OTUs in the same genus. If genetically distinct sequences (separate OTUs) fall into the same genus, they most likely represent different species within that genus, or they are different enough from the database sequences that they can’t be classified any further.
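To see how many distinct OTUs share each genus-level label, and what fraction are ‘_unclassified’, a short Python sketch like the one below groups OTU IDs by label while keeping every OTU separate. The OTU IDs and labels are hypothetical, and it assumes you have already extracted one genus-level label per OTU from your taxonomy table.

```python
from collections import defaultdict


def otus_per_label(otu_labels):
    """Group OTU ids by genus-level label.

    otu_labels: dict mapping OTU id -> label, e.g. "Clostridiales_unclassified".
    Returns a dict mapping each label to the sorted list of OTU ids carrying it.
    """
    groups = defaultdict(list)
    for otu, label in otu_labels.items():
        groups[label].append(otu)
    return {label: sorted(ids) for label, ids in groups.items()}


def unclassified_fraction(otu_labels):
    """Fraction of OTUs whose label ends in '_unclassified'."""
    if not otu_labels:
        return 0.0
    n = sum(1 for label in otu_labels.values() if label.endswith("_unclassified"))
    return n / len(otu_labels)


# Hypothetical labels for five OTUs.
labels = {
    "Otu001": "Bacteroides",
    "Otu002": "Bacteroides",
    "Otu003": "Clostridiales_unclassified",
    "Otu004": "Clostridiales_unclassified",
    "Otu005": "Ruminococcus",
}
print(otus_per_label(labels)["Clostridiales_unclassified"])  # ['Otu003', 'Otu004']
print(unclassified_fraction(labels))                         # 0.4
```

Keeping the OTU IDs attached means two ‘Clostridiales_unclassified’ OTUs stay distinct lineages rather than being merged under one name.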

