Bacteria_unclassified

kimitas · February 10, 2017, 8:01pm

Hi,
I have a very high abundance of ‘Bacteria_unclassified’ in my taxonomy file: Bacteria(100);Bacteria_unclassified(100);Bacteria_unclassified(100);Bacteria_unclassified(100);Bacteria_unclassified(100);Bacteria_unclassified(100);Bacteria_unclassified_unclassified(100);

I classified with the silva.nr_123 database.

I was wondering whether this is normal, whether anyone has come across this before, or whether it is my data that is crap…?
I guess we can remove them with remove.lineage(fasta=X, count=X, taxonomy=X, taxon=Bacteria_unclassified)?

Thanks!

pschloss · February 13, 2017, 1:02pm

What percentage? How long are your sequences? What environment?

McCoyk · February 13, 2017, 6:50pm

I noticed this too. I didn’t get a lot per se, but when I made a heatmap with my OTU data and limited the output to the 75 most abundant OTUs, Bacteria_unclassified was listed several times, which got my attention. In my stabilityps1.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.0.03.cons.taxonomy file, there are 103864 total entries. Of these, 553 have the lineage Bacteria(100);Bacteria_unclassified(100);Bacteria_unclassified(100);Bacteria_unclassified(100);Bacteria_unclassified(100);. I found which of these OTUs had the most sequences in them, grabbed the representative sequence that represented the most, and BLASTed it. The two top hits were uncultured, unidentified environmental samples in both cases. The results made sense based on my samples.
My sequences cover the V3 and V4 regions. Unaligned they are 464 nt; after alignment to the Silva_v128 database (which I also used for classification) and subsequent filtering, they are 846 nt. They were run on a MiSeq using the 16S amplicon protocol.
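For scale, 553 of 103864 entries is roughly 0.5% of OTUs, but the OTU count alone does not say how many sequences those OTUs hold. The sketch below is one way to tally both. It assumes a tab-separated cons.taxonomy file with the header OTU, Size, Taxonomy; the function names and the example rows are made up for illustration and are not real data.

```python
import csv
import io
import re


def strip_conf(taxon):
    """Drop a trailing confidence score such as "(100)" from a taxon name."""
    return re.sub(r"\(\d+\)$", "", taxon)


def summarize_unclassified(handle, label="Bacteria_unclassified"):
    """Count OTUs and sequences whose second rank equals `label`.

    Assumes a tab-separated layout with the columns OTU, Size, Taxonomy.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    otus = seqs = hit_otus = hit_seqs = 0
    for row in reader:
        ranks = [strip_conf(t) for t in row["Taxonomy"].split(";") if t]
        size = int(row["Size"])
        otus += 1
        seqs += size
        if len(ranks) > 1 and ranks[1] == label:
            hit_otus += 1
            hit_seqs += size
    return {"otus": otus, "seqs": seqs, "hit_otus": hit_otus, "hit_seqs": hit_seqs}


# Made-up rows standing in for a real cons.taxonomy file.
example = io.StringIO(
    "OTU\tSize\tTaxonomy\n"
    "Otu001\t500\tBacteria(100);Proteobacteria(100);Gammaproteobacteria(98);\n"
    "Otu002\t300\tBacteria(100);Bacteria_unclassified(100);Bacteria_unclassified(100);\n"
    "Otu003\t200\tBacteria(100);Firmicutes(95);Clostridia(90);\n"
)
summary = summarize_unclassified(example)
print(summary)
print(f"{100 * summary['hit_seqs'] / summary['seqs']:.1f}% of sequences")
```

For a real file, replace the `io.StringIO` object with `open(path, newline="")`. Reporting the sequence percentage alongside the OTU percentage answers the "What percentage?" question more directly than the OTU count does.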

campenr · February 13, 2017, 7:31pm

I see this quite a bit in my samples, which are mostly marine sediments from locations that have only recently been sampled for the first time.

As the classifications are only as good as the classification database, it’s entirely possible that these Bacteria_unclassified OTUs are just that: bacteria for which there is no closely related classified hit.

The important question is where are your samples from, and is the presence of these unclassified Bacteria reasonable? If it were from the human gut I would think not, but if it’s from a high diversity environmental sample, then perhaps.

Cheers
Richard

kimitas · February 14, 2017, 2:08am

Hi, thanks!
Yes, they represent about 2 to 30% of my samples, which are marine biofilms from different environments (16S V3-V4, MiSeq 2x250).
I guess it could be, then, that they are simply not classified yet!
Cheers

campenr · February 14, 2017, 2:45pm

One more thing is that classifying the OTUs in mothur is not the end of the process. I would BLAST the representative sequences from these Bacteria_unclassified OTUs. You may well find you get good hits to uncultured clones from environments similar to the one you sampled, which would support the idea that they are valid community members, just not well classified.

Cheers
Richard

chrismec · November 12, 2021, 11:25am

Hey guys,

I don’t know if this is still a question of interest, but while working with my environmental samples using Illumina MiSeq sequencing (single-end mode) and the QIIME 2 and mothur pipelines, I ran into many issues and questions. One was the fraction of unclassified bacteria, which was first <30% per sample. I tried different alignment approaches (for instance direct SINA alignment with SILVA as the reference database) and could reduce this value to 15-23% per sample. However, after doing some research and through personal communications, I learned that the issue lies in the bootstrap cut-off of the classifier; I had been working with a cut-off of 80%. Somewhere in the SILVA manual it says a bootstrap of 40% should also be acceptable. With this value, I reduced the unclassified bacteria to <5% per sample. This could be helpful for statistical analysis with RPCA or similar, because you have better taxonomic assignments. But be careful: playing with lower bootstrap values also increases the probability of wrong assignments (‘likelihood of the tree’)!
regards
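To make the effect of the cut-off concrete, here is a minimal sketch of how a threshold turns low-confidence ranks into "_unclassified" labels. It assumes taxonomy strings that still carry a confidence score for every rank; the function name `apply_cutoff` and the example lineage are hypothetical, and this only mimics the labels seen in this thread rather than reproducing the classifier's own code.

```python
import re

# Matches a taxon followed by its confidence score, e.g. "Bacteria(100)".
RANK = re.compile(r"^(.*)\((\d+)\)$")


def apply_cutoff(taxonomy, cutoff):
    """Keep ranks until the first one whose confidence is below `cutoff`.

    That rank and all deeper ranks are replaced by
    "<last kept taxon>_unclassified".
    """
    parts = [p for p in taxonomy.split(";") if p]
    kept = []
    for i, part in enumerate(parts):
        match = RANK.match(part)
        if match is None:
            raise ValueError(f"no confidence score in {part!r}")
        name, conf = match.group(1), int(match.group(2))
        if conf < cutoff:
            filler = (kept[-1] if kept else "unknown") + "_unclassified"
            kept.extend([filler] * (len(parts) - i))
            break
        kept.append(name)
    return ";".join(kept) + ";"


# Hypothetical lineage with made-up confidence scores.
lineage = "Bacteria(100);Proteobacteria(85);Gammaproteobacteria(62);Alteromonadales(45);"
for cutoff in (80, 40):
    print(cutoff, apply_cutoff(lineage, cutoff))
```

With these made-up scores, the 80% cut-off stops at the phylum while the 40% cut-off keeps the order, so the "unclassified" fraction shrinks without the underlying evidence changing. The lower-rank names that appear at 40% are exactly the ones supported by only 45-62% of bootstrap replicates.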

Alexandre_Thibodeau · November 22, 2021, 7:22pm

Hello!

For the classification, I recommend using a positive control of some sort to see how the different databases react to your sequences. For example, based on Zymo DNA (I do classic gut bacteria analysis), when I use PDS I must set the cut-off at 70% to get a correct classification of sequences, because if I run at 75% I lose, for example, the Salmonella taxonomic assignment in my positive control. But for Silva, a 75% cut-off is OK, and I am running a classification as we speak at 80% to see if my positive control still makes sense. I do 1000 iters (I am running on Compute Canada servers).

Controls are the key, both negative and positive.

Best of successes
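A check like this can be written down in a few lines. The sketch below compares the genera expected in a positive control with the genera actually assigned at each cut-off and lists what went missing. The function name, the genus sets, and the cut-off values are hypothetical placeholders, not my real results.

```python
def missing_from_control(expected, observed_by_cutoff):
    """Return, per cut-off, the expected genera that were not recovered."""
    return {
        cutoff: sorted(set(expected) - set(observed))
        for cutoff, observed in observed_by_cutoff.items()
    }


# Hypothetical values for illustration only.
expected = {"Salmonella", "Escherichia", "Bacillus"}
observed_by_cutoff = {
    70: {"Salmonella", "Escherichia", "Bacillus"},
    75: {"Escherichia", "Bacillus", "Enterobacteriaceae_unclassified"},
}
report = missing_from_control(expected, observed_by_cutoff)
print(report)
```

An empty list for a cut-off means every expected genus was recovered; any name that shows up is a control organism the database and cut-off combination failed to assign.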

Alexandre_Thibodeau · November 23, 2021, 1:43pm

Hello! I just finished my classification with PDS18 and Silva138. With a cutoff of 70, PDS18 is still not able to correctly classify my positive control (I swear it did with PDS16) while Silva138 is fine using a cutoff of 80.

Alexandre_Thibodeau · November 24, 2021, 2:11pm

Just to confirm that there is a problem with PDS18 for classification. Even with a cutoff of 65%, it still does not correctly assign my positive control. If you want to use PDS, use trainset16.
And controls…

Cheers!

pschloss · November 30, 2021, 6:13pm

I’d strongly discourage using a threshold under 80%. By dipping down to lower levels you are admitting that you have less confidence in the data, which never seemed like a good idea to me. As Alexandre mentioned, you can try other databases and see if the classification improves.

Pat
