Why use Consensus sequences

mainly.microbe · May 9, 2025, 6:49pm

Hi All, I am new to this and I am confused about why consensus sequences are more helpful than just the opti_mcc.shared file. The consensus file does give me more information, but I am doing a project where I want to identify which microbial groups are most abundant in an estuary, and I am nervous that the consensus sequences are not truly representative.

I guess a better question is: is the opti_mcc_cons.taxonomy file better to use for downstream analysis, or the other file? I am confused about which one I should use, what information it’s telling me, and why it’s important.

pschloss · May 12, 2025, 1:28pm

Hi there,

I recommend the cons.taxonomy file because it presents the consensus taxonomy for all of the sequences in an OTU. It is not a consensus sequence - it’s a consensus taxonomy. What the function does is to take the classifications of each sequence in an OTU and then report the 51% majority taxonomy of the sequences in the OTU.

The alternative approach is to pick a single sequence from an OTU and then use its taxonomy as the taxonomy of the OTU. This can be problematic because you can get lucky or unlucky in which sequence is selected. Depending on the pick, you might get a deeper or shallower classification than the OTU as a whole supports.

By taking the consensus taxonomy for all sequences, you effectively let the sequences vote on the classification for the OTU.
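
To make the voting idea concrete, here is a minimal Python sketch. It is not mothur's code; the function name `consensus_taxonomy`, the toy lineages, and the way support is computed are assumptions for illustration only. At each rank it keeps the most common name if at least `cutoff` percent of the OTU's sequences carry it, and stops at the first rank where no name reaches the cutoff.

```python
from collections import Counter

def consensus_taxonomy(classifications, cutoff=51):
    """Deepest lineage supported by >= cutoff percent of the sequences (illustrative only)."""
    total = len(classifications)
    consensus = []
    candidates = list(classifications)
    depth = 0
    while candidates:
        # Count the names at this rank among sequences that agree with the consensus so far.
        names = Counter(lin[depth] for lin in candidates if len(lin) > depth)
        if not names:
            break
        name, count = names.most_common(1)[0]
        support = 100 * count / total
        if support < cutoff:
            break  # no majority at this rank: the OTU stays unclassified from here down
        consensus.append((name, round(support)))
        candidates = [lin for lin in candidates if len(lin) > depth and lin[depth] == name]
        depth += 1
    return consensus

# Hypothetical OTU with 10 sequences (made-up classifications).
gamma = ["Bacteria", "Proteobacteria", "Gammaproteobacteria"]
otu = (
    [gamma + ["Vibrionales"]] * 4
    + [gamma + ["Alteromonadales"]] * 4
    + [gamma + ["Oceanospirillales"]] * 1
    + [["Bacteria", "Proteobacteria", "Alphaproteobacteria", "Rhodobacterales"]] * 1
)

print(consensus_taxonomy(otu))
# [('Bacteria', 100), ('Proteobacteria', 100), ('Gammaproteobacteria', 90)]
```

In this toy OTU no order reaches 51%, so the vote stops at Gammaproteobacteria. Picking a single sequence instead could have reported Vibrionales or Alteromonadales, which only 40% of the sequences support.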

This file is used for downstream analyses when you want to know what type of bacteria are represented by OTUs that are differentially abundant between treatment groups.
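
As a rough sketch of that downstream use, the Python below reads a cons.taxonomy-style table and lists OTUs from largest to smallest. It assumes a tab-separated layout with a header line and the columns OTU, Size, and Taxonomy, where each taxon is followed by a confidence score in parentheses. The sample rows are made up, and `parse_cons_taxonomy` is a hypothetical helper, not part of mothur.

```python
import re

# Made-up example rows in the assumed cons.taxonomy layout.
SAMPLE = (
    "OTU\tSize\tTaxonomy\n"
    "Otu002\t850\tBacteria(100);Bacteroidetes(100);Flavobacteriia(98);\n"
    "Otu001\t1200\tBacteria(100);Proteobacteria(100);Gammaproteobacteria(90);"
    "Gammaproteobacteria_unclassified(90);\n"
)

def parse_cons_taxonomy(text):
    """Return (otu, size, ranks) tuples sorted by size, largest first."""
    rows = []
    for line in text.strip().splitlines()[1:]:  # skip the header line
        otu, size, taxonomy = line.split("\t")
        # Drop the trailing ';' and the "(score)" suffix on each rank.
        ranks = [re.sub(r"\(\d+\)$", "", rank) for rank in taxonomy.strip(";").split(";")]
        rows.append((otu, int(size), ranks))
    return sorted(rows, key=lambda row: row[1], reverse=True)

for otu, size, ranks in parse_cons_taxonomy(SAMPLE):
    print(otu, size, ranks[-1])
```

Note that the Size column is a total for the OTU; per-sample abundances would still come from the shared file.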

Hope this clarifies things…
Pat

mainly.microbe · May 13, 2025, 2:55am

That makes sense thank you!
Now that we have taxonomy names and I’ve ranked my OTUs from most to least abundant, I want to learn everything there is to know about my top 10-20 most abundant organisms.

My advisor told me that BLASTing and googling things like gammaproteobacteria to understand my many gammaproteobacteria_unclassified sequences is the best way to learn about all the microbes in my samples.

Is the best strategy to just google the names generated by the cons.taxonomy file and go from there? Considering what you’ve just said, this doesn’t seem like the most intuitive way to learn about my most abundant sequences.

Any advice would be so helpful, I’m a little lost.

pschloss · May 14, 2025, 12:36pm

Certainly! I would take the BLAST results with a grain of salt. Those taxonomies aren’t always very robust. Also, similar to the problem with taking one sequence and classifying it rather than using all of the sequences in the OTU, BLAST will give you a false impression of the specificity of the taxonomy. BLAST is also notorious for providing only a local and not a global alignment and then sorting the results by the best local alignment, effectively burying the best global alignment.
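
To illustrate the local versus global point, here is a small Python sketch. It is not how BLAST works internally: it uses plain dynamic programming (Smith-Waterman style for local, Needleman-Wunsch style for global) with a linear gap penalty, and the scoring values and toy sequences are assumptions chosen for the example.

```python
def align_score(a, b, local, match=2, mismatch=-3, gap=-4):
    """Best alignment score of a vs b; local=True scores only the best matching stretch."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # A global alignment must pay for gaps at the ends of the sequences.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                cell = max(cell, 0)  # a local alignment may restart anywhere
                best = max(best, cell)
            score[i][j] = cell
    return best if local else score[-1][-1]

query = "AAAACCCCGGGG"
ref_a = "AAAACCCCGTTT"  # long exact stretch, then unrelated sequence
ref_b = "AAATCCCTGGGG"  # similar end to end, with two scattered mismatches

for name, ref in (("ref_a", ref_a), ("ref_b", ref_b)):
    print(name, "local:", align_score(query, ref, local=True),
          "global:", align_score(query, ref, local=False))
# ref_a local: 18 global: 9
# ref_b local: 14 global: 14
```

Ranked by local score, ref_a comes first even though ref_b is the better match over the full length of the query.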

Pat
