Why use Consensus sequences

Hi all, I am new to this and I am confused about why consensus sequences are helpful compared with just the opti_mcc.shared file. The consensus file does give me more information, but I am doing a project where I want to identify which microbial groups are most abundant in an estuary, and I am nervous that the consensus sequences are not truly representative.

I guess a better question is: is the opti_mcc_cons.taxonomy file better to use for downstream analysis, or the other file? I am confused about which one I should use, what information it is telling me, and why it is important.

Hi there,

I recommend the cons.taxonomy file because it presents the consensus taxonomy for all of the sequences in an OTU. It is not a consensus sequence - it’s a consensus taxonomy. The function takes the classification of each sequence in an OTU and then reports the 51% majority taxonomy, i.e. the classification that at least 51% of the sequences in the OTU agree on.

The alternative approach is to pick a single sequence from an OTU and then use its taxonomy as the taxonomy of the OTU. This can be problematic because the result depends on whether you get lucky or unlucky in which sequence is selected: one sequence might give you a deeper classification than the OTU as a whole supports, and another a shallower one.

By taking the consensus taxonomy for all sequences, you effectively let the sequences vote on the classification for the OTU.
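To make the voting idea concrete, here is a rough Python sketch. It is not mothur's actual code: the function name `consensus_taxonomy`, the example lineages, and the choice to simply stop at the first level without a majority are all assumptions made for illustration.

```python
from collections import Counter


def consensus_taxonomy(classifications, cutoff=51):
    """Let the sequences in one OTU vote on a taxonomy, level by level.

    classifications: one semicolon-separated lineage per sequence.
    cutoff: minimum percentage of ALL sequences that must agree at a level.
    Simplification: stops at the first level with no majority.
    """
    total = len(classifications)
    # Sequences that still agree with the consensus built so far.
    remaining = [c.strip(";").split(";") for c in classifications]
    consensus = []
    level = 0
    while remaining:
        votes = Counter(
            lineage[level] for lineage in remaining if len(lineage) > level
        )
        if not votes:
            break
        name, count = votes.most_common(1)[0]
        percent = 100 * count // total
        if percent < cutoff:
            break  # no majority at this level, so stop here
        consensus.append(f"{name}({percent})")
        remaining = [
            lineage
            for lineage in remaining
            if len(lineage) > level and lineage[level] == name
        ]
        level += 1
    return ";".join(consensus)


# Five made-up sequences from one hypothetical OTU.
otu_sequences = [
    "Bacteria;Proteobacteria;Gammaproteobacteria;Vibrionales",
    "Bacteria;Proteobacteria;Gammaproteobacteria;Vibrionales",
    "Bacteria;Proteobacteria;Gammaproteobacteria;Alteromonadales",
    "Bacteria;Proteobacteria;Gammaproteobacteria;Alteromonadales",
    "Bacteria;Proteobacteria;Alphaproteobacteria;Rhodobacterales",
]
print(consensus_taxonomy(otu_sequences))
# Bacteria(100);Proteobacteria(100);Gammaproteobacteria(80)
```

In this toy OTU, picking a single sequence could label the whole OTU as Vibrionales or even as an Alphaproteobacteria, while the vote stops at Gammaproteobacteria because no order reaches the 51% cutoff.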

This file is used for downstream analyses when you want to know what type of bacteria are represented by OTUs that are differentially abundant between treatment groups.
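As a rough illustration of that kind of downstream use, the sketch below totals each OTU's counts across samples and looks up its consensus taxonomy. It assumes the usual tab-delimited layouts (label, Group, numOtus, then one column per OTU in the shared file; OTU, Size, Taxonomy in the cons.taxonomy file). The sample names, counts, taxonomy strings, and the helper name `top_otus` are made up, and the file contents are inlined as strings so the example runs on its own.

```python
import csv
import io
from collections import Counter

# Made-up stand-ins for the two files; real ones would be read from disk.
SHARED = (
    "label\tGroup\tnumOtus\tOtu001\tOtu002\tOtu003\n"
    "0.03\tsiteA\t3\t120\t30\t5\n"
    "0.03\tsiteB\t3\t80\t60\t10\n"
)
CONS_TAXONOMY = (
    "OTU\tSize\tTaxonomy\n"
    "Otu001\t200\tBacteria(100);Proteobacteria(100);Gammaproteobacteria(98);\n"
    "Otu002\t90\tBacteria(100);Bacteroidetes(100);Flavobacteriia(95);\n"
    "Otu003\t15\tBacteria(100);Cyanobacteria(100);\n"
)

NON_OTU_COLUMNS = ("label", "Group", "numOtus")


def top_otus(shared_text, taxonomy_text, n=10):
    """Return (OTU, total count, consensus taxonomy) for the n most abundant OTUs."""
    totals = Counter()
    for row in csv.DictReader(io.StringIO(shared_text), delimiter="\t"):
        for column, value in row.items():
            if column not in NON_OTU_COLUMNS:
                totals[column] += int(value)  # sum counts over all samples

    taxonomy = {
        row["OTU"]: row["Taxonomy"]
        for row in csv.DictReader(io.StringIO(taxonomy_text), delimiter="\t")
    }
    return [
        (otu, count, taxonomy.get(otu, "unknown"))
        for otu, count in totals.most_common(n)
    ]


for otu, count, lineage in top_otus(SHARED, CONS_TAXONOMY, n=2):
    print(otu, count, lineage)
```

Summing raw counts is only one way to rank OTUs; if your samples have very different sequencing depths you may prefer to rank on relative abundances instead.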

Hope this clarifies things…
Pat

That makes sense, thank you!
Now that we have taxonomy names and I’ve ranked my OTUs from most to least abundant, I want to learn everything there is to know about my top 10-20 most abundant organisms.

My advisor told me that BLASTing and googling things like gammaproteobacteria to understand my many gammaproteobacteria_unclassified sequences is the best way to learn about all the microbes in my samples.

Is the best strategy to just google the names generated by the cons.taxonomy file and go from there? Considering what you’ve just said, this doesn’t seem like the most intuitive way to learn about my most abundant sequences.

Any advice would be so helpful, I’m a little lost.

Certainly! I would take the BLAST results with a grain of salt. Those taxonomies aren’t always very robust. Also, similar to the problem with classifying one sequence rather than using all of the sequences in the OTU, BLAST will give you a false impression of the specificity of the taxonomy. BLAST is also notorious for providing only a local and not a global alignment and then sorting the results by the best local alignment, effectively burying the best global alignment.

Pat