Clustering issue

evensr · May 26, 2016, 2:16pm

Dear all,

I am following the MiSeq SOP for my intestinal microbiome dataset.

I have 146,035 representative, classified sequences, which cluster into 70,411 OTUs when using the “cluster.split” command (.dist, .count_table, .taxonomy, taxlevel=4, cutoff=0.2, method=average) at 97% similarity level. Of these, 12,279 are classified (classify.otu) as a single genus of interest (let’s call it genus X). 9828 of these are singletons. I find the numbers a bit strange here, so I wanted to try another strategy:

From the 146,035 representative sequences I “pulled out” 14,499 sequences classified as genus X (by classify.seqs), used the respective .fasta sequences for dist.seqs, and then clustered them using the “cluster” (not “cluster.split”) command (.dist, .name, method=average). (The .name file I used here contained only one sequence name in the right column per line (the sequences only represent themselves)).

In this way, only the sequences already classified as genus X was used for clustering. I then got 152 OTUs at the 97% similarity level. (997 at 98%, 9538 at 99% etc.). This makes much more sense to me. I can’t really believe there exists 12,279 OTUs for genus X, 152 sounds much more realistic. Nevertheless, I do not understand why this is. Does anyone have a good explanation or at least some thoughts on how this can be?

All comments are welcome!

Even

PS: Additional fact (don’t know if it’s important): One of my OTUs contain half of the total amount of sequences (6 mill). This OTU belongs to the same order as genus X. I split at the Order level in cluster.split, so it may be relevant…

pschloss · May 28, 2016, 10:54am

I suspect that the problem is that you are clustering with a names file that doesn’t accurately represent the redundancy in the data. The average neighbor algorithm includes that information in making the clusters. So removing it would probably make your sequence clusters seem more similar than they really are.

Pat

Topic		Replies	Views
Cluster.split Commands in mothur	1	1920	December 20, 2014
Result after classifying.otu Commands in mothur	2	1109	August 29, 2016
Clustering OTUs Commands in mothur	5	1426	March 1, 2017
A problem in clustering Commands in mothur	2	2366	October 2, 2012
Many OTUs classify into the same species?? Commands in mothur	2	1828	February 9, 2016

Clustering issue

Related topics