Pre.cluster command

FenrirReverie · September 4, 2013, 6:58pm

Hi, I just wanted to confirm with someone that my understanding of the pre.cluster step is correct:

Performing the pre.cluster step doesn’t actually remove or change any sequence; it just “preclusters” similar sequences with a unique sequence, so the subsequent processing “considers” them 100% similar to the “merged” unique sequence. But if you go back and look at the DNA sequences in the fasta file, the sequences are still different.

The reason I ask is that after looking at the DNA sequences of some sequences from the same OTU (~100 bp, called at 97%), I see some sequences that have 4-5 mismatches. I think this is most likely due to how the pre.cluster step was carried out, but I just wanted to confirm that my thinking and understanding of the step is correct.

FYI, I set the diffs = 1 for pre.cluster.

Thank you so much!

pschloss · September 4, 2013, 8:07pm

If you are using diffs=1, then the furthest any two sequences should be from each other within a cluster would be 2 bp. The clusters resulting after pre.cluster are not the same as OTUs and so it would be reasonable to expect more variation within an OTU. Recall that the default clustering method is the average neighbor algorithm, which requires the sequences within an OTU to be on average at most 3% different from each other. So it is reasonable for there to be <5 differences between sequences, although those would be expected to be rare.
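
To make the arithmetic above concrete, here is a rough Python sketch. It is not mothur code and does not reproduce mothur's pre.cluster or average neighbor implementation; the toy sequences and the helper names `mismatches` and `mutate` are made up for illustration. The first part shows how two sequences that are each 1 bp from the same unique sequence can be 2 bp from each other. The second part only illustrates that a limit on the average difference does not cap the largest pairwise difference.

```python
from itertools import combinations


def mismatches(a: str, b: str) -> int:
    """Count positions at which two aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def mutate(seq: str, positions, base: str = "C") -> str:
    """Return a copy of seq with `base` substituted at the given positions."""
    chars = list(seq)
    for p in positions:
        chars[p] = base
    return "".join(chars)


# Part 1: diffs=1 around one unique (representative) sequence.
# Hypothetical 100 bp toy sequence, not real data.
representative = "A" * 100
variant_a = mutate(representative, [0])   # 1 bp from the representative
variant_b = mutate(representative, [99])  # 1 bp from the representative

d_rep_a = mismatches(representative, variant_a)
d_rep_b = mismatches(representative, variant_b)
d_a_b = mismatches(variant_a, variant_b)  # the two variants differ at both sites
print(d_rep_a, d_rep_b, d_a_b)

# Part 2: an average of 3% over 100 bp still allows a pair with 5 mismatches.
# This is a plain mean over all pairs, a simplification of average neighbor.
otu = [
    representative,
    mutate(representative, [0, 1]),
    mutate(representative, [2, 3, 4]),
    mutate(representative, [5]),
]
pairwise = [mismatches(a, b) for a, b in combinations(otu, 2)]
mean_diff = sum(pairwise) / len(pairwise)
max_diff = max(pairwise)
print(mean_diff, max_diff)
```

In this toy group the mean pairwise difference is 3 bp (3% of 100 bp), yet one pair differs at 5 positions, which is the kind of occasional outlier described above.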

FenrirReverie · September 4, 2013, 9:56pm

Got it! Thanks so much Pat!
