cluster.seqs and OTU assignment

Amblyomma · January 10, 2011, 8:42pm

I’m processing 454 16S bacterial data following the Costello example. Currently I am only working with the data from a single MID while developing my pipeline, so by the time I reach the cluster.seqs step I have 1532 sequences total (~371 unique). Before all of the filtering and trimming from the Costello pipeline I have >2000 sequences.

After running through the classify commands to assign OTUs, I have one large OTU (>1400 sequences) plus 22 others at the 0.03 OTU definition level and 11 at the 0.10 level. The one large OTU is not unexpected; I already knew this sample was dominated by a single species. What concerns me is that the remaining OTUs don’t appear well defined to me. My understanding is that at the 0.03 level, sequences that are no more than 3% different (at least 97% identical) should be assigned to the same OTU. However, I have several OTUs containing only one sequence, and when I align these sequences on the BLAST website they show <1% difference. Why aren’t they in a single OTU? Is this just reflecting a difference between the alignment algorithms? The vast majority of the sequences in these small OTUs blast as the same genus, but simply raising my cutoff definition doesn’t seem to fix it (or rather, raising the cutoff to 0.20 seems a little ridiculous when there doesn’t appear to be that much difference between the sequences).

Thanks in advance!

pschloss · January 10, 2011, 9:34pm

Hmm… That sounds weird. So a couple things…

1. BLAST is a local alignment algorithm, which means that it only reports a % similarity for the most conserved portion of the sequence. So it could align 100 of 200 bases, and they could be 100% identical over those 100 bases but very different over the other 100 bases. In contrast, mothur uses a global approach and calculates the distance over the full length of the gene.
2. I’m not sure what #2 could be - can you post two of the singleton OTUs that you think should be in the same OTU and we can take a look? Feel free to post the two sequences here or to email them to mothur.bugs@gmail.com.
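
To make the local-versus-global point concrete, here is a rough Python sketch. It is a toy illustration only, not how BLAST or mothur actually compute their numbers: it assumes two made-up sequences that are already aligned to the same length, counts mismatched columns, and uses hypothetical helper names (global_distance, best_local_identity). It reproduces the 100-of-200-bases scenario described above.

```python
def global_distance(a: str, b: str) -> float:
    """Fraction of mismatched columns over the full length of two
    pre-aligned, equal-length sequences (toy stand-in for a global distance)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches / len(a)


def best_local_identity(a: str, b: str, window: int) -> float:
    """Highest fraction of identical columns found in any window of the
    given size (toy stand-in for a local alignment's reported identity)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    best = 0.0
    for start in range(len(a) - window + 1):
        seg_a = a[start:start + window]
        seg_b = b[start:start + window]
        matches = sum(1 for x, y in zip(seg_a, seg_b) if x == y)
        best = max(best, matches / window)
    return best


# Made-up example data: the first 100 bases are identical,
# the last 100 bases differ at every position.
shared = "ACGT" * 25
seq_a = shared + "A" * 100
seq_b = shared + "C" * 100

local_identity = best_local_identity(seq_a, seq_b, window=100)
full_distance = global_distance(seq_a, seq_b)

print(f"best local identity over 100 bases: {local_identity:.2f}")
print(f"distance over all 200 bases:        {full_distance:.2f}")
print(f"within a 0.03 cutoff?               {full_distance <= 0.03}")
```

In this toy case the best 100-base window looks like a perfect match, while the full-length distance is far above a 0.03 cutoff, so the two sequences would not be expected to land in the same OTU.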

Amblyomma · January 11, 2011, 3:54pm

Emailed in the sequences and my batch file.
