pre.cluster command

svazquez · August 27, 2014, 8:49am

Hi
I am concerned about something I think I don’t understand properly.
In the 454 SOP there is this comment: "Final step to reduce the sequencing error: use the pre.cluster command to merge sequence counts that are within 2 bp of a more abundant sequence. This implementation of the command will split the sequences by group and then within each group it will pre-cluster those sequences that are within 1 or 2 bases of a more abundant sequence. It will then merge all the sequences back into one fasta file and a names file. As a rule of thumb we use a difference of 1 bp per 100 bp of sequence length."

Does this mean that if I work with FLX+ and have sequence lengths around 500 bp, I should select diffs=5 instead of diffs=2 as in the SOP?

And does this mean that two or more sequences that were 1-2 bp apart will still be counted as separate sequences, but only one sequence will represent the group in the unique sequences dataset? Is it the same idea as the OTU definition, except that instead of making clusters of sequences that share (let’s say) 97% similarity, here the clusters are made of sequences that differ by no more than 2 bp?

Sorry, I am puzzled by this.

Thank you. Susana

pschloss · August 28, 2014, 6:03pm

This means that if I work with FLX+ and have sequence lengths around 500 bp I should select diffs=5 instead of diffs=2 as in the SOP?

Yup, although you can always use a smaller number. If you set diffs=5 and you have two sequences that are 2 nt apart, then they will get merged.
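The rule of thumb from the SOP (1 difference per 100 bp) can be sketched as a small calculation. This is only an illustration in Python, not part of mothur: the function name `rule_of_thumb_diffs` is hypothetical, and rounding down for lengths that are not a multiple of 100 is an assumption, since the SOP does not say how to round.

```python
def rule_of_thumb_diffs(seq_length_bp: int, bp_per_diff: int = 100) -> int:
    """Suggest a diffs value: roughly 1 difference per 100 bp of sequence.

    Rounding down is an assumption made for this sketch; a smaller value
    than the one returned here is always allowed.
    """
    if seq_length_bp <= 0:
        raise ValueError("sequence length must be positive")
    return seq_length_bp // bp_per_diff


# ~200 bp reads, as in the SOP example
print(rule_of_thumb_diffs(200))  # 2
# ~500 bp FLX+ reads, as in the question above
print(rule_of_thumb_diffs(500))  # 5
```

For reads around 500 bp this gives 5, which matches diffs=5 in the question. Treat it as an upper bound rather than a required setting.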

Pat
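To make the idea of "merging sequence counts" concrete, here is a simplified sketch of the behaviour described in the SOP quote. It is not mothur's actual implementation: the function names are hypothetical, it assumes the unique sequences are already aligned to the same length, it counts only position-by-position mismatches, and it ignores the per-group splitting. Each less abundant sequence is folded into the first more abundant sequence that is within `diffs` mismatches, so the total count is preserved while the number of unique sequences drops.

```python
def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length (aligned) sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def pre_cluster_sketch(counts: dict[str, int], diffs: int) -> dict[str, int]:
    """Fold each sequence into a more abundant one within `diffs` mismatches.

    `counts` maps each unique sequence to its abundance. The result maps
    each surviving representative to the summed abundance of its cluster.
    """
    # Visit sequences from most to least abundant (ties broken alphabetically).
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    merged: dict[str, int] = {}
    for seq, n in ordered:
        for rep in merged:
            if mismatches(seq, rep) <= diffs:
                merged[rep] += n  # counts are kept, only the unique is dropped
                break
        else:
            merged[seq] = n  # no close, more abundant sequence: new representative
    return merged


toy = {
    "ACGTACGTAC": 100,  # abundant sequence
    "ACGTACGTAA": 3,    # 1 mismatch from the abundant sequence
    "ACGTACGGAA": 2,    # 2 mismatches from the abundant sequence
    "TTTTACGTAC": 50,   # 3 mismatches, stays separate at diffs=2
}
print(pre_cluster_sketch(toy, diffs=2))
# {'ACGTACGTAC': 105, 'TTTTACGTAC': 50}
```

In this toy example the 155 reads are all still counted, but only two unique sequences remain. Unlike OTU clustering at a percent-similarity cutoff, the threshold here is an absolute number of differences and merging always goes toward the more abundant sequence.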
