I am concerned about something I think I don't understand properly.
In the 454 SOP there is this comment: "Final step to reduce the sequencing error: use the pre.cluster command to merge sequence counts that are within 2 bp of a more abundant sequence. This implementation of the command will split the sequences by group and then within each group it will pre-cluster those sequences that are within 1 or 2 bases of a more abundant sequence. It will then merge all the sequences back into one fasta file and a names file. As a rule of thumb we use a difference of 1 bp per 100 bp of sequence length."
Does this mean that if I work with FLX+ and have sequence lengths of around 500 bp, I should set diffs=5 instead of diffs=2 as in the SOP?
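To show how I am reading the rule of thumb, here is a minimal Python sketch of the arithmetic. The function name `suggested_diffs` is my own (it is not part of mothur), and rounding down for lengths that are not a multiple of 100 is my assumption, since the SOP does not say how to round.

```python
def suggested_diffs(seq_length_bp, bp_per_diff=100):
    """Allow 1 difference per `bp_per_diff` bases (rule of thumb from the SOP).

    Rounding down and never going below 1 are assumptions, not SOP rules.
    """
    return max(1, seq_length_bp // bp_per_diff)


# ~200 bp reads would give the value used in the SOP; ~500 bp reads give more.
print(suggested_diffs(200))
print(suggested_diffs(500))
```

If this reading is right, the result for 500 bp is the value I would pass as the diffs parameter of pre.cluster.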
And does this mean that two or more sequences that were 1-2 bp apart will still be counted as separate sequences, but only one sequence will represent the group in the unique sequences dataset? Is this the same idea as the OTU definition, except that instead of clustering sequences that share (let's say) 97% similarity, here the clusters are made of sequences that differ by no more than 2 bp?
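To make my question concrete, this is a toy Python sketch of how I imagine the merging works. It is only my understanding, not mothur's actual implementation: I assume the sequences are aligned and of equal length, I count differences as simple mismatches, and the names `hamming` and `toy_pre_cluster` and the example sequences are made up for illustration.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def toy_pre_cluster(counts, diffs=2):
    """Merge rarer sequences into a more abundant one within `diffs` mismatches.

    `counts` maps each unique (aligned) sequence to its read count.
    Returns a dict of representative sequence -> merged read count.
    """
    # Visit sequences from most to least abundant.
    ordered = sorted(counts, key=counts.get, reverse=True)
    merged = {}
    absorbed = set()
    for i, rep in enumerate(ordered):
        if rep in absorbed:
            continue
        merged[rep] = counts[rep]
        for other in ordered[i + 1:]:
            if other not in absorbed and hamming(rep, other) <= diffs:
                merged[rep] += counts[other]  # reads are kept, not discarded
                absorbed.add(other)
    return merged


# Hypothetical data: two rare variants sit 1 and 2 bp from the abundant one.
counts = {
    "ACGTACGTAC": 10,
    "ACGTACGTAA": 3,
    "ACGTACGAAA": 1,
    "TTTTTCGTAC": 4,
}
result = toy_pre_cluster(counts, diffs=2)
print(result)
print(sum(result.values()) == sum(counts.values()))
```

In this sketch the total number of reads stays the same and only the number of unique sequences goes down, which is what I would like to confirm.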
Sorry, I am puzzled by this.
Thank you. Susana