Hello! I wanted to be certain that when running the pre.cluster command, I should consider the length of my sequences, as is indicated in the SOP:
“We generally favor allowing 1 difference for every 100 bp of sequence”.
If my sequences have a mean length of 416 bp, should I set the diffs parameter to 4? I am confused because on the pre.cluster command wiki page you explain that the maximum difference between sequences within a cluster can be double this mismatch value:
Caveat emptor
Something to keep in mind is that when you set the number of mismatches to 2, you are allowing that the maximum difference between sequences within a cluster to be 4 (2 from the dominant sequence in one direction, and 2 in any other direction).
–> Do I then set diffs to 2 in order to account for the length of my sequences, allowing 1 bp difference per 100 bp?
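To make the arithmetic behind my question concrete, here is a small Python sketch of the two options I am weighing. The function names are hypothetical (they are not part of mothur), and rounding down to a whole number of differences is my own assumption. It only restates the SOP rule of thumb and the wiki caveat, and does not say which value is the right one.

```python
def diffs_from_length(mean_length_bp, bp_per_diff=100):
    """SOP rule of thumb: 1 difference for every 100 bp of sequence.

    Rounding down is an assumption made for this sketch.
    """
    return mean_length_bp // bp_per_diff


def max_within_cluster_difference(diffs):
    """Wiki caveat: two sequences can each sit `diffs` mismatches away
    from the dominant sequence in different directions, so they can
    differ from each other by up to twice that value."""
    return 2 * diffs


mean_length = 416  # mean length of my sequences, in bp

option_a = diffs_from_length(mean_length)  # apply the rule of thumb directly
option_b = option_a // 2                   # halve it so the worst case matches the rule

for diffs in (option_a, option_b):
    worst_case = max_within_cluster_difference(diffs)
    print(f"diffs={diffs}: up to {worst_case} differences between two sequences in a cluster")
```

So with diffs=4 two sequences in the same cluster could differ by up to 8 bp, and with diffs=2 by up to 4 bp, which is where my doubt comes from.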
Many thanks for all the help you provide and the thorough information in the forum!
Luis