Issues with cutoffs and dist

Hi all,

I’m having some issues with cutoffs and dist (I know you get these a lot):

When I set the cutoff to 0.01 for dist.seqs and then run cluster without specifying a cutoff (it defaults to something like 0.014), I get the same counts as when I treat every unique sequence as its own group, i.e. when I take the sequence counts directly from the count table, which has no OTU clustering yet.

I’m not sure whether this is because my sequences are so short (57-60 bp) that clustering sequences that are 1% or less apart is equivalent to keeping the unique sequences, or whether I’m setting the options incorrectly.
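As a rough check of the short-read idea, here is a minimal sketch in Python. It assumes the distance between two aligned reads is simply mismatches divided by read length, which is a simplification and not necessarily how dist.seqs computes it. The function names are made up for illustration.

```python
import math

def min_nonzero_distance(length_bp):
    """Smallest possible nonzero distance: one mismatch over the read length.
    Assumes distance = mismatches / length (a simplification)."""
    return 1.0 / length_bp

def max_mismatches_within(cutoff, length_bp):
    """Largest number of mismatches whose distance is still <= cutoff."""
    return math.floor(cutoff * length_bp + 1e-9)

for length in (57, 60):
    print(length,
          round(min_nonzero_distance(length), 4),
          max_mismatches_within(0.01, length),
          max_mismatches_within(0.03, length))
```

Under that assumption a single mismatch in a 57-60 bp read is already a distance of about 0.017, so a 0.01 cutoff can only group identical sequences. A 0.03 cutoff would allow one mismatch, so read length alone would not obviously explain an identical result at 0.03.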

Furthermore, can someone explain the theory behind resetting the cutoff? I read the FAQ and I don’t quite understand this:

"Let’s say you set the cutoff to 0.05. If one cell has a distance of 0.03 and the cell it is getting merged with has a distance above 0.05 then the cutoff is reset to 0.03, because it’s not possible to merge at a higher level and keep all the data."

Looking at this, I don’t understand: if the distance is above 0.05, why do you reset the distance for these alignments rather than just throwing them out? Isn’t this misrepresenting the data?

Finally, to top it all off, my 0.03 clustered shared file looks exactly like my 0.01 shared file, which (as stated above) looks exactly like my count table?

For reference, I set dist to 0.15.

Thanks as always.

Is it more accurate just to use cluster.classic (in case the average neighbor is below the cutoff), as opposed to just resetting the cutoff?

I’d encourage you to run through the example by hand with and without changing the threshold. You’ll see that what we’re doing gives the same output for a threshold as running cluster.classic without a threshold. You can also run the data through both commands and see the same thing.
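To make the by-hand exercise concrete, here is a small sketch in Python of the rule quoted from the FAQ. It is not mothur’s actual code: the three-sequence matrix, the function names, and the use of None for a distance that was dropped for being above the cutoff are all assumptions made for illustration.

```python
def merged_distance(d_ac, d_bc):
    """Average-neighbor distance from the merged cluster (A,B) to C.
    None stands for a distance above the cutoff that was never stored."""
    if d_ac is None or d_bc is None:
        return None  # the average cannot be computed from partial data
    return (d_ac + d_bc) / 2.0

def reset_cutoff(cutoff, d_ac, d_bc):
    """FAQ rule: if only one of the two cells is known, lower the cutoff
    to that known distance; otherwise leave it unchanged."""
    known = [d for d in (d_ac, d_bc) if d is not None]
    if len(known) == 1:
        return min(cutoff, known[0])
    return cutoff

# Hypothetical data: d(A,B)=0.01, d(A,C)=0.03, d(B,C)=0.09, cutoff 0.05.
# A and B merge first. With the full matrix, (A,B) to C is the average below.
full = merged_distance(0.03, 0.09)

# With the matrix cut at 0.05, d(B,C) is missing, so the average is unknown
# and the cutoff drops to the one distance that is known.
truncated = merged_distance(0.03, None)
new_cutoff = reset_cutoff(0.05, 0.03, None)
print(full, truncated, new_cutoff)
```

In this toy case the true average is about 0.06, above the original cutoff. Throwing out the missing cell and using 0.03 on its own would pull C into the OTU far too early, which is why results are only reported up to the level where every average can still be computed.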