Files for shhh.seqs

I’m having trouble understanding the shhh.seqs command.

Is this a single-linkage algorithm? I’m assuming so because the default sigma parameter is set at 0.01. Can you change this default simply by adding “sigma=…”?

What data is used for the “shhh.seqs” command? In the description in the Mothur Commands it appears that all you need are a fasta file and a names file (and a groups file is optional). Is this raw sequencing data post-trimming, or do the sequences need to be aligned? How is this different from pre.cluster?

I was at the December Mothur workshop and my notes from the workshop state that you give Mothur the rates at which certain sequence errors occur based on your mock community data, but I’m not sure how to extract those particular errors or how to feed them into this command.

Lisa,

Not exactly single-linkage. This is our implementation of Chris Quince’s SeqNoise algorithm. It is similar in idea to shhh.flows, but instead of clustering flowgrams, it clusters sequences to get rid of residual sequencing errors. It serves the same purpose as our pre.cluster command. The difference is that in pre.cluster you set the maximum number of differences allowed, whereas in shhh.seqs you modify the sigma parameter, and this can result in some sequences getting clustered together that are quite different from each other. By changing sigma, you change the size of the clusters. You can read more about it in our PLoS ONE paper.
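
To make the comparison concrete, here is a minimal sketch of the two commands side by side. The file names are hypothetical placeholders, and diffs=2 is only an illustrative value. sigma=0.01 is the default mentioned in the question, so change that number to move away from the default.

```
# Hypothetical input files: example.fasta and example.names

# SeqNoise-style denoising; cluster size is controlled by sigma
shhh.seqs(fasta=example.fasta, name=example.names, sigma=0.01)

# Same purpose, but with an explicit cap on the number of differences
pre.cluster(fasta=example.fasta, name=example.names, diffs=2)
```

You would normally run one or the other, not both, since they serve the same purpose.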

Pat