Hi There,
Following the Schloss SOP example, I ran my 16S pyrosequencing data as follows.
When I use cluster(), I lose all the cutoffs. I end up with only the "unique" list (see below). Is this a bug?
Thanks
kim
mothur > dist.seqs(fasta=Lema_16s_adults_final.unique.fasta, cutoff=0.15)
Output File Name:
Lema_16s_adults_final.unique.dist
It took 559 to calculate the distances for 13029 sequences.
mothur > cluster(column=Lema_16s_adults_final.unique.dist, name=Lema_16s_adults_final.names)
********************###########
Reading matrix: ||||||||||||||||||||||||||||||||||||||||||||||||||||
unique 1 13029
changed cutoff to 0.006536
Output File Names:
Lema_16s_adults_final.unique.an.sabund
Lema_16s_adults_final.unique.an.rabund
Lema_16s_adults_final.unique.an.list
It took 371 seconds to cluster
This generally happens when people include sequences that do not fully overlap with each other. Did you use filter.seqs(trump=., vertical=T)?
Hi,
I have the same issue as above for a data set. The filter.seqs command was run with the mentioned settings. This is the output:
Length of filtered alignment: 1176
Number of columns removed: 48824
Length of the original alignment: 50000
Number of sequences used to construct filter: 287861
Before running dist.seqs I also ran unique.seqs, pre.cluster and chimera.uchime.
Any more suggestions as to what the problem might be?
Thanks!
You might also try increasing your cutoff value.
"Why does the cutoff change when I cluster with average neighbor?
This is a product of using the average neighbor algorithm with a sparse distance matrix. When you run cluster, the algorithm looks for pairs of sequences to merge in the rows and columns that are getting merged together. Let’s say you set the cutoff to 0.05. If one cell has a distance of 0.03 and the cell it is getting merged with has a distance above 0.05, then the cutoff is reset to 0.03, because it’s not possible to merge at a higher level and keep all the data. All of the sequences are still there from multiple phyla. Incidentally, although we always see this, it is a bigger problem for people who include sequences that do not fully overlap."
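To make the FAQ explanation concrete, here is a toy sketch in Python of how a missing cell can pull the cutoff down. It is not mothur's code: the function name `merge_sparse_rows`, the dict-based rows, and the unweighted average are all assumptions made for illustration. It only mirrors the rule quoted above, where a known distance paired with a cell that was dropped for being above the cutoff forces the cutoff down to the known distance.

```python
def merge_sparse_rows(row_a, row_b, cutoff):
    """Merge two clusters' sparse distance rows (hypothetical helper).

    row_a and row_b map a neighbor's label to its distance. Distances above
    the cutoff were never stored, so a missing key means "unknown, > cutoff".
    Returns (merged_row, new_cutoff).
    """
    merged = {}
    new_cutoff = cutoff
    for neighbor in sorted(set(row_a) | set(row_b)):
        if neighbor in row_a and neighbor in row_b:
            # Both distances are known, so an average can be computed.
            # (Simplification: equal weights, ignoring cluster sizes.)
            merged[neighbor] = (row_a[neighbor] + row_b[neighbor]) / 2
        else:
            # One distance is missing, so the average is unknown.
            # Nothing above the known distance can be trusted any more,
            # so the cutoff drops to that distance.
            known = row_a.get(neighbor, row_b.get(neighbor))
            new_cutoff = min(new_cutoff, known)
    return merged, new_cutoff


# Cluster A knows its distance to C (0.03); cluster B's distance to C was
# above the original 0.05 cutoff and so was never written to the matrix.
merged, new_cutoff = merge_sparse_rows(
    {"C": 0.03, "D": 0.02},
    {"D": 0.04},
    cutoff=0.05,
)
print(new_cutoff)  # 0.03
```

In this toy model, sequences that do not fully overlap produce many large pairwise distances, so more cells go missing and the cutoff is more likely to fall early, which is consistent with the "changed cutoff to 0.006536" message in the log above.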