cluster: cutoff changed lower than demanded

DavidBremen · June 5, 2015, 2:03pm

Dear mothur team,
Dear fellow mothur users,

I have browsed the forum and the wiki but still do not understand why my processing behaves the way it does. I know this topic is covered in the FAQ, in several forum threads and on discussion boards outside the mothur website. Nevertheless, I cannot work out why my dataset behaves this particular way.
I would very much appreciate your help.

This is the framework:

My dataset
• Partial 16S rRNA gene tags from benthic and pelagic origin, amplified with primer set 341F[barcoded] and 785R/805R
• PCR free TruSeq library prep
• Illumina PE library (2 × 300 bp), quality trimmed and merged with BBtrim/BBmerge at “only” 60 bp overlap (i.e. the reads do not overlap completely; I mention this because the aspect comes up several times in the mothur context)
• Input: 1449949 sequences from 31 samples

I followed the MiSeq SOP with the commands listed at the end of this post. The input to the distance matrix for the 31 samples is summarized below (which means that my matrix should not be considered a sparse matrix, right?).

             Start    End   NBases   Ambigs  Polymer  NumSeqs
Minimum:     1        1243  407      0       3        1
2.5%-tile:   3        1243  440      0       4        31967
25%-tile:    3        1243  441      0       4        319663
Median:      3        1243  460      0       5        639325
75%-tile:    3        1243  465      0       6        958987
97.5%-tile:  3        1243  466      0       7        1246683
Maximum:     3        1243  496      0       8        1278649
Mean:        2.99484  1243  454.032  0       5.17941

# of unique seqs: 523505
total # of seqs: 1278649

After calculating my distance matrix with a cutoff of 0.10 (dist.seqs(fasta=Allsamples.unique.good.filter.unique.precluster.pick.fasta, cutoff=0.10, processors=8)), I ran the cluster command (cluster(column=Allsamples.unique.good.filter.unique.precluster.pick.dist, count=Allsamples.unique.good.filter.unique.precluster.pick.count_table, cutoff=0.02, hard=t)). I ran it with different settings (hard=t, precision=100 and 1000), always at a cutoff of 0.02 (i.e. lower than the cutoff of the distance matrix), but always got output whose cutoff had been changed to ~0.0075 – 0.0079. This does not make sense to me. Unlike in the other questions raised on this topic, I would like my final cutoff to be higher than the adjusted one (0.02 instead of 0.00787841).

Strangely, I cannot find the mothur log to verify that I actually set the cutoff to 0.10 and not 0.01 in the dist.seqs command. A wrongly set cutoff would be the only explanation I can think of for the low cutoff during clustering. At the moment I am calculating a new distance matrix to rule out that possibility. Since the calculation takes several days and I am fairly sure that this is not the problem, I wanted to ask whether you have encountered this issue before or have an idea of what I am doing wrong.

PS: Running the cluster command without any cutoff set did not help either. For some reason it only produced output for the unique clusters.

Any help is greatly appreciated.

Thanks a lot!
David

Below are my commands (quality trimming was not done with mothur):
• unique.seqs(fasta=Allsamples.fasta)
• count.seqs(name=Allsamples.names, group=Allsamples.groups, processors=8)
• pcr.seqs(fasta=silva.bacteria.fasta,start=6380, end=25316, keepdots=F,processors=8)
• system(mv silva.bacteria.pcr.fasta silva.nr_v119_v3v4.align)
• summary.seqs(fasta=silva.nr_v119_v3v4.align)
• align.seqs(fasta=Allsamples.unique.fasta, reference=silva.nr_v119_v3v4.align,flip=t)
• summary.seqs(fasta=Allsamples.unique.align, count=Allsamples.count_table)
• screen.seqs(fasta=Allsamples.unique.align, count=Allsamples.count_table,summary=Allsamples.unique.summary,start=8, end=18936)
• summary.seqs(fasta=Allsamples.unique.good.align,count=Allsamples.good.count_table, processors=8)
• filter.seqs(fasta=Allsamples.unique.good.align, vertical=T)
• unique.seqs(fasta=Allsamples.unique.good.filter.fasta, count=Allsamples.good.count_table)
• summary.seqs(fasta=Allsamples.unique.good.filter.unique.fasta, count=Allsamples.unique.good.filter.count_table)
• pre.cluster(fasta=Allsamples.unique.good.filter.unique.fasta, count=Allsamples.unique.good.filter.count_table,diffs=2)
• summary.seqs(fasta=Allsamples.unique.good.filter.unique.precluster.fasta, count=Allsamples.unique.good.filter.unique.precluster.count_table)
• classify.seqs(fasta=Allsamples.unique.good.filter.unique.precluster.fasta,count=Allsamples.unique.good.filter.unique.precluster.count_table,reference=silva.nr_v119_v3v4.align, taxonomy=silva.bacteria.silva.tax, cutoff=60, processors=8)
• remove.lineage(fasta=Allsamples.unique.good.filter.unique.precluster.fasta, count=Allsamples.unique.good.filter.unique.precluster.count_table, taxonomy=Allsamples.unique.good.filter.unique.precluster.silva.wang.taxonomy, taxon=Chloroplast)
• summary.seqs(fasta=Allsamples.unique.good.filter.unique.precluster.pick.fasta,count=Allsamples.unique.good.filter.unique.precluster.pick.count_table)
• count.groups(count=Allsamples.unique.good.filter.unique.precluster.pick.count_table)
• dist.seqs(fasta=Allsamples.unique.good.filter.unique.precluster.pick.fasta, cutoff=0.10, processors=8)
• cluster(column=Allsamples.unique.good.filter.unique.precluster.pick.dist, count=Allsamples.unique.good.filter.unique.precluster.pick.count_table, cutoff=0.02, hard=t)
o changed cutoff to 0.00787841

Kendra · June 5, 2015, 5:09pm

Do you want 0.02 OTUs? If so, you should run the clustering with a higher cutoff (I’d guess at least double the desired OTU level, but that rule of thumb is just a guess), then run make.shared(label=0.02) to get just the 0.02 OTUs.

DavidBremen · June 8, 2015, 8:14am

Thanks for the answer.
I will give it a try. Since I already tried running the command without any cutoff, I doubt that it will actually give me a cutoff that is not automatically lowered below 0.02. Or am I wrong in thinking that a cutoff changed during clustering prevents me from getting any information about OTUs at a higher level (i.e. I want OTUs at 0.02, but the cutoff was changed to <0.02)?
David

pschloss · June 16, 2015, 1:17pm

You are also setting the cutoff in dist.seqs (cutoff=0.10). Try setting that and the cutoff in cluster to 0.20.

Also see this…
http://www.mothur.org/wiki/Frequently_asked_questions#Why_does_the_cutoff_change_when_I_cluster_with_average_neighbor.3F
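
For readers who want to see the effect described in the FAQ, here is a toy sketch in Python. It is not mothur's actual code: the function name effective_cutoff, the three sequences and their distances are all made up for illustration, and the rule for lowering the cutoff is an assumption based on the FAQ's description (when two clusters merge and the distance to a third cluster is stored for only one of them, the average cannot be computed, so the cutoff drops to the stored value).

```python
def effective_cutoff(sparse, cutoff):
    """Toy average-neighbor clustering on a sparse distance matrix.

    sparse: {(name1, name2): distance}; pairs whose distance exceeded the
            cutoff used when the matrix was written are simply absent.
    cutoff: the clustering cutoff that was requested.
    Returns the cutoff after any lowering (assumed rule, see lead-in).
    """
    # Every cluster is a frozenset of sequence names; start with singletons.
    clusters = {frozenset([n]) for pair in sparse for n in pair}
    dist = {frozenset([frozenset([a]), frozenset([b])]): d
            for (a, b), d in sparse.items()}

    while dist:
        # Pick the closest pair of clusters that is still stored.
        pair, d = min(dist.items(), key=lambda kv: kv[1])
        if d > cutoff:
            break
        a, b = tuple(pair)
        del dist[pair]
        merged = a | b
        clusters -= {a, b}

        for c in clusters:
            da = dist.pop(frozenset([a, c]), None)
            db = dist.pop(frozenset([b, c]), None)
            if da is not None and db is not None:
                # Both distances known: size-weighted average.
                dist[frozenset([merged, c])] = (
                    len(a) * da + len(b) * db) / len(merged)
            elif da is not None or db is not None:
                # One distance is missing from the matrix, so the average
                # is unknown; lower the cutoff to the known distance.
                known = da if da is not None else db
                cutoff = min(cutoff, known)
        clusters.add(merged)
    return cutoff


# Hypothetical distances between three sequences.
full = {("A", "B"): 0.005, ("A", "C"): 0.008, ("B", "C"): 0.12}

# Matrix written with cutoff=0.10: the B-C distance is dropped.
sparse_010 = {k: v for k, v in full.items() if v <= 0.10}
# Matrix written with cutoff=0.20: all three distances are kept.
sparse_020 = {k: v for k, v in full.items() if v <= 0.20}

print(effective_cutoff(sparse_010, 0.02))  # 0.008
print(effective_cutoff(sparse_020, 0.02))  # 0.02
```

In this toy case the requested 0.02 survives only when the matrix holds the larger B-C distance, which is the reason for raising the cutoff in dist.seqs as well as in cluster.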

DavidBremen · June 17, 2015, 9:02am

Hi Pat,
thanks for the answer.
I have redone everything numerous times now. I never calculated the distance matrix at such a “high” cutoff since I wasn’t really interested in a cutoff that high. I will give it a try now.
Best,
David

pschloss · June 18, 2015, 1:36pm

You might also try just using cluster.split, which should be faster and just as accurate.

DavidBremen · June 22, 2015, 7:41am

The combination of calculating a new distance matrix with cutoff=0.20 and cluster.split (including clustering at 0.20) worked for me!
Thanks for the help
