Cluster

udesilva · August 3, 2015, 4:18pm

Hello,

I am having problems in the cluster step of the process.

A general breakdown of what has been done:

Platform that generated the sequences: Illumina MiSeq.
Number of samples: 20
Number of unique sequences: 152906 (after removing chimeras and filtering the aligned sequences).

Analysis being done on:
Windows 64-bit, i7-4790 CPU @ 3.6 GHz (four cores), 32 GB of RAM
mothur software:
Windows version, Running 32Bit Version (although we have downloaded the 64-bit version, and done this twice, this is what is reported in the logfile)
mothur v.1.35.1
Last updated: 03/31/2015

Finished all steps up to dist.seqs/cluster.

I ran dist.seqs at a cutoff of 0.10 and then, after running cluster (using the average neighbor setting), got this message:

mothur > cluster(phylip=D:\Data\Mothur\Smith_OAES\smithoaes.unique.good.pick.good.filter.unique.precluster.pick.pick.phylip.dist, count=D:\Data\Mothur\Smith_OAES\smithoaes.unique.good.pick.good.filter.unique.precluster.uchime.pick.pick.pick.count_table, cutoff=0.10)
********************###########
Reading matrix: |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

changed cutoff to 0.0249489

Output File Names:
D:\Data\Mothur\Smith_OAES\smithoaes.unique.good.pick.good.filter.unique.precluster.pick.pick.phylip.an.unique_list.list

It took 22995 seconds to cluster

My question is: why does it change the cutoff? I assume it is because there were no distances/unique sequences that could be clustered between that value (0.0249489) and the 0.10 cutoff used in dist.seqs.

So I reran dist.seqs at a higher cutoff, 0.20, which generated a 288 GB file (column format). When I try to cluster those distances, the system bogs down and has difficulty finishing the step. It has been running for 3 days and is not yet finished. A previous attempt did not complete: it stopped before any output was generated, and I am not sure why.

Any ideas? I am trying to get the sequences to cluster at a cutoff of 0.03 or higher, such as 0.05, so that I can complete the analysis.

Thanks,
udaya

pschloss · August 5, 2015, 2:34pm

Can you try running cluster.split with a cutoff of 0.20?
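
For a sense of scale, here is a rough Python sketch (not mothur code) of how many pairwise comparisons 152906 unique sequences imply, and how that count shrinks if distances are only considered within separate groups, which is the general idea behind splitting the data before clustering. The function names and the four-way equal split are hypothetical and only for illustration; the number of groups mothur would actually produce depends on the data and settings. The count is an upper bound, since only distances under the cutoff are written to the file.

```python
from typing import Iterable


def pairwise_count(n: int) -> int:
    """Number of unordered pairs among n unique sequences: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def pairs_after_split(group_sizes: Iterable[int]) -> int:
    """Pairs left when only sequences within the same group are compared."""
    return sum(pairwise_count(size) for size in group_sizes)


if __name__ == "__main__":
    n_unique = 152906  # number of unique sequences reported in the post

    # Hypothetical split into four near-equal groups (an assumption,
    # not something mothur guarantees).
    q, r = divmod(n_unique, 4)
    sizes = [q + 1] * r + [q] * (4 - r)

    print(f"all pairs:          {pairwise_count(n_unique):,}")
    print(f"pairs after split:  {pairs_after_split(sizes):,}")
```

Because the pair count grows with the square of the number of sequences, even a modest split cuts the work substantially, which may be why the full 0.20 matrix is so hard to cluster in one piece.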
