I am having problems in the cluster step of the process.
A general breakdown of what has been done:
Platform that generated the sequences: Illumina MiSeq.
Number of samples: 20
Number of unique sequences: 152906 (after removing chimeras and filtering the aligned sequences).
Analysis being done on:
Windows 64-bit, i7-4790 CPU @ 3.6 GHz (four cores) with 32 GB of RAM
Windows version, running the 32-bit version (although we downloaded the 64-bit version twice, this is what the logfile reports)
Last updated: 03/31/2015
Finished all steps up to dist.seqs/cluster.
I ran dist.seqs at a cutoff of 0.10 and then, after running cluster (using the average neighbor setting), got this message:
cluster(phylip=D:\Data\Mothur\Smith_OAES\smithoaes.unique.good.pick.good.filter.unique.precluster.pick.pick.phylip.dist, count=D:\Data\Mothur\Smith_OAES\smithoaes.unique.good.pick.good.filter.unique.precluster.uchime.pick.pick.pick.count_table, cutoff=0.10)
Reading matrix: |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
changed cutoff to 0.0249489
Output File Names:
It took 22995 seconds to cluster
One question I have is: why does it change the cutoff? I assume it is because there were no distances/unique sequences that could be clustered between that value (0.0249489) and the 0.10 cutoff used in dist.seqs.
So I reran dist.seqs at a higher cutoff, 0.20, which generated a 288 GB file (column-based format). When I try to cluster that file, the system bogs down and has difficulty finishing the step: it has been running for 3 days and is not yet finished. A previous attempt stopped before any output was generated; I am not sure why.
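For a sense of scale, here is a rough back-of-the-envelope sketch in Python (not part of the mothur run). It counts how many sequence pairs are possible for 152906 unique sequences and estimates how many rows a 288 GB column file might hold. The bytes-per-row figure is purely a guess on my part, as is treating 1 GB as 10^9 bytes, and the helper names are made up for illustration.

```python
def pairwise_count(n_seqs: int) -> int:
    """Number of distinct sequence pairs: n * (n - 1) / 2."""
    return n_seqs * (n_seqs - 1) // 2


def estimated_rows(file_bytes: float, bytes_per_row: float) -> float:
    """Rough row count for a text file, given an assumed average row size."""
    return file_bytes / bytes_per_row


n_unique = 152906                       # unique sequences reported above
total_pairs = pairwise_count(n_unique)  # upper bound on stored distances
print(f"possible pairs: {total_pairs:,}")

# ASSUMPTION: about 60 bytes per "nameA nameB distance" row; the real value
# depends on how long the sequence names are.
ASSUMED_BYTES_PER_ROW = 60
file_bytes = 288e9                      # ASSUMPTION: 1 GB taken as 10**9 bytes

rows = estimated_rows(file_bytes, ASSUMED_BYTES_PER_ROW)
print(f"estimated rows in the column file: {rows:,.0f}")
print(f"estimated fraction of all pairs kept: {rows / total_pairs:.2f}")
```

Under those assumptions, a large share of the roughly 11.7 billion possible pairs would fall under the 0.20 cutoff, which may be part of why the cluster step struggles on a machine with 32 GB of RAM.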
Any ideas? I am trying to get sequences to cluster at a cutoff of 0.03 or higher, such as 0.05, to complete the analysis.