cluster large distance matrix

umberar7 · February 8, 2011, 4:10am

Hi,
I’m trying to run the cluster step on a 38 GB distance matrix. I’ve been using the hcluster command with method=average and cutoff=0.1. I ran it on a system with 256 GB of memory, but jobs there are limited to 72 hours and mine times out before it finishes. Do you have any suggestions?

Thanks

pschloss · February 8, 2011, 3:04pm

First off, a 38 GB matrix seems ridiculously large for any dataset/environment. You might double-check that you’re following the Costello example, including the quality trimming options. Here’s something to try - it’s not officially released, but it is available within mothur…

http://www.mothur.org/wiki/Cluster.split

umberar7 · February 8, 2011, 3:14pm

Hi Pat,
My dataset actually started with over half a million sequences, and I was able to reduce it to just over 106,000 sequences. I will try the split command. Thanks!
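
For a sense of scale, here is a rough Python sketch (not part of mothur) of how many pairwise distances 106,000 sequences can produce. The function name `pairwise_count` is hypothetical, and the result is only an upper bound: it assumes every pair is written out, whereas any cutoff applied while the distances are calculated would presumably leave fewer rows.

```python
def pairwise_count(n_sequences: int) -> int:
    """Number of unordered sequence pairs, n * (n - 1) / 2.

    Upper bound on the rows of a pairwise distance matrix,
    assuming no pair is dropped by a cutoff.
    """
    if n_sequences < 0:
        raise ValueError("n_sequences must be non-negative")
    return n_sequences * (n_sequences - 1) // 2


if __name__ == "__main__":
    # 106,000 is the approximate sequence count mentioned in the thread.
    print(pairwise_count(106_000))  # 5617947000
```

That is roughly 5.6 billion possible pairs, so anything that breaks the matrix into smaller pieces before clustering reduces the work per piece considerably.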
