cluster large distance matrix

I’m trying to run the cluster step on a 38 GB distance matrix. I’ve been using the hcluster command with method=average and cutoff=0.1. I ran the command on a system with 256 GB of RAM, but jobs there are limited to 72 hours and mine times out before it finishes. Do you have any suggestions?


First off, a 38 GB matrix seems ridiculously large for any dataset/environment. You might double-check that you’re following the Costello example, complete with the quality trimming options. Here’s something to try - it’s not officially released, but it is available within mothur…

Hi Pat,
My dataset actually started with over half a million sequences, and I was able to reduce it to just over 106,000 sequences. I will try the split command. Thanks!
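
For a rough sense of why the matrix gets so big, here is a back-of-the-envelope sketch in Python. The function name `pair_count` is only illustrative, and it counts every possible pair of sequences; the distance file itself only keeps pairs under the cutoff, so the real number of stored distances will be smaller.

```python
def pair_count(n_sequences: int) -> int:
    """Number of unordered sequence pairs, i.e. n * (n - 1) / 2."""
    return n_sequences * (n_sequences - 1) // 2


# Sequence counts mentioned in this thread (rounded figures, not exact).
reduced = pair_count(106_000)   # after reducing the dataset
original = pair_count(500_000)  # roughly the starting dataset

print(f"pairs at 106,000 sequences: {reduced:,}")
print(f"pairs at 500,000 sequences: {original:,}")
print(f"ratio: {original / reduced:.1f}x")
```

The pair count grows with the square of the number of sequences, so cutting the number of sequences has a much larger effect on the matrix than the raw sequence counts suggest.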